unique gene clusters: Topics by Science.gov

Sample records for unique gene clusters

Conditional clustering of temporal expression profiles

PubMed Central

Wang, Ling; Montano, Monty; Rarick, Matt; Sebastiani, Paola

2008-01-01

Background Many microarray experiments produce temporal profiles in different biological conditions but common cluster techniques are not able to analyze the data conditional on the biological conditions. Results This article presents a novel technique to cluster data from time course microarray experiments performed across several experimental conditions. Our algorithm uses polynomial models to describe the gene expression patterns over time, a full Bayesian approach with proper conjugate priors to make the algorithm invariant to linear transformations, and an iterative procedure to identify genes that have a common temporal expression profile across two or more experimental conditions, and genes that have a unique temporal profile in a specific condition. Conclusion We use simulated data to evaluate the effectiveness of this new algorithm in finding the correct number of clusters and in identifying genes with common and unique profiles. We also use the algorithm to characterize the response of human T cells to stimulations of antigen-receptor signaling gene expression temporal profiles measured in six different biological conditions and we identify common and unique genes. These studies suggest that the methodology proposed here is useful in identifying and distinguishing uniquely stimulated genes from commonly stimulated genes in response to variable stimuli. Software for using this clustering method is available from the project home page. PMID:18334028
A Putative Gene Cluster from a Lyngbya wollei Bloom that Encodes Paralytic Shellfish Toxin Biosynthesis

PubMed Central

Mihali, Troco K.; Carmichael, Wayne W.; Neilan, Brett A.

2011-01-01

Saxitoxin and its analogs cause the paralytic shellfish-poisoning syndrome, adversely affecting human health and coastal shellfish industries worldwide. Here we report the isolation, sequencing, annotation, and predicted pathway of the saxitoxin biosynthetic gene cluster in the cyanobacterium Lyngbya wollei. The gene cluster spans 36 kb and encodes enzymes for the biosynthesis and export of the toxins. The Lyngbya wollei saxitoxin gene cluster differs from previously identified saxitoxin clusters as it contains genes that are unique to this cluster, whereby the carbamoyltransferase is truncated and replaced by an acyltransferase, explaining the unique toxin profile presented by Lyngbya wollei. These findings will enable the creation of toxin probes, for water monitoring purposes, as well as proof-of-concept for the combinatorial biosynthesis of these naturally occurring alkaloids for the production of novel, biologically active compounds. PMID:21347365
Comparative analyses of Xanthomonas and Xylella complete genomes.

PubMed

Moreira, Leandro M; De Souza, Robson F; Digiampietri, Luciano A; Da Silva, Ana C R; Setubal, João C

2005-01-01

Computational analyses of four bacterial genomes of the Xanthomonadaceae family reveal new unique genes that may be involved in adaptation, pathogenicity, and host specificity. The Xanthomonas genus presents 3636 unique genes distributed in 1470 families, while Xylella genus presents 1026 unique genes distributed in 375 families. Among Xanthomonas-specific genes, we highlight a large number of cell wall degrading enzymes, proteases, and iron receptors, a set of energy metabolism genes, second copy of the type II secretion system, type III secretion system, flagella and chemotactic machinery, and the xanthomonadin synthesis gene cluster. Important genes unique to the Xylella genus are an additional copy of a type IV pili gene cluster and the complete machinery of colicin V synthesis and secretion. Intersections of gene sets from both genera reveal a cluster of genes homologous to Salmonella's SPI-7 island in Xanthomonas axonopodis pv citri and Xylella fastidiosa 9a5c, which might be involved in host specificity. Each genome also presents important unique genes, such as an HMS cluster, the kdgT gene, and O-antigen in Xanthomonas axonopodis pv citri; a number of avrBS genes and a distinct O-antigen in Xanthomonas campestris pv campestris, a type I restriction-modification system and a nickase gene in Xylella fastidiosa 9a5c, and a type II restriction-modification system and two genes related to peptidoglycan biosynthesis in Xylella fastidiosa temecula 1. All these differences imply a considerable number of gene gains and losses during the divergence of the four lineages, and are associated with structural genome modifications that may have a direct relation with the mode of transmission, adaptation to specific environments and pathogenicity of each organism.
Comparative genomics reveals phylogenetic distribution patterns of secondary metabolites in Amycolatopsis species.

PubMed

Adamek, Martina; Alanjary, Mohammad; Sales-Ortells, Helena; Goodfellow, Michael; Bull, Alan T; Winkler, Anika; Wibberg, Daniel; Kalinowski, Jörn; Ziemert, Nadine

2018-06-01

Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis' strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. 
Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.
Clustered Xenopus keratin genes: A genomic, transcriptomic, and proteomic analysis.

PubMed

Suzuki, Ken-Ichi T; Suzuki, Miyuki; Shigeta, Mitsuki; Fortriede, Joshua D; Takahashi, Shuji; Mawaribuchi, Shuuji; Yamamoto, Takashi; Taira, Masanori; Fukui, Akimasa

2017-06-15

Keratin genes belong to the intermediate filament superfamily and their expression is altered following morphological and physiological changes in vertebrate epithelial cells. Keratin genes are divided into two groups, type I and II, and are clustered on vertebrate genomes, including those of Xenopus species. Various keratin genes have been identified and characterized by their unique expression patterns throughout ontogeny in Xenopus laevis; however, compilation of previously reported and newly identified keratin genes in two Xenopus species is required for our further understanding of keratin gene evolution, not only in amphibians but also in all terrestrial vertebrates. In this study, 120 putative type I and II keratin genes in total were identified based on the genome data from two Xenopus species. We revealed that most of these genes are highly clustered on two homeologous chromosomes, XLA9_10 and XLA2 in X. laevis, and XTR10 and XTR2 in X. tropicalis, which are orthologous to those of human, showing conserved synteny among tetrapods. RNA-Seq data from various embryonic stages and adult tissues highlighted the unique expression profiles of orthologous and homeologous keratin genes in developmental stage- and tissue-specific manners. Moreover, we identified dozens of epidermal keratin proteins from the whole embryo, larval skin, tail, and adult skin using shotgun proteomics. In light of our results, we discuss the radiation, diversification, and unique expression of the clustered keratin genes, which are closely related to epidermal development and terrestrial adaptation during amphibian evolution, including Xenopus speciation. Copyright © 2016 Elsevier Inc. All rights reserved.
Epigenetic transgenerational inheritance of somatic transcriptomes and epigenetic control regions

PubMed Central

2012-01-01

Background Environmentally induced epigenetic transgenerational inheritance of adult onset disease involves a variety of phenotypic changes, suggesting a general alteration in genome activity. Results Investigation of different tissue transcriptomes in male and female F3 generation vinclozolin versus control lineage rats demonstrated all tissues examined had transgenerational transcriptomes. The microarrays from 11 different tissues were compared with a gene bionetwork analysis. Although each tissue transgenerational transcriptome was unique, common cellular pathways and processes were identified between the tissues. A cluster analysis identified gene modules with coordinated gene expression and each had unique gene networks regulating tissue-specific gene expression and function. A large number of statistically significant over-represented clusters of genes were identified in the genome for both males and females. These gene clusters ranged from 2-5 megabases in size, and a number of them corresponded to the epimutations previously identified in sperm that transmit the epigenetic transgenerational inheritance of disease phenotypes. Conclusions Combined observations demonstrate that all tissues derived from the epigenetically altered germ line develop transgenerational transcriptomes unique to the tissue, but common epigenetic control regions in the genome may coordinately regulate these tissue-specific transcriptomes. This systems biology approach provides insight into the molecular mechanisms involved in the epigenetic transgenerational inheritance of a variety of adult onset disease phenotypes. PMID:23034163
Degradation of Benzene by Pseudomonas veronii 1YdBTEX2 and 1YB2 Is Catalyzed by Enzymes Encoded in Distinct Catabolism Gene Clusters.

PubMed

de Lima-Morales, Daiana; Chaves-Moreno, Diego; Wos-Oxley, Melissa L; Jáuregui, Ruy; Vilchez-Vargas, Ramiro; Pieper, Dietmar H

2016-01-01

Pseudomonas veronii 1YdBTEX2, a benzene and toluene degrader, and Pseudomonas veronii 1YB2, a benzene degrader, have previously been shown to be key players in a benzene-contaminated site. These strains harbor unique catabolic pathways for the degradation of benzene comprising a gene cluster encoding an isopropylbenzene dioxygenase where genes encoding downstream enzymes were interrupted by stop codons. Extradiol dioxygenases were recruited from gene clusters comprising genes encoding a 2-hydroxymuconic semialdehyde dehydrogenase necessary for benzene degradation but typically absent from isopropylbenzene dioxygenase-encoding gene clusters. The benzene dihydrodiol dehydrogenase-encoding gene was not clustered with any other aromatic degradation genes, and the encoded protein was only distantly related to dehydrogenases of aromatic degradation pathways. The involvement of the different gene clusters in the degradation pathways was suggested by real-time quantitative reverse transcription PCR. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
The human TREM gene cluster at 6p21.1 encodes both activating and inhibitory single IgV domain receptors and includes NKp44.

PubMed

Allcock, Richard J N; Barrow, Alexander D; Forbes, Simon; Beck, Stephan; Trowsdale, John

2003-02-01

We have characterized a cluster of single immunoglobulin variable (IgV) domain receptors centromeric of the major histocompatibility complex (MHC) on human chromosome 6. In addition to triggering receptor expressed on myeloid cells (TREM)-1 and TREM2, the cluster contains NKp44, a triggering receptor whose expression is limited to NK cells. We identified three new related genes and two gene fragments within a cluster of approximately 200 kb. Two of the three new genes lack charged residues in their transmembrane domain tails. Further, one of the genes contains two potential immunotyrosine inhibitory motifs in its cytoplasmic tail, suggesting that it delivers inhibitory signals. The human and mouse TREM clusters appear to have diverged such that there are unique sequences in each species. Finally, each gene in the TREM cluster was expressed in a different range of cell types.
Radiation-induced gene expression in the nematode Caenorhabditis elegans

NASA Technical Reports Server (NTRS)

Nelson, Gregory A.; Jones, Tamako A.; Chesnut, Aaron; Smith, Anna L.

2002-01-01

We used the nematode C. elegans to characterize the genotoxic and cytotoxic effects of ionizing radiation in a simple animal model emphasizing the unique effects of charged particle radiation. Here we demonstrate by RT-PCR differential display and whole genome microarray hybridization experiments that gamma rays, accelerated protons and iron ions at the same physical dose lead to unique transcription profiles. 599 of 17871 genes analyzed (3.4%) showed differential expression 3 hrs after exposure to 3 Gy of radiation. 193 were up-regulated, 406 were down-regulated and 90% were affected only by a single species of radiation. A novel statistical clustering technique identified the regulatory relationships between the radiation-modulated genes and showed that genes affected by each radiation species were associated with unique regulatory clusters. This suggests that independent homeostatic mechanisms are activated in response to radiation exposure as a function of track structure or ionization density.
Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cameron, R A; Rowen, L; Nesbitt, R

2005-10-11

The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew

2005-05-10

The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
CYP76M7 Is an ent-Cassadiene C11α-Hydroxylase Defining a Second Multifunctional Diterpenoid Biosynthetic Gene Cluster in Rice

PubMed Central

Swaminathan, Sivakumar; Morrone, Dana; Wang, Qiang; Fulton, D. Bruce; Peters, Reuben J.

2009-01-01

Biosynthetic gene clusters are common in microbial organisms, but rare in plants, raising questions regarding the evolutionary forces that drive their assembly in multicellular eukaryotes. Here, we characterize the biochemical function of a rice (Oryza sativa) cytochrome P450 monooxygenase, CYP76M7, which seems to act in the production of antifungal phytocassanes and defines a second diterpenoid biosynthetic gene cluster in rice. This cluster is uniquely multifunctional, containing enzymatic genes involved in the production of two distinct sets of phytoalexins, the antifungal phytocassanes and antibacterial oryzalides/oryzadiones, with the corresponding genes being subject to distinct transcriptional regulation. The lack of uniform coregulation of the genes within this multifunctional cluster suggests that this was not a primary driving force in its assembly. However, the cluster is dedicated to specialized metabolism, as all genes in the cluster are involved in phytoalexin metabolism. We hypothesize that this dedication to specialized metabolism led to the assembly of the corresponding biosynthetic gene cluster. Consistent with this hypothesis, molecular phylogenetic comparison demonstrates that the two rice diterpenoid biosynthetic gene clusters have undergone independent elaboration to their present-day forms, indicating continued evolutionary pressure for coclustering of enzymatic genes encoding components of related biosynthetic pathways. PMID:19825834
Serial analysis of gene expression (SAGE) in normal human trabecular meshwork.

PubMed

Liu, Yutao; Munro, Drew; Layfield, David; Dellinger, Andrew; Walter, Jeffrey; Peterson, Katherine; Rickman, Catherine Bowes; Allingham, R Rand; Hauser, Michael A

2011-04-08

To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map. A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified. This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
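The tag bookkeeping described in this abstract (total tags, unique tags across libraries, and unique tags seen at least twice in a single library) can be illustrated with a minimal sketch. This is not the SAGE 2000 software or the authors' pipeline; the function name `summarize_tags`, the `min_count` parameter, and the toy tag strings are assumptions made purely for illustration.

```python
from collections import Counter


def summarize_tags(libraries, min_count=2):
    """Summarize tag counts across libraries (hypothetical helper).

    libraries: list of lists of tag strings, one inner list per library.
    Returns (total tags, number of unique tags, number of unique tags
    observed at least `min_count` times within a single library).
    """
    per_library = [Counter(tags) for tags in libraries]
    total_tags = sum(sum(counts.values()) for counts in per_library)
    unique_tags = set().union(*(counts.keys() for counts in per_library))
    reliable_tags = {
        tag
        for counts in per_library
        for tag, n in counts.items()
        if n >= min_count
    }
    return total_tags, len(unique_tags), len(reliable_tags)


# Toy data only: real SAGE tags in the abstract are 17 bases long.
libs = [
    ["AAAC", "AAAC", "GGTT"],
    ["AAAC", "CCGA"],
    ["CCGA", "TTTT", "TTTT", "TTTT"],
]
print(summarize_tags(libs))

# The three library sizes reported in the abstract add up to the stated total.
assert 96_842 + 88_126 + 113_866 == 298_834
```

In the toy data, "CCGA" appears once in each of two libraries, so it counts as a unique tag but not as one with a minimum of 2 counts from a single library, which mirrors the distinction the abstract draws between its 107,325 and 10,329 figures.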
White lupin cluster root acclimation to phosphorus deficiency and root hair development involve unique glycerophosphodiester phosphodiesterases

USDA-ARS's Scientific Manuscript database

White lupin (Lupinus albus L.) is a phosphate (Pi) deficiency tolerant legume which develops short, densely clustered tertiary lateral roots (cluster/proteoid roots) in response to Pi limitation. In this report we characterize two glycerophosphodiester phosphodiesterase (GPX-PDE) genes (GPX-PDE1 and...
ClusterMine360: a database of microbial PKS/NRPS biosynthesis

PubMed Central

Conway, Kyle R.; Boddy, Christopher N.

2013-01-01

ClusterMine360 (http://www.clustermine360.ca/) is a database of microbial polyketide and non-ribosomal peptide gene clusters. It takes advantage of crowd-sourcing by allowing members of the community to make contributions while automation is used to help achieve high data consistency and quality. The database currently has >200 gene clusters from >185 compound families. It also features a unique sequence repository containing >10 000 polyketide synthase/non-ribosomal peptide synthetase domains. The sequences are filterable and downloadable as individual or multiple sequence FASTA files. We are confident that this database will be a useful resource for members of the polyketide synthases/non-ribosomal peptide synthetases research community, enabling them to keep up with the growing number of sequenced gene clusters and rapidly mine these clusters for functional information. PMID:23104377
Identification of an unusual type II thioesterase in the dithiolopyrrolone antibiotics biosynthetic pathway

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhai, Ying; Bai, Silei; Liu, Jingjing

Dithiolopyrrolone group antibiotics characterized by an electronically unique dithiolopyrrolone heterobicyclic core are known for their antibacterial, antifungal, insecticidal and antitumor activities. Recently the biosynthetic gene clusters for two dithiolopyrrolone compounds, holomycin and thiomarinol, have been identified respectively in different bacterial species. Here, we report a novel dithiolopyrrolone biosynthetic gene cluster (aut) isolated from Streptomyces thioluteus DSM 40027 which produces two pyrrothine derivatives, aureothricin and thiolutin. By comparison with other characterized dithiolopyrrolone clusters, eight genes in the aut cluster were verified to be responsible for the assembly of dithiolopyrrolone core. The aut cluster was further confirmed by heterologous expression and in-frame gene deletion experiments. Intriguingly, we found that the heterogenetic thioesterase HlmK derived from the holomycin (hlm) gene cluster in Streptomyces clavuligerus significantly improved heterologous biosynthesis of dithiolopyrrolones in Streptomyces albus through coexpression with the aut cluster. In the previous studies, HlmK was considered invalid because it has a Ser to Gly point mutation within the canonical Ser-His-Asp catalytic triad of thioesterases. However, gene inactivation and complementation experiments in our study unequivocally demonstrated that HlmK is an active distinctive type II thioesterase that plays a beneficial role in dithiolopyrrolone biosynthesis. Highlights: • Cloning of the aureothricin biosynthetic gene cluster from Streptomyces thioluteus DSM 40027. • Identification of the aureothricin gene cluster by heterologous expression and in-frame gene deletion. • The heterogenetic thioesterase HlmK significantly improved dithiolopyrrolones production of the aureothricin gene cluster. • Identification of HlmK as an unusual type II thioesterase.
Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

PubMed

Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

2016-12-01

Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.
Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters

PubMed Central

Schorn, Michelle A.; Alanjary, Mohammad M.; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R.; Ziemert, Nadine

2016-01-01

Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites. PMID:27902408
Genetic homogeneity of Clostridium botulinum type A1 strains with unique toxin gene clusters.

PubMed

Raphael, Brian H; Luquez, Carolina; McCroskey, Loretta M; Joseph, Lavin A; Jacobson, Mark J; Johnson, Eric A; Maslanka, Susan E; Andreadis, Joanne D

2008-07-01

A group of five clonally related Clostridium botulinum type A strains isolated from different sources over a period of nearly 40 years harbored several conserved genetic properties. These strains contained a variant bont/A1 with five nucleotide polymorphisms compared to the gene in C. botulinum strain ATCC 3502. The strains also had a common toxin gene cluster composition (ha-/orfX+) similar to that associated with bont/A in type A strains containing an unexpressed bont/B [termed A(B) strains]. However, bont/B was not identified in the strains examined. Comparative genomic hybridization demonstrated identical genomic content among the strains relative to C. botulinum strain ATCC 3502. In addition, microarray data demonstrated the absence of several genes flanking the toxin gene cluster among the ha-/orfX+ A1 strains, suggesting the presence of genomic rearrangements with respect to this region compared to the C. botulinum ATCC 3502 strain. All five strains were shown to have identical flaA variable region nucleotide sequences. The pulsed-field gel electrophoresis patterns of the strains were indistinguishable when digested with SmaI, and a shift in the size of at least one band was observed in a single strain when digested with XhoI. These results demonstrate surprising genomic homogeneity among a cluster of unique C. botulinum type A strains of diverse origin.
Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species

PubMed Central

Lind, Abigail L.; Wisecaver, Jennifer H.; Lameiras, Catarina; Wiemann, Philipp; Palmer, Jonathan M.; Keller, Nancy P.; Rodrigues, Fernando; Goldman, Gustavo H.

2017-01-01

Filamentous fungi produce a diverse array of secondary metabolites (SMs) critical for defense, virulence, and communication. The metabolic pathways that produce SMs are found in contiguous gene clusters in fungal genomes, an atypical arrangement for metabolic pathways in other eukaryotes. Comparative studies of filamentous fungal species have shown that SM gene clusters are often either highly divergent or uniquely present in one or a handful of species, hampering efforts to determine the genetic basis and evolutionary drivers of SM gene cluster divergence. Here, we examined SM variation in 66 cosmopolitan strains of a single species, the opportunistic human pathogen Aspergillus fumigatus. Investigation of genome-wide within-species variation revealed 5 general types of variation in SM gene clusters: nonfunctional gene polymorphisms; gene gain and loss polymorphisms; whole cluster gain and loss polymorphisms; allelic polymorphisms, in which different alleles corresponded to distinct, nonhomologous clusters; and location polymorphisms, in which a cluster was found to differ in its genomic location across strains. These polymorphisms affect the function of representative A. fumigatus SM gene clusters, such as those involved in the production of gliotoxin, fumigaclavine, and helvolic acid as well as the function of clusters with undefined products. In addition to enabling the identification of polymorphisms, the detection of which requires extensive genome-wide synteny conservation (e.g., mobile gene clusters and nonhomologous cluster alleles), our approach also implicated multiple underlying genetic drivers, including point mutations, recombination, and genomic deletion and insertion events as well as horizontal gene transfer from distant fungi. Finally, most of the variants that we uncover within A. fumigatus have been previously hypothesized to contribute to SM gene cluster diversity across entire fungal classes and phyla. 
We suggest that the drivers of genetic diversity operating within a fungal species shown here are sufficient to explain SM cluster macroevolutionary patterns. PMID:29149178

Identification of an unusual type II thioesterase in the dithiolopyrrolone antibiotics biosynthetic pathway.

PubMed

Zhai, Ying; Bai, Silei; Liu, Jingjing; Yang, Liyuan; Han, Li; Huang, Xueshi; He, Jing

2016-04-22

Dithiolopyrrolone group antibiotics characterized by an electronically unique dithiolopyrrolone heterobicyclic core are known for their antibacterial, antifungal, insecticidal and antitumor activities. Recently the biosynthetic gene clusters for two dithiolopyrrolone compounds, holomycin and thiomarinol, have been identified respectively in different bacterial species. Here, we report a novel dithiolopyrrolone biosynthetic gene cluster (aut) isolated from Streptomyces thioluteus DSM 40027 which produces two pyrrothine derivatives, aureothricin and thiolutin. By comparison with other characterized dithiolopyrrolone clusters, eight genes in the aut cluster were verified to be responsible for the assembly of dithiolopyrrolone core. The aut cluster was further confirmed by heterologous expression and in-frame gene deletion experiments. Intriguingly, we found that the heterogenetic thioesterase HlmK derived from the holomycin (hlm) gene cluster in Streptomyces clavuligerus significantly improved heterologous biosynthesis of dithiolopyrrolones in Streptomyces albus through coexpression with the aut cluster. In the previous studies, HlmK was considered invalid because it has a Ser to Gly point mutation within the canonical Ser-His-Asp catalytic triad of thioesterases. However, gene inactivation and complementation experiments in our study unequivocally demonstrated that HlmK is an active distinctive type II thioesterase that plays a beneficial role in dithiolopyrrolone biosynthesis. Copyright © 2016 Elsevier Inc. All rights reserved.
Genome-Wide Analysis of Secondary Metabolite Gene Clusters in Ophiostoma ulmi and Ophiostoma novo-ulmi Reveals a Fujikurin-Like Gene Cluster with a Putative Role in Infection.

PubMed

Sbaraini, Nicolau; Andreis, Fábio C; Thompson, Claudia E; Guedes, Rafael L M; Junges, Ângela; Campos, Thais; Staats, Charley C; Vainstein, Marilene H; Ribeiro de Vasconcelos, Ana T; Schrank, Augusto

2017-01-01

The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi from the Ophiostoma genus, has caused a significant decline in elm tree populations (Ulmus sp.) in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi), along with closely related species with different lifestyles, allows for unique comparisons to be made to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs) have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8) was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs for this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the Ophiostoma genus. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on Ophiostoma genus' lifestyle.
Comparison of 16S ribosomal RNA genes in Clavibacter michiganensis subspecies with other coryneform bacteria.

PubMed

Li, X; De Boer, S H

1995-10-01

Nearly complete sequences (97-99%) of the 16S rRNA genes were determined for type strains of Clavibacter michiganensis subsp. michiganensis, Clavibacter michiganensis subsp. insidiosus, Clavibacter michiganensis subsp. sepedonicus, and Clavibacter michiganensis subsp. nebraskensis. The four subspecies had less than 1% dissimilarity in their 16S rRNA genes. Comparative studies indicated that the C. michiganensis subsp. shared relatively high homology with the 16S rRNA gene of Clavibacter xyli. Further comparison with representatives of other Gram-positive coryneform and related bacteria with high G+C% values showed that this group of bacteria was subdivided into three clusters. One cluster consisted of the Clavibacter michiganensis subsp., Clavibacter xyli, Arthrobacter globiformis, Arthrobacter simplex, and Frankia sp.; another cluster consisted of members of the corynebacteria-mycobacteria-nocardia (CMN) group of Mycobacteriaceae including Tsukamurella paurometabolum; and Propionibacterium freudenreichii alone formed a unique cluster, which was remote from other coryneform bacteria analyzed. The three clusters may reflect a systematic rank higher than the genus level among these bacteria.
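The "less than 1% dissimilarity" figure in this record is a pairwise comparison of aligned 16S rRNA sequences. As a rough illustration only, assuming two pre-aligned sequences of equal length with gaps written as "-", percent dissimilarity could be computed as sketched below. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def percent_dissimilarity(seq_a: str, seq_b: str) -> float:
    """Percent of mismatched positions between two pre-aligned sequences.

    Assumption of this sketch: positions where either sequence has a gap
    ("-") are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Hypothetical toy alignment: one mismatch in ten compared positions.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
toy_result = percent_dissimilarity(toy_a, toy_b)
print(f"{toy_result:.1f}% dissimilarity")
```

On real data the sequences would be nearly full-length 16S genes, and the treatment of gaps and ambiguous bases would need to follow whatever convention the analysis adopts.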
Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5.

PubMed

Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A

2009-03-30

Saxitoxin and its analogues collectively known as the paralytic shellfish toxins (PSTs) are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared uncommon metabolic pathway, include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The gene cluster organizational structure and sequence similarity seems to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event. 
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may also afford the identification of these gene clusters in dinoflagellates, the cause of human mortalities and significant financial loss to the tourism and shellfish industries.
Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5

PubMed Central

Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A

2009-01-01

Background Saxitoxin and its analogues collectively known as the paralytic shellfish toxins (PSTs) are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared uncommon metabolic pathway, include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. Results We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. Conclusion The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The gene cluster organizational structure and sequence similarity seems to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event. 
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may also afford the identification of these gene clusters in dinoflagellates, the cause of human mortalities and significant financial loss to the tourism and shellfish industries. PMID:19331657
Identification of the first diphenyl ether gene cluster for pestheic acid biosynthesis in plant endophyte Pestalotiopsis fici.

PubMed

Xu, Xinxin; Liu, Ling; Zhang, Fan; Wang, Wenzhao; Li, Jinyang; Guo, Liangdong; Che, Yongsheng; Liu, Gang

2014-01-24

The diphenyl ether pestheic acid was isolated from the endophytic fungus Pestalotiopsis fici, which is proposed to be the biosynthetic precursor of the unique chloropupukeananes. The pestheic acid biosynthetic gene (pta) cluster was identified in the fungus through genome scanning. Sequence analysis revealed that this gene cluster encodes a nonreducing polyketide synthase, a number of modification enzymes, and three regulators. Gene disruption and intermediate analysis demonstrated that the biosynthesis proceeded through formation of the polyketide backbone, cyclization of a polyketo acid to a benzophenone, chlorination, and formation of the diphenyl ether skeleton through oxidation and hydrolyzation. A dihydrogeodin oxidase gene, ptaE, was essential for diphenyl ether formation, and ptaM encoded a flavin-dependent halogenase catalyzing chlorination in the biosynthesis. Identification of the pta cluster laid the foundation to decipher the genetic and biochemical mechanisms involved in the pathway. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Molecular Evolution of Clustered MIC-3 (Meloidogyne Induced Cotton -3) Multigene Family of Gossypium Species

USDA-ARS's Scientific Manuscript database

Uniqueness, content, localization, and defense-related features of the root-knot nematode resistance-associated MIC-3 multigene cluster in the genus Gossypium are all of interest for molecular evolutionary studies of duplicate genes in allopolyploids. Here we report molecular evolutionary rates of t...
Evolution of Chemical Diversity in Echinocandin Lipopeptide Antifungal Metabolites

PubMed Central

Yue, Qun; Chen, Li; Zhang, Xiaoling; Li, Kuan; Sun, Jingzu; Liu, Xingzhong

2015-01-01

The echinocandins are a class of antifungal drugs that includes caspofungin, micafungin, and anidulafungin. Gene clusters encoding most of the structural complexity of the echinocandins provided a framework for hypotheses about the evolutionary history and chemical logic of echinocandin biosynthesis. Gene orthologs among echinocandin-producing fungi were identified. Pathway genes, including the nonribosomal peptide synthetases (NRPSs), were analyzed phylogenetically to address the hypothesis that these pathways represent descent from a common ancestor. The clusters share cooperative gene contents and linkages among the different strains. Individual pathway genes analyzed in the context of similar genes formed unique echinocandin-exclusive phylogenetic lineages. The echinocandin NRPSs, along with the NRPS from the inp gene cluster in Aspergillus nidulans and its orthologs, comprise a novel lineage among fungal NRPSs. NRPS adenylation domains from different species exhibited a one-to-one correspondence between modules and amino acid specificity that is consistent with models of tandem duplication and subfunctionalization. Pathway gene trees and Ascomycota phylogenies are congruent and consistent with the hypothesis that the echinocandin gene clusters have a common origin. The disjunct Eurotiomycete-Leotiomycete distribution appears to be consistent with a scenario of vertical descent accompanied by incomplete lineage sorting and loss of the clusters from most lineages of the Ascomycota. We present evidence for a single evolutionary origin of the echinocandin family of gene clusters and a progression of structural diversification in two fungal classes that diverged approximately 290 to 390 million years ago. Lineage-specific gene cluster evolution driven by selection of new chemotypes contributed to diversification of the molecular functionalities. PMID:26024901
Strain-Level Diversity of Secondary Metabolism in Streptomyces albus

PubMed Central

Seipke, Ryan F.

2015-01-01

Streptomyces spp. are robust producers of medicinally-, industrially- and agriculturally-important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive high quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced, consequently this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are contained by one or more strain and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty. PMID:25635820
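The core / auxiliary / strain-specific split reported in this record (18 + 14 + 16 = 48 clusters) can be pictured as a presence/absence classification across strains. The sketch below is illustrative only: it assumes "core" means present in every strain, "strain-specific" means present in exactly one, and "auxiliary" means anything in between, which is one reading of the abstract rather than the author's stated procedure. The cluster and strain names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_clusters(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene cluster by how many strains carry it.

    presence maps a cluster identifier to the set of strains harbouring it.
    """
    labels = {}
    for cluster, strains in presence.items():
        if not strains:
            raise ValueError(f"{cluster} is not present in any strain")
        if len(strains) == n_strains:
            labels[cluster] = "core"
        elif len(strains) == 1:
            labels[cluster] = "strain-specific"
        else:
            labels[cluster] = "auxiliary"
    return labels


# Hypothetical toy data for three strains.
toy_presence = {
    "bgc_A": {"strain1", "strain2", "strain3"},
    "bgc_B": {"strain1", "strain3"},
    "bgc_C": {"strain2"},
}
toy_labels = classify_clusters(toy_presence, n_strains=3)
toy_counts = Counter(toy_labels.values())
print(toy_labels)

# Consistency check on the counts given in the abstract.
reported_total = 18 + 14 + 16
print(reported_total)
```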
Heterologous expression of pikromycin biosynthetic gene cluster using Streptomyces artificial chromosome system.

PubMed

Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo

2017-05-31

Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially-valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously successfully applied for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme of natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally-applicable heterologous overexpression system for a large-sized polyketide biosynthetic gene cluster in Streptomyces, another model polyketide compound, the pikromycin biosynthetic gene cluster, was precisely cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem clusters of pikromycin biosynthetic gene clusters. The 60 kb pikromycin biosynthetic gene cluster was isolated in a single integration pSBAC vector. Introduction of the pikromycin biosynthetic gene cluster into the pikromycin non-producing strains resulted in higher pikromycin production. The utility of the pSBAC system as a precise cloning tool for large-sized biosynthetic gene clusters was verified through heterologous expression of the pikromycin biosynthetic gene cluster. Moreover, this pSBAC-driven heterologous expression strategy was confirmed to be an ideal approach for production of low and inconsistent natural products such as pikromycin in S. venezuelae, implying that this strategy could be employed for development of a custom overexpression scheme of natural product biosynthetic gene clusters in actinomycetes.
Diversity amongst trigeminal neurons revealed by high throughput single cell sequencing

PubMed Central

Nguyen, Minh Q.; Wu, Youmei; Bonilla, Lauren S.; von Buchholtz, Lars J.

2017-01-01

The trigeminal ganglion contains somatosensory neurons that detect a range of thermal, mechanical and chemical cues and innervate unique sensory compartments in the head and neck including the eyes, nose, mouth, meninges and vibrissae. We used single-cell sequencing and in situ hybridization to examine the cellular diversity of the trigeminal ganglion in mice, defining thirteen clusters of neurons. We show that clusters are well conserved in dorsal root ganglia suggesting they represent distinct functional classes of somatosensory neurons and not specialization associated with their sensory targets. Notably, functionally important genes (e.g. the mechanosensory channel Piezo2 and the capsaicin gated ion channel Trpv1) segregate into multiple clusters and often are expressed in subsets of cells within a cluster. Therefore, the 13 genetically-defined classes are likely to be physiologically heterogeneous rather than highly parallel (i.e., redundant) lines of sensory input. Our analysis harnesses the power of single-cell sequencing to provide a unique platform for in silico expression profiling that complements other approaches linking gene-expression with function and exposes unexpected diversity in the somatosensory system. PMID:28957441
Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

USDA-ARS's Scientific Manuscript database

The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...
Transcriptional profiles of Arabidopsis stomataless mutants reveal developmental and physiological features of life in the absence of stomata

PubMed Central

de Marcos, Alberto; Triviño, Magdalena; Pérez-Bueno, María Luisa; Ballesteros, Isabel; Barón, Matilde; Mena, Montaña; Fenoll, Carmen

2015-01-01

Loss of function of the positive stomata development regulators SPCH or MUTE in Arabidopsis thaliana renders stomataless plants; spch-3 and mute-3 mutants are extreme dwarfs, but produce cotyledons and tiny leaves, providing a system to interrogate plant life in the absence of stomata. To this end, we compared their cotyledon transcriptomes with that of wild-type plants. K-means clustering of differentially expressed genes generated four clusters: clusters 1 and 2 grouped genes commonly regulated in the mutants, while clusters 3 and 4 contained genes distinctively regulated in mute-3. Classification in functional categories and metabolic pathways of genes in clusters 1 and 2 suggested that both mutants had depressed secondary, nitrogen and sulfur metabolisms, while only a few photosynthesis-related genes were down-regulated. In situ quenching analysis of chlorophyll fluorescence revealed limited inhibition of photosynthesis. This and other fluorescence measurements matched the mutant transcriptomic features. Differential transcriptomes of both mutants were enriched in growth-related genes, including known stomata development regulators, which paralleled their epidermal phenotypes. Analysis of cluster 3 was not informative for developmental aspects of mute-3. Cluster 4 comprised genes differentially up-regulated in mute-3, 35% of which were direct targets for SPCH and may relate to the unique cell types of mute-3. A screen of T-DNA insertion lines in genes differentially expressed in the mutants identified a gene putatively involved in stomata development. A collection of lines for conditional overexpression of transcription factors differentially expressed in the mutants rendered distinct epidermal phenotypes, suggesting that these proteins may be novel stomatal development regulators. Thus, our transcriptome analysis represents a useful source of new genes for the study of stomata development and for characterizing physiology and growth in the absence of stomata. PMID:26157447
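The K-means step mentioned in this record groups genes by their expression change in each mutant. As a minimal sketch of the general technique, and not the authors' pipeline, the code below runs a plain K-means on hypothetical (spch-3, mute-3) log fold-change pairs. The deterministic farthest-point seeding, the function names, and all data values are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all chosen
    centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical log fold changes: (spch-3 vs wild type, mute-3 vs wild type).
toy_genes = [
    (2.0, 2.1), (1.8, 2.2),      # up in both mutants
    (-2.1, -1.9), (-1.9, -2.2),  # down in both mutants
    (0.1, 2.0), (-0.1, 1.8),     # up only in mute-3
    (0.0, -2.1), (0.2, -1.9),    # down only in mute-3
]
toy_labels, toy_centroids = kmeans(toy_genes, k=4)
print(toy_labels)
```

The cluster numbering returned by such a sketch is arbitrary, so it would not correspond to the "cluster 1" to "cluster 4" labels used in the abstract.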
An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

PubMed

Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

2003-11-01

Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Wheat EST resources for functional genomics of abiotic stress

PubMed Central

Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey

2006-01-01

Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. 
PMID:16772040
Recent increased identification and transmission of HIV-1 unique recombinant forms in Sweden.

PubMed

Neogi, Ujjwal; Siddik, Abu Bakar; Kalaghatgi, Prabhav; Gisslén, Magnus; Bratt, Göran; Marrone, Gaetano; Sönnerborg, Anders

2017-07-25

A temporal increase in non-B subtypes has earlier been described in Sweden by us and we hypothesized that this increased viral heterogeneity may become a hotspot for the development of more complex and unique recombinant forms (URFs) if the epidemics converge. In the present study, we performed subtyping using four automated tools and phylogenetic analysis by RAxML of pol gene sequences (n = 5246) and HIV-1 near full-length genome (HIV-NFLG) sequences (n = 104). A CD4+ T-cell decline trajectory algorithm was used to estimate time of HIV infection. Transmission clusters were identified using the family-joining method. The analysis of HIV-NFLG and pol gene described 10.6% (11/104) and 2.6% (137/5246) of the strains as URFs, respectively. An increasing trend of URFs was observed in recent years by both approaches (p = 0.0082; p < 0.0001). Transmission cluster analysis using the pol gene of all URFs identified 14 clusters with two to eight sequences. Larger transmission clusters of URFs (BF1 and 01B) were observed among MSM who mostly were sero-diagnosed in recent time. Understanding the increased appearance and transmission of URFs in recent years could have importance for public health interventions and the use of HIV-NFLG would provide better statistical support for such assessments.
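The two URF proportions quoted in this record follow directly from the counts given in parentheses. The short check below simply reproduces that arithmetic; the helper name is hypothetical.

```python
def percent(part: int, total: int) -> float:
    """Percentage of part in total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts as reported in the abstract.
urf_nflg = percent(11, 104)    # near full-length genome sequences
urf_pol = percent(137, 5246)   # pol gene sequences
print(urf_nflg, urf_pol)
```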
Identification of the Monooxygenase Gene Clusters Responsible for the Regioselective Oxidation of Phenol to Hydroquinone in Mycobacteria

PubMed Central

Furuya, Toshiki; Hirose, Satomi; Osanai, Hisashi; Semba, Hisashi; Kino, Kuniki

2011-01-01

Mycobacterium goodii strain 12523 is an actinomycete that is able to oxidize phenol regioselectively at the para position to produce hydroquinone. In this study, we investigated the genes responsible for this unique regioselective oxidation. On the basis of the fact that the oxidation activity of M. goodii strain 12523 toward phenol is induced in the presence of acetone, we first identified acetone-induced proteins in this microorganism by two-dimensional electrophoretic analysis. The N-terminal amino acid sequence of one of these acetone-induced proteins shares 100% identity with that of the protein encoded by the open reading frame Msmeg_1971 in Mycobacterium smegmatis strain mc2155, whose genome sequence has been determined. Since Msmeg_1971, Msmeg_1972, Msmeg_1973, and Msmeg_1974 constitute a putative binuclear iron monooxygenase gene cluster, we cloned this gene cluster of M. smegmatis strain mc2155 and its homologous gene cluster found in M. goodii strain 12523. Sequence analysis of these binuclear iron monooxygenase gene clusters revealed the presence of four genes designated mimABCD, which encode an oxygenase large subunit, a reductase, an oxygenase small subunit, and a coupling protein, respectively. When the mimA gene (Msmeg_1971) of M. smegmatis strain mc2155, which was also found to be able to oxidize phenol to hydroquinone, was deleted, this mutant lost the oxidation ability. This ability was restored by introduction of the mimA gene of M. smegmatis strain mc2155 or of M. goodii strain 12523 into this mutant. Interestingly, we found that these gene clusters also play essential roles in propane and acetone metabolism in these mycobacteria. PMID:21183637
Genetic diversity of K-antigen gene clusters of Escherichia coli and their molecular typing using a suspension array.

PubMed

Yang, Shuang; Xi, Daoyi; Jing, Fuyi; Kong, Deju; Wu, Junli; Feng, Lu; Cao, Boyang; Wang, Lei

2018-04-01

Capsular polysaccharides (CPSs), or K-antigens, are the major surface antigens of Escherichia coli. More than 80 serologically unique K-antigens are classified into 4 groups (Groups 1-4) of capsules. Groups 1 and 4 contain the Wzy-dependent polymerization pathway and the gene clusters are in the order galF to gnd; Groups 2 and 3 contain the ABC-transporter-dependent pathway and the gene clusters consist of 3 regions, regions 1, 2 and 3. Little is known about the variations among the gene clusters. In this study, 9 serotypes of K-antigen gene clusters (K2ab, K11, K20, K24, K38, K84, K92, K96, and K102) were sequenced and correlated with their CPS chemical structures. On the basis of sequence data, a K-antigen-specific suspension array that detects 10 distinct CPSs, including the above 9 CPSs plus K30, was developed. This is the first report to catalog the genetic features of E. coli K-antigen variations and to develop a suspension array for their molecular typing. The method has a number of advantages over traditional bacteriophage and serum agglutination methods and lays the foundation for straightforward identification and detection of additional K-antigens in the future.
Genome mining of the sordarin biosynthetic gene cluster from Sordaria araneosa Cain ATCC 36386: characterization of cycloaraneosene synthase and GDP-6-deoxyaltrose transferase.

PubMed

Kudo, Fumitaka; Matsuura, Yasunori; Hayashi, Takaaki; Fukushima, Masayuki; Eguchi, Tadashi

2016-07-01

Sordarin is a glycoside antibiotic with a unique tetracyclic diterpene aglycone structure called sordaricin. To understand its intriguing biosynthetic pathway that may include a Diels-Alder-type [4+2] cycloaddition, genome mining of the gene cluster from the draft genome sequence of the producer strain, Sordaria araneosa Cain ATCC 36386, was carried out. A contiguous 67 kb gene cluster consisting of 20 open reading frames encoding a putative diterpene cyclase, a glycosyltransferase, a type I polyketide synthase, and six cytochrome P450 monooxygenases was identified. In vitro enzymatic analysis of the putative diterpene cyclase SdnA showed that it catalyzes the transformation of geranylgeranyl diphosphate to cycloaraneosene, a known biosynthetic intermediate of sordarin. Furthermore, a putative glycosyltransferase SdnJ was found to catalyze the glycosylation of sordaricin in the presence of GDP-6-deoxy-d-altrose to give 4'-O-demethylsordarin. These results suggest that the identified sdn gene cluster is responsible for the biosynthesis of sordarin. Based on the isolated potential biosynthetic intermediates and bioinformatics analysis, a plausible biosynthetic pathway for sordarin is proposed.
A strategy of gene overexpression based on tandem repetitive promoters in Escherichia coli.

PubMed

Li, Mingji; Wang, Junshu; Geng, Yanping; Li, Yikui; Wang, Qian; Liang, Quanfeng; Qi, Qingsheng

2012-02-06

For metabolic engineering, many rate-limiting steps may exist in the pathways of accumulating the target metabolites. Increasing copy number of the desired genes in these pathways is a general method to solve the problem, for example, the employment of the multi-copy plasmid-based expression system. However, this method may bring genetic instability, structural instability and metabolic burden to the host, while integration of the desired gene into the chromosome may cause inadequate transcription or expression. In this study, we developed a strategy for obtaining gene overexpression by engineering promoter clusters consisting of multiple core-tac-promoters (MCPtacs) in tandem. Through a uniquely designed in vitro assembling process, a series of promoter clusters were constructed. The transcription strength of these promoter clusters showed a stepwise enhancement with the increase of tandem repeats number until it reached the critical value of five. Application of the MCPtacs promoter clusters in polyhydroxybutyrate (PHB) production proved that it was efficient. Integration of the phaCAB genes with the 5CPtacs promoter cluster resulted in an engineered E. coli that can accumulate 23.7% PHB of the cell dry weight in batch cultivation. The transcription strength of the MCPtacs promoter cluster can be greatly improved by increasing the tandem repeats number of the core-tac-promoter. By integrating the desired gene together with the MCPtacs promoter cluster into the chromosome of E. coli, we can achieve high and stable overexpression with only a small size. This strategy has an application potential in many fields and can be extended to other bacteria.

Clustering, haplotype diversity and locations of MIC-3: a unique root-specific defense-related gene family in upland cotton (Gossypium hirsutum L.)

USDA-ARS's Scientific Manuscript database

MIC-3-related genes of cotton (Gossypium spp.) were identified and shown to have root-specific expression, associated with pathogen defense-related function and specifically increased expression in root-knot nematode (RKN) resistant plants after nematode infection. Here we cloned and sequenced MIC-...
Polyketide synthesis genes associated with toxin production in two species of Gambierdiscus (Dinophyceae).

PubMed

Kohli, Gurjeet S; John, Uwe; Figueroa, Rosa I; Rhodes, Lesley L; Harwood, D Tim; Groth, Marco; Bolch, Christopher J S; Murray, Shauna A

2015-05-28

Marine microbial protists, in particular, dinoflagellates, produce polyketide toxins with ecosystem-wide and human health impacts. Species of Gambierdiscus produce the polyether ladder compounds ciguatoxins and maitotoxins, which can lead to ciguatera fish poisoning, a serious human illness associated with reef fish consumption. Genes associated with the biosynthesis of polyether ladder compounds are yet to be elucidated, however, stable isotope feeding studies of such compounds consistently support their polyketide origin indicating that polyketide synthases are involved in their biosynthesis. Here, we report the toxicity, genome size, gene content and transcriptome of Gambierdiscus australes and G. belizeanus. G. australes produced maitotoxin-1 and maitotoxin-3, while G. belizeanus produced maitotoxin-3, for which cell extracts were toxic to mice by IP injection (LD50 = 3.8 mg kg(-1)). The gene catalogues comprised 83,353 and 84,870 unique contigs, with genome sizes of 32.5 ± 3.7 Gbp and 35 ± 0.88 Gbp, respectively, and are amongst the most comprehensive yet reported from a dinoflagellate. We found three hundred and six genes involved in polyketide biosynthesis, including one hundred and ninety-two ketoacyl synthase transcripts, which formed five unique phylogenetic clusters. Two clusters were unique to these maitotoxin-producing dinoflagellate species, suggesting that they may be associated with maitotoxin biosynthesis. This work represents a significant step forward in our understanding of the genetic basis of polyketide production in dinoflagellates, in particular, species responsible for ciguatera fish poisoning.
Unique core genomes of the bacterial family vibrionaceae: insights into niche adaptation and speciation.

PubMed

Kahlke, Tim; Goesmann, Alexander; Hjerde, Erik; Willassen, Nils Peder; Haugen, Peik

2012-05-10

The criteria for defining bacterial species and even the concept of bacterial species itself are under debate, and the discussion is apparently intensifying as more genome sequence data is becoming available. However, it is still unclear how the new advances in genomics should be used most efficiently to address this question. In this study we identify genes that are common to any group of genomes in our dataset, to determine whether genes specific to a particular taxon exist and to investigate their potential role in adaptation of bacteria to their specific niche. These genes were named unique core genes. Additionally, we investigate the existence and importance of unique core genes that are found in isolates of phylogenetically non-coherent groups. These groups of isolates, that share a genetic feature without sharing a closest common ancestor, are termed genophyletic groups. The bacterial family Vibrionaceae was used as the model, and we compiled and compared genome sequences of 64 different isolates. Using the software orthoMCL we determined clusters of homologous genes among the investigated genome sequences. We used multilocus sequence analysis to build a host phylogeny and mapped the numbers of unique core genes of all distinct groups of isolates onto the tree. The results show that unique core genes are more likely to be found in monophyletic groups of isolates. Genophyletic groups of isolates, in contrast, are less common, especially for large groups of isolates. The subsequent annotation of unique core genes that are present in genophyletic groups indicates a high degree of horizontally transferred genes. Finally, the annotation of the unique core genes of Vibrio cholerae revealed genes involved in aerotaxis and biosynthesis of the iron-chelator vibriobactin. The presented work indicates that genes specific for any taxon inside the bacterial family Vibrionaceae exist.
These unique core genes encode conserved metabolic functions that can shed light on the adaptation of a species to its ecological niche. Additionally, our study suggests that unique core genes can be used to aid classification of bacteria and contribute to a bacterial species definition on a genomic level. Furthermore, these genes may be of importance in clinical diagnostics and drug development.
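The idea of a "unique core gene" in the abstract above can be illustrated with plain set operations: a gene cluster counts as unique core for a group of isolates if it occurs in every isolate of the group and in none outside it. The sketch below is only an illustration of that definition, not the authors' pipeline; the function name, the genome labels, and the cluster identifiers are all made up, and the input is assumed to be a precomputed table of which homolog clusters (for example from an orthology tool) occur in which genome.

```python
from typing import Dict, Set


def unique_core_clusters(presence: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Return clusters present in every genome of `group` and in no other genome.

    `presence` maps a genome name to the set of homolog-cluster IDs it contains
    (hypothetical input format, assumed to be computed beforehand).
    """
    if not group or not set(group).issubset(presence):
        raise ValueError("group must be a non-empty subset of the genomes in `presence`")
    # Clusters shared by all members of the group (the group's "core").
    core = set.intersection(*(presence[g] for g in group))
    # Clusters seen in at least one genome outside the group.
    elsewhere = set().union(*(presence[g] for g in presence if g not in group))
    return core - elsewhere


# Toy data: three hypothetical isolates and five hypothetical clusters.
presence = {
    "V_cholerae_A": {"c1", "c2", "c3"},
    "V_cholerae_B": {"c1", "c2", "c4"},
    "A_salmonicida": {"c1", "c5"},
}
print(sorted(unique_core_clusters(presence, {"V_cholerae_A", "V_cholerae_B"})))
```

In the toy data, c1 is shared by all three isolates and is therefore core but not unique, while c2 is the only cluster confined to the two V. cholerae isolates.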
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

PubMed

Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

2017-09-30

Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species

PubMed Central

Andersen, Ethan J.; Neupane, Surendra; Benson, Benjamin V.

2017-01-01

Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence. PMID:28973974
Comparative genomics of Beauveria bassiana: uncovering signatures of virulence against mosquitoes.

PubMed

Valero-Jiménez, Claudio A; Faino, Luigi; Spring In't Veld, Daphne; Smit, Sandra; Zwaan, Bas J; van Kan, Jan A L

2016-12-01

Entomopathogenic fungi such as Beauveria bassiana are promising biological agents for control of malaria mosquitoes. Indeed, infection with B. bassiana reduces the lifespan of mosquitoes in the laboratory and in the field. Natural isolates of B. bassiana show up to 10-fold differences in virulence between the most and the least virulent isolate. In this study, we sequenced the genomes of five isolates representing the extremes of low/high virulence and three RNA libraries, and applied a genome comparison approach to uncover genetic mechanisms underpinning virulence. A high-quality, near-complete genome assembly was achieved for the highly virulent isolate Bb8028, which was compared to the assemblies of the four other isolates. Whole genome analysis showed a high level of genetic diversity between the five isolates (2.85-16.8 SNPs/kb), which grouped into two distinct phylogenetic clusters. Mating type gene analysis revealed the presence of either the MAT1-1-1 or the MAT1-2-1 gene. Moreover, a putative new MAT gene (MAT1-2-8) was detected in the MAT1-2 locus. Comparative genome analysis revealed that Bb8028 contains 163 genes exclusive for this isolate. These unique genes have a tendency to cluster in the genome and to be often located near the telomeres. Among the genes unique to Bb8028 are a Non-Ribosomal Peptide Synthetase (NRPS) secondary metabolite gene cluster, a polyketide synthase (PKS) gene, and five genes with homology to bacterial toxins. A survey of candidate virulence genes for B. bassiana is presented. Our results indicate several genes and molecular processes that may underpin virulence towards mosquitoes. Thus, the genome sequences of five isolates of B. bassiana provide a better understanding of the natural variation in virulence and will offer a major resource for future research on this important biological control agent.
Identifying conserved gene clusters in the presence of homology families.

PubMed

He, Xin; Goldwasser, Michael H

2005-01-01

The study of conserved gene clusters is important for understanding the forces behind genome organization and evolution, as well as the function of individual genes or gene groups. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. This generalizes a recent model called "gene teams." A gene team is a set of genes that appear homologously in two or more species, possibly in a different order yet with the distance of adjacent genes in the team for each chromosome always no more than a certain threshold. We remove the constraint in the original model that each gene must have a unique occurrence in each chromosome and thus allow the analysis on complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm analyzes a pair of chromosomes in O(mn) time and uses O(m+n) space, where m and n are the number of genes in the respective chromosomes. We demonstrate the utility of our methods by studying two bacterial genomes, E. coli K-12 and B. subtilis. Many of the teams identified by our algorithm correlate with documented E. coli operons, while several others match predicted operons, previously suggested by computational techniques. Our implementation and data are publicly available at euler.slu.edu/~goldwasser/homologyteams/.
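The "gene team" definition quoted above can be made concrete with a small sketch. It covers only the restricted original model, in which every gene occurs exactly once per chromosome, and it uses naive repeated splitting rather than the O(mn) algorithm of the paper, which is not described in the abstract. The function names, the toy positions, and the threshold value are assumptions for illustration.

```python
from typing import Dict, List, Set


def split_by_gaps(genes: Set[str], pos: Dict[str, float], delta: float) -> List[Set[str]]:
    """Split `genes` into runs whose neighbouring positions differ by at most `delta`."""
    ordered = sorted(genes, key=lambda g: pos[g])
    runs, current = [], [ordered[0]]
    for prev, gene in zip(ordered, ordered[1:]):
        if pos[gene] - pos[prev] > delta:  # gap too large: close the current run
            runs.append(set(current))
            current = []
        current.append(gene)
    runs.append(set(current))
    return runs


def gene_teams(pos_a: Dict[str, float], pos_b: Dict[str, float], delta: float) -> List[Set[str]]:
    """Return maximal gene sets that form a gap-bounded run on both chromosomes.

    Assumes each gene has a single position per chromosome (the original
    gene-team model); paralogs are NOT handled in this sketch.
    """
    pending = [set(pos_a) & set(pos_b)]  # start from genes present on both
    teams = []
    while pending:
        candidate = pending.pop()
        if not candidate:
            continue
        parts = split_by_gaps(candidate, pos_a, delta)
        if len(parts) == 1:
            parts = split_by_gaps(candidate, pos_b, delta)
        if len(parts) == 1:
            teams.append(candidate)  # unbroken on both chromosomes
        else:
            pending.extend(parts)    # a team can never span a split point
    return teams


# Toy example: gene order differs between the two chromosomes.
pos_a = {"a": 1, "b": 2, "c": 3, "d": 10, "e": 11}
pos_b = {"a": 5, "b": 1, "c": 6, "d": 7, "e": 8}
teams = gene_teams(pos_a, pos_b, delta=2)
print(sorted(sorted(t) for t in teams))
```

With a threshold of 2, genes d and e stay together on both chromosomes, a and c form a second team, and b is left as a singleton because it lies too far from a and c on the second chromosome.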
Identification and Analysis of the Biosynthetic Gene Cluster Encoding the Thiopeptide Antibiotic Cyclothiazomycin in Streptomyces hygroscopicus 10-22

PubMed Central

Wang, Jiang; Yu, Yi; Tang, Kexuan; Liu, Wen; He, Xinyi; Huang, Xi; Deng, Zixin

2010-01-01

Thiopeptide antibiotics are an important class of natural products resulting from posttranslational modifications of ribosomally synthesized peptides. Cyclothiazomycin is a typical thiopeptide antibiotic that has a unique bridged macrocyclic structure derived from an 18-amino-acid structural peptide. Here we report cloning, sequencing, and heterologous expression of the cyclothiazomycin biosynthetic gene cluster from Streptomyces hygroscopicus 10-22. Remarkably, successful heterologous expression of a 22.7-kb gene cluster in Streptomyces lividans 1326 suggested that there is a minimum set of 15 open reading frames that includes all of the functional genes required for cyclothiazomycin production. Six of these genes, cltBCDEFG flanking the structural gene cltA, were predicted to encode the enzymes required for the main framework of cyclothiazomycin, and two enzymes encoded by a putative operon, cltMN, were hypothesized to participate in the tailoring step to generate the tertiary thioether, leading to the final cyclization of the bridged macrocyclic structure. This rigorous bioinformatics analysis based on heterologous expression of cyclothiazomycin resulted in an ideal biosynthetic model for us to understand the biosynthesis of thiopeptides. PMID:20154110
The Epipolythiodiketopiperazine Gene Cluster in Claviceps purpurea: Dysfunctional Cytochrome P450 Enzyme Prevents Formation of the Previously Unknown Clapurines.

PubMed

Dopstadt, Julian; Neubauer, Lisa; Tudzynski, Paul; Humpf, Hans-Ulrich

2016-01-01

Claviceps purpurea is an important food contaminant and well known for the production of the toxic ergot alkaloids. Apart from that, little is known about its secondary metabolism and not all toxic substances going along with the food contamination with Claviceps are known yet. We explored the metabolite profile of a gene cluster in C. purpurea with a high homology to gene clusters, which are responsible for the formation of epipolythiodiketopiperazine (ETP) toxins in other fungi. By overexpressing the transcription factor, we were able to activate the cluster in the standard C. purpurea strain 20.1. Although all necessary genes for the formation of the characteristic disulfide bridge were expressed in the overexpression mutants, the fungus did not produce any ETPs. Isolation of pathway intermediates showed that the common biosynthetic pathway stops after the first steps. Our results demonstrate that hydroxylation of the diketopiperazine backbone is the critical step during the ETP biosynthesis. Due to a dysfunctional enzyme, the fungus is not able to produce toxic ETPs. Instead, the pathway end-products are new unusual metabolites with a unique nitrogen-sulfur bond. By heterologous expression of the Leptosphaeria maculans cytochrome P450 encoding gene sirC, we were able to identify the end-products of the ETP cluster in C. purpurea. The thioclapurines are so far unknown ETPs, which might contribute to the toxicity of other C. purpurea strains with a potentially intact ETP cluster.
The Epipolythiodiketopiperazine Gene Cluster in Claviceps purpurea: Dysfunctional Cytochrome P450 Enzyme Prevents Formation of the Previously Unknown Clapurines

PubMed Central

Tudzynski, Paul; Humpf, Hans-Ulrich

2016-01-01

Claviceps purpurea is an important food contaminant and well known for the production of the toxic ergot alkaloids. Apart from that, little is known about its secondary metabolism and not all toxic substances going along with the food contamination with Claviceps are known yet. We explored the metabolite profile of a gene cluster in C. purpurea with a high homology to gene clusters, which are responsible for the formation of epipolythiodiketopiperazine (ETP) toxins in other fungi. By overexpressing the transcription factor, we were able to activate the cluster in the standard C. purpurea strain 20.1. Although all necessary genes for the formation of the characteristic disulfide bridge were expressed in the overexpression mutants, the fungus did not produce any ETPs. Isolation of pathway intermediates showed that the common biosynthetic pathway stops after the first steps. Our results demonstrate that hydroxylation of the diketopiperazine backbone is the critical step during the ETP biosynthesis. Due to a dysfunctional enzyme, the fungus is not able to produce toxic ETPs. Instead, the pathway end-products are new unusual metabolites with a unique nitrogen-sulfur bond. By heterologous expression of the Leptosphaeria maculans cytochrome P450 encoding gene sirC, we were able to identify the end-products of the ETP cluster in C. purpurea. The thioclapurines are so far unknown ETPs, which might contribute to the toxicity of other C. purpurea strains with a potentially intact ETP cluster. PMID:27390873
Submegabase Clusters of Unstable Tandem Repeats Unique to the Tla Region of Mouse T Haplotypes

PubMed Central

Uehara, H.; Ebersole, T.; Bennett, D.; Artzt, K.

1990-01-01

We describe here the identification and genomic organization of mouse t haplotype-specific elements (TSEs) 7.8 and 5.8 kb in length. The TSEs exist as submegabase-long clusters of tandem repeats localized in the Tla region of the major histocompatibility complex of all t haplotype chromosomes examined. In contrast, no such clusters were detected among 12 inbred strains of Mus musculus and other Mus species; thus, clusters of TSEs represent the first absolutely qualitative difference between t haplotypes and wild-type chromosomes. Pulsed field gel electrophoresis shows that the number of clusters, and the number of repeats in each cluster are extremely variable. Dramatic quantitative differences of TSEs uniquely distinguish every independent t haplotype from any other. The complete nucleotide sequence of one 7.8-kb TSE reveals significant homology to the ETn (a major transcript in the early embryo of the mouse), and some homologies to intracisternal A-particles and the mammary tumor virus env gene. Apart from the diagnostic relevance to t haplotypes, evolutionary and functional significances are discussed with respect to chromosome structure and genetic recombination. PMID:2076812
Genetic Diversity of Bacterial Communities and Gene Transfer Agents in Northern South China Sea

PubMed Central

Sun, Fu-Lin; Wang, You-Shao; Wu, Mei-Lin; Jiang, Zhao-Yu; Sun, Cui-Ci; Cheng, Hao

2014-01-01

Pyrosequencing of the 16S ribosomal RNA gene (rDNA) amplicons was performed to investigate the unique distribution of bacterial communities in northern South China Sea (nSCS) and evaluate community structure and spatial differences of bacterial diversity. Cyanobacteria, Proteobacteria, Actinobacteria, and Bacteroidetes constitute the majority of bacteria. The taxonomic description of bacterial communities revealed that more Chroococcales, SAR11 clade, Acidimicrobiales, Rhodobacterales, and Flavobacteriales are present in the nSCS waters than other bacterial groups. Rhodobacterales were less abundant in tropical water (nSCS) than in temperate and cold waters. Furthermore, the diversity of Rhodobacterales based on the gene transfer agent (GTA) major capsid gene (g5) was investigated. Four g5 gene clone libraries were constructed from samples representing different regions and yielded diverse sequences. Fourteen g5 clusters could be identified among 197 nSCS clones. These clusters were also related to known g5 sequences derived from genome-sequenced Rhodobacterales. The composition of g5 sequences in surface water varied with the g5 sequences in the sampling sites; this result indicated that the Rhodobacterales population could be highly diverse in nSCS. Phylogenetic tree analysis result indicated distinguishable diversity patterns among tropical (nSCS), temperate, and cold waters, thereby supporting the niche adaptation of specific Rhodobacterales members in unique environments. PMID:25364820
Characterization of three different clusters of 18S-26S ribosomal DNA genes in the sea urchin P. lividus: Genetic and epigenetic regulation synchronous to 5S rDNA.

PubMed

Bellavia, Daniele; Dimarco, Eufrosina; Caradonna, Fabio

2016-04-15

We previously reported the characterization of 5S ribosomal DNA (rDNA) clusters in the common sea urchin Paracentrotus lividus and demonstrated the presence of DNA methylation-dependent silencing of the embryo-specific 5S rDNA cluster in adult tissue. In this work, we show genetic and epigenetic characterization of 18S-26S rDNA clusters in this species. The results indicate the presence of three different 18S-26S rDNA clusters with different Non-Transcribed Spacer (NTS) regions that have different chromosomal localizations. Moreover, we show that the two largest clusters are hyper-methylated in the promoter-containing NTS regions in adult tissues, as in the 5S rDNA. These findings demonstrate an analogous epigenetic regulation in small and large rDNA clusters and support the logical synchronism in building ribosomes. In fact, all the ribosomal RNA genes must be synchronously and equally transcribed to perform their unique final product. Copyright © 2016 Elsevier B.V. All rights reserved.
When Genome-Based Approach Meets the “Old but Good”: Revealing Genes Involved in the Antibacterial Activity of Pseudomonas sp. P482 against Soft Rot Pathogens

PubMed Central

Krzyżanowska, Dorota M.; Ossowicki, Adam; Rajewska, Magdalena; Maciąg, Tomasz; Jabłońska, Magdalena; Obuchowski, Michał; Heeb, Stephan; Jafra, Sylwia

2016-01-01

Dickeya solani and Pectobacterium carotovorum subsp. brasiliense are recently established species of bacterial plant pathogens causing black leg and soft rot of many vegetables and ornamental plants. Pseudomonas sp. strain P482 inhibits the growth of these pathogens, a desired trait considering the limited measures to combat these diseases. In this study, we determined the genetic background of the antibacterial activity of P482, and established the phylogenetic position of this strain. Pseudomonas sp. P482 was classified as Pseudomonas donghuensis. Genome mining revealed that the P482 genome does not contain genes determining the synthesis of known antimicrobials. However, the ClusterFinder algorithm, designed to detect atypical or novel classes of secondary metabolite gene clusters, predicted 18 such clusters in the genome. Screening of a Tn5 mutant library yielded an antimicrobial negative transposon mutant. The transposon insertion was located in a gene encoding an HpcH/HpaI aldolase/citrate lyase family protein. This gene is located in a hypothetical cluster predicted by the ClusterFinder, together with the downstream homologs of four nfs genes, that confer production of a non-fluorescent siderophore by P. donghuensis HYST. Site-directed inactivation of the HpcH/HpaI aldolase gene, the adjacent short chain dehydrogenase gene, as well as a homolog of an essential nfs cluster gene, all abolished the antimicrobial activity of the P482, suggesting their involvement in a common biosynthesis pathway. However, none of the mutants showed a decreased siderophore yield, neither was the antimicrobial activity of the wild type P482 compromised by high iron bioavailability. A genomic region comprising the nfs cluster and three upstream genes is involved in the antibacterial activity of P. donghuensis P482 against D. solani and P. carotovorum subsp. brasiliense. The genes studied are unique to the two known P. donghuensis strains. 
This study illustrates that mining of microbial genomes is a powerful approach for predicting the presence of novel secondary-metabolite encoding genes, especially when coupled with transposon mutagenesis. PMID:27303376
Computational genetic neuroanatomy of the developing mouse brain: dimensionality reduction, visualization, and clustering.

PubMed

Ji, Shuiwang

2013-07-11

The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship.
Complete genome sequence of Nitrosospira multiformis, an ammonia-oxidizing bacterium from the soil environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Norton, Jeanette M.; Klotz, Martin G; Stein, Lisa Y

2008-01-01

The complete genome of the ammonia-oxidizing bacterium, Nitrosospira multiformis (ATCC 25196T), consists of a circular chromosome and three small plasmids totaling 3,234,309 bp and encoding 2827 putative proteins. Of these, 2026 proteins have predicted functions and 801 are without conserved functional domains, yet 747 of these have similarity to other predicted proteins in databases. Gene homologs from Nitrosomonas europaea and N. eutropha were the best match for 42% of the predicted genes in N. multiformis. The genome contains three nearly identical copies of amo and hao gene clusters as large repeats. Distinguishing features compared to N. europaea include: the presence of gene clusters encoding urease and hydrogenase, a RuBisCO-encoding operon of distinctive structure and phylogeny, and a relatively small complement of genes related to Fe acquisition. Systems for synthesis of a pyoverdine-like siderophore and for acyl-homoserine lactone were unique to N. multiformis among the sequenced AOB genomes. Gene clusters encoding proteins associated with outer membrane and cell envelope functions including transporters, porins, exopolysaccharide synthesis, capsule formation and protein sorting/export were abundant. Numerous sensory transduction and response regulator gene systems directed towards sensing of the extracellular environment are described. Gene clusters for glycogen, polyphosphate and cyanophycin storage and utilization were identified providing mechanisms for meeting energy requirements under substrate-limited conditions. The genome of N. multiformis encodes the core pathways for chemolithoautotrophy along with adaptations for surface growth and survival in soil environments.
High sequence variations in the region containing genes encoding a cellular morphogenesis protein and the repressor of sexual development help to reveal origins of Aspergillus oryzae

USDA-ARS's Scientific Manuscript database

Aspergillus oryzae and Aspergillus flavus are closely related fungal species. The A. flavus population that produces numerous small sclerotia (S strain) and aflatoxin has a unique 1.5 kb deletion in the norB-cypA region of the aflatoxin gene cluster (the S genotype). Phylogenetic studies have indica...
The human RHOX gene cluster: target genes and functional analysis of gene variants in infertile men.

PubMed

Borgmann, Jennifer; Tüttelmann, Frank; Dworniczak, Bernd; Röpke, Albrecht; Song, Hye-Won; Kliesch, Sabine; Wilkinson, Miles F; Laurentino, Sandra; Gromoll, Jörg

2016-11-15

The X-linked reproductive homeobox (RHOX) gene cluster encodes transcription factors preferentially expressed in reproductive tissues. This gene cluster has important roles in male fertility based on phenotypic defects of Rhox-mutant mice and the finding that aberrant RHOX promoter methylation is strongly associated with abnormal human sperm parameters. However, little is known about the molecular mechanism of RHOX function in humans. Using gene expression profiling, we identified genes regulated by members of the human RHOX gene cluster. Some genes were uniquely regulated by RHOXF1 or RHOXF2/2B, while others were regulated by both of these transcription factors. Several of these regulated genes encode proteins involved in processes relevant to spermatogenesis; e.g. stress protection and cell survival. One of the target genes of RHOXF2/2B is RHOXF1, suggesting cross-regulation to enhance transcriptional responses. The potential role of RHOX in human infertility was addressed by sequencing all RHOX exons in a group of 250 patients with severe oligozoospermia. This revealed two mutations in RHOXF1 (c.515G > A and c.522C > T) and four in RHOXF2/2B (-73C > G, c.202G > A, c.411C > T and c.679G > A), of which only one (c.202G > A) was found in a control group of men with normal sperm concentration. Functional analysis demonstrated that c.202G > A and c.679G > A significantly impaired the ability of RHOXF2/2B to regulate downstream genes. Molecular modelling suggested that these mutations alter RHOXF2/F2B protein conformation. By combining clinical data with in vitro functional analysis, we demonstrate how the X-linked RHOX gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male fertility.
Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma.

PubMed

Young, Jonathan D; Cai, Chunhui; Lu, Xinghua

2017-10-03

One approach to improving the personalized treatment of cancer is to understand the cellular signaling transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene expression data. Deep learning is a group of machine learning algorithms that use multiple layers of hidden units to capture hierarchically related, alternative representations of the input data. We hypothesize that this hierarchical structure learned by deep learning will be related to the cellular signaling system. Robust deep learning model selection identified a network architecture that is biologically plausible. Our model selection results indicated that the 1st hidden layer of our deep learning model should contain about 1300 hidden units to most effectively capture the covariance structure of the input data. This agrees with the estimated number of human transcription factors, which is approximately 1400. This result lends support to our hypothesis that the 1st hidden layer of a deep learning model trained on gene expression data may represent signals related to transcription factor activation. Using the 3rd hidden layer representation of each tumor as learned by our unsupervised deep learning model, we performed consensus clustering on all tumor samples, leading to the discovery of clusters of glioblastoma multiforme with differential survival. One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input. We also found differentially expressed genes and well-known mutations (NF1, IDH1, EGFR) that were uniquely correlated with each of these clusters. Exploring these unique genes and mutations will allow us to further investigate the disease mechanisms underlying each of these clusters. In summary, we show that a deep learning model can be trained to represent biologically and clinically meaningful abstractions of cancer gene expression data. Understanding what additional relationships these hidden layer abstractions have with the cancer cellular signaling system could have a significant impact on the understanding and treatment of cancer.
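
The consensus clustering step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function name `consensus_matrix` and the toy label lists are hypothetical, and the resampling that full consensus clustering normally involves is omitted. The sketch only shows the core idea of counting how often each pair of samples falls in the same cluster across repeated runs.

```python
from itertools import combinations

def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs: list of label lists, one per clustering run, all of equal length.
    Returns a dict mapping (i, j) with i < j to a value in [0, 1].
    """
    n = len(runs[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in runs:
        for i, j in counts:
            # Only the labels within one run are compared, so the
            # arbitrary numbering of clusters between runs does not matter.
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(runs) for pair, c in counts.items()}

# Three hypothetical clustering runs over four tumor samples (toy data).
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
matrix = consensus_matrix(runs)
```

Pairs with values near 1 are stable co-members; a final partition would typically be obtained by clustering this matrix, which is left out here.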
An archaeal genomic signature

NASA Technical Reports Server (NTRS)

Graham, D. E.; Overbeek, R.; Olsen, G. J.; Woese, C. R.

2000-01-01

Comparisons of complete genome sequences allow the most objective and comprehensive descriptions possible of a lineage's evolution. This communication uses the completed genomes from four major euryarchaeal taxa to define a genomic signature for the Euryarchaeota and, by extension, the Archaea as a whole. The signature is defined in terms of the set of protein-encoding genes found in at least two diverse members of the euryarchaeal taxa that function uniquely within the Archaea; most signature proteins have no recognizable bacterial or eukaryal homologs. By this definition, 351 clusters of signature proteins have been identified. Functions of most proteins in this signature set are currently unknown. At least 70% of the clusters that contain proteins from all the euryarchaeal genomes also have crenarchaeal homologs. This conservative set, which appears refractory to horizontal gene transfer to the Bacteria or the Eukarya, would seem to reflect the significant innovations that were unique and fundamental to the archaeal "design fabric." Genomic protein signature analysis methods may be extended to characterize the evolution of any phylogenetically defined lineage. The complete set of protein clusters for the archaeal genomic signature is presented as supplementary material (see the PNAS web site, www.pnas.org).
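
The signature definition in the abstract above can be read as a set filter, sketched below under stated assumptions. The function `signature_clusters`, the taxon names, and the presence table are all hypothetical. The strict exclusion of every outgroup hit is also a simplification, since the abstract says only that most signature proteins lack recognizable bacterial or eukaryal homologs.

```python
def signature_clusters(presence, euryarchaeal_taxa, outgroup_taxa, min_taxa=2):
    """Select clusters seen in >= min_taxa euryarchaeal taxa and in no outgroup taxon.

    presence: dict mapping a cluster name to the set of taxa containing a member.
    """
    selected = []
    for cluster, taxa in presence.items():
        in_euryarchaeota = len(taxa & euryarchaeal_taxa)
        in_outgroup = len(taxa & outgroup_taxa)
        if in_euryarchaeota >= min_taxa and in_outgroup == 0:
            selected.append(cluster)
    return sorted(selected)

# Toy presence/absence table; names are placeholders, not real data.
presence = {
    "cluster_A": {"taxon1", "taxon2"},
    "cluster_B": {"taxon1"},
    "cluster_C": {"taxon1", "taxon3", "bacterium1"},
    "cluster_D": {"taxon2", "taxon3", "taxon4"},
}
euryarchaeal = {"taxon1", "taxon2", "taxon3", "taxon4"}
outgroup = {"bacterium1", "eukaryote1"}
result = signature_clusters(presence, euryarchaeal, outgroup)
```

Here `cluster_B` fails the two-taxon threshold and `cluster_C` is dropped because of its bacterial member.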

Bacillus sp. CDB3 isolated from cattle dip-sites possesses two ars gene clusters.

PubMed

Bhat, Somanath; Luo, Xi; Xu, Zhiqiang; Liu, Lixia; Zhang, Ren

2011-01-01

Contamination of soil and water by arsenic is a global problem. In Australia, the dipping of cattle in arsenic-containing solution to control cattle ticks in the last century has left many sites heavily contaminated with arsenic and other toxicants. We had previously isolated five soil bacterial strains (CDB1-5) highly resistant to arsenic. To understand the resistance mechanism, molecular studies have been carried out. Two chromosome-encoded arsenic resistance (ars) gene clusters have been cloned from CDB3 (Bacillus sp.). They both function in Escherichia coli, and cluster 1 confers a much higher resistance to the toxic metalloid. Cluster 2 is smaller, possessing four open reading frames (ORFs), arsRorf2BC, similar to those identified in the Bacillus subtilis Skin element. Among the eight ORFs in cluster 1, five are analogs of common ars genes found in other bacteria but are organized in a unique order, arsRBCDA instead of arsRDABC. Three other putative genes are located directly downstream and designated arsTIP based on the homologies of their theoretical translation sequences to thioredoxin reductases, iron-sulphur cluster proteins and protein phosphatases, respectively. The latter two are novel to any known ars operons. The arsD gene from a Bacillus species was cloned for the first time, and the predicted protein differs from the well-studied E. coli ArsD by lacking two pairs of C-terminal cysteine residues. Its functional involvement in arsenic resistance has been confirmed by a deletion experiment. There is also an inverted repeat in the intergenic region between arsC and arsD, implying some unknown transcriptional regulation.
Conservation of gene linkage in dispersed vertebrate NK homeobox clusters.

PubMed

Wotton, Karl R; Weierud, Frida K; Juárez-Morales, José L; Alvares, Lúcia E; Dietrich, Susanne; Lewis, Katharine E

2009-10-01

Nk homeobox genes are important regulators of many different developmental processes including muscle, heart, central nervous system and sensory organ development. They are thought to have arisen as part of the ANTP megacluster, which also gave rise to Hox and ParaHox genes, and at least some NK genes remain tightly linked in all animals examined so far. The protostome-deuterostome ancestor probably contained a cluster of nine Nk genes: (Msx)-(Nk4/tinman)-(Nk3/bagpipe)-(Lbx/ladybird)-(Tlx/c15)-(Nk7)-(Nk6/hgtx)-(Nk1/slouch)-(Nk5/Hmx). Of these genes, only NKX2.6-NKX3.1, LBX1-TLX1 and LBX2-TLX2 remain tightly linked in humans. However, it is currently unclear whether this is unique to the human genome as we do not know which of these Nk genes are clustered in other vertebrates. This makes it difficult to assess whether the remaining linkages are due to selective pressures or because chance rearrangements have "missed" certain genes. In this paper, we identify all of the paralogs of these ancestrally clustered NK genes in several distinct vertebrates. We demonstrate that tight linkages of Lbx1-Tlx1, Lbx2-Tlx2 and Nkx3.1-Nkx2.6 have been widely maintained in both the ray-finned and lobe-finned fish lineages. Moreover, the recently duplicated Hmx2-Hmx3 genes are also tightly linked. Finally, we show that Lbx1-Tlx1 and Hmx2-Hmx3 are flanked by highly conserved noncoding elements, suggesting that shared regulatory regions may have resulted in evolutionary pressure to maintain these linkages. Consistent with this, these pairs of genes have overlapping expression domains. In contrast, Lbx2-Tlx2 and Nkx3.1-Nkx2.6, which do not seem to be coexpressed, are also not associated with conserved noncoding sequences, suggesting that an alternative mechanism may be responsible for the continued clustering of these genes.
Molecular identification and characterization of clustered regularly interspaced short palindromic repeats (CRISPRs) in a urease-positive thermophilic Campylobacter sp. (UPTC).

PubMed

Tasaki, E; Hirayama, J; Tazumi, A; Hayashi, K; Hara, Y; Ueno, H; Moore, J E; Millar, B C; Matsuda, M

2012-02-01

A novel clustered regularly interspaced short palindromic repeats (CRISPRs) locus [7,500 base pairs (bp) in length] was found in the urease-positive thermophilic Campylobacter (UPTC) Japanese isolate CF89-12. The 7,500 bp locus consisted of the 5'-methylaminomethyl-2-thiouridylate methyltransferase gene, a putative (p) CRISPR-associated gene (p-Cas), putative open reading frames, Cas1 and Cas2, a leader sequence region (146 bp), 12 CRISPR consensus sequence repeats (each 36 bp) separated by non-repetitive unique spacer regions of similar length (26-31 bp), and the phosphatidyl glycerophosphatase A gene. When the CRISPR loci in UPTC CF89-12 and five C. jejuni isolates were compared with one another, all six isolates contained p-Cas, Cas1 and Cas2 within the loci. Four to 12 CRISPR consensus sequence repeats separated by non-repetitive unique spacer regions occurred in the six isolates, and the nucleotide sequences of those repeats showed approximately 92-100% similarity with each other. However, no sequence similarity occurred in the unique spacer regions among these isolates. The putative σ(70) transcriptional promoter and the hypothetical ρ-independent terminator structures for the CRISPRs and Cas were detected. No in vivo transcription of p-Cas, Cas1 and Cas2 was confirmed in the UPTC cells.
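
The repeat-and-spacer layout described above (consensus repeats separated by unique spacers) can be sketched as a simple string scan. This is an illustration under assumptions, not the authors' method: `extract_spacers` is a hypothetical helper, the sequences are invented toy strings much shorter than the 36 bp repeats and 26-31 bp spacers reported, and exact matching is used even though the abstract reports repeats that are only about 92-100% similar.

```python
def extract_spacers(locus, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    positions = []
    start = locus.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the end of the repeat just found.
        start = locus.find(repeat, start + len(repeat))
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(locus[left + len(repeat):right])
    return spacers

# Invented toy sequences: leader, then repeat/spacer units, then a trailer.
repeat = "GTTTTAGT"
locus = "AAAC" + repeat + "ACGTAC" + repeat + "TTGCAA" + repeat + "GGG"
spacers = extract_spacers(locus, repeat)
```

With three repeat copies the scan yields two spacers. Tolerating mismatches in the repeats would need an approximate matcher, which is outside this sketch.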
Unique Physiological and Transcriptional Shifts under Combinations of Salinity, Drought, and Heat.

PubMed

Shaar-Moshe, Lidor; Blumwald, Eduardo; Peleg, Zvi

2017-05-01

Climate-change-driven stresses such as extreme temperatures, water deficit, and ion imbalance are projected to exacerbate and jeopardize global food security. Under field conditions, these stresses usually occur simultaneously and cause damage that exceeds that of single stresses. Here, we investigated the transcriptional patterns and morpho-physiological acclimations of Brachypodium distachyon to single salinity, drought, and heat stresses, as well as their double and triple stress combinations. Hierarchical clustering analysis of morpho-physiological acclimations showed that several traits exhibited a gradually aggravating effect as plants were exposed to combined stresses. On the other hand, other morphological traits were dominated by salinity, while some physiological traits were shaped by heat stress. Response patterns of differentially expressed genes, under single and combined stresses (i.e. common stress genes), were maintained only among 37% of the genes, indicating a limited expression consistency among partially overlapping stresses. A comparison between common stress genes and genes that were uniquely expressed only under combined stresses (i.e. combination unique genes) revealed a significant shift from increased intensity to antagonistic responses, respectively. The different transcriptional signatures imply an alteration in the mode of action under combined stresses and limited ability to predict plant responses as different stresses are combined. Coexpression analysis coupled with enrichment analysis revealed that each gene subset was enriched with different biological processes. Common stress genes were enriched with known stress response pathways, while combination unique genes were enriched with unique processes and genes with unknown functions that hold the potential to improve stress tolerance and enhance cereal productivity under suboptimal field conditions. © 2017 American Society of Plant Biologists. All Rights Reserved.
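
The two gene subsets named in the abstract above can be expressed as set operations, sketched below. The interpretation is an assumption: "common stress genes" is taken here as genes differentially expressed under at least one single stress and at least one combination, and "combination unique genes" as genes seen only under combinations. The function `partition_genes` and the gene identifiers are hypothetical.

```python
def partition_genes(single, combined):
    """Split differentially expressed genes into common and combination-unique sets.

    single, combined: dicts mapping a treatment name to its set of DE genes.
    """
    single_union = set().union(*single.values())
    combined_union = set().union(*combined.values())
    common = single_union & combined_union              # seen in both kinds of treatment
    combination_unique = combined_union - single_union  # seen only under combinations
    return common, combination_unique

# Toy gene sets; identifiers are placeholders, not real loci.
single = {
    "salinity": {"g1", "g2"},
    "drought": {"g2", "g3"},
    "heat": {"g4"},
}
combined = {
    "salinity+drought": {"g2", "g5"},
    "salinity+drought+heat": {"g1", "g5", "g6"},
}
common, combination_unique = partition_genes(single, combined)
```

A stricter reading, comparing each combination only with its own component stresses, would need a per-combination version of the same operations.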
Deciphering the Cryptic Genome: Genome-wide Analyses of the Rice Pathogen Fusarium fujikuroi Reveal Complex Regulation of Secondary Metabolism and Novel Metabolites

PubMed Central

Studt, Lena; Niehaus, Eva-Maria; Espino, Jose J.; Huß, Kathleen; Michielse, Caroline B.; Albermann, Sabine; Wagner, Dominik; Bergner, Sonja V.; Connolly, Lanelle R.; Fischer, Andreas; Reuter, Gunter; Kleigrewe, Karin; Bald, Till; Wingfield, Brenda D.; Ophir, Ron; Freeman, Stanley; Hippler, Michael; Smith, Kristina M.; Brown, Daren W.; Proctor, Robert H.; Münsterkötter, Martin; Freitag, Michael; Humpf, Hans-Ulrich; Güldener, Ulrich; Tudzynski, Bettina

2013-01-01

The fungus Fusarium fujikuroi causes “bakanae” disease of rice due to its ability to produce gibberellins (GAs), but it is also known for producing harmful mycotoxins. However, the genetic capacity for the whole arsenal of natural compounds and their role in the fungus' interaction with rice remained unknown. Here, we present a high-quality genome sequence of F. fujikuroi that was assembled into 12 scaffolds corresponding to the 12 chromosomes described for the fungus. We used the genome sequence along with ChIP-seq, transcriptome, proteome, and HPLC-FTMS-based metabolome analyses to identify the potential secondary metabolite biosynthetic gene clusters and to examine their regulation in response to nitrogen availability and plant signals. The results indicate that expression of most but not all gene clusters correlates with proteome and ChIP-seq data. Comparison of the F. fujikuroi genome to those of six other fusaria revealed that only a small number of gene clusters are conserved among these species, thus providing new insights into the divergence of secondary metabolism in the genus Fusarium. Notably, GA biosynthetic genes are present in some related species, but GA biosynthesis is limited to F. fujikuroi, suggesting that this provides a selective advantage during infection of the preferred host plant rice. Among the genome sequences analyzed, one cluster that includes a polyketide synthase gene (PKS19) and another that includes a non-ribosomal peptide synthetase gene (NRPS31) are unique to F. fujikuroi. The metabolites derived from these clusters were identified by HPLC-FTMS-based analyses of engineered F. fujikuroi strains overexpressing cluster genes. In planta expression studies suggest a specific role for the PKS19-derived product during rice infection. Thus, our results indicate that combined comparative genomics and genome-wide experimental analyses identified novel genes and secondary metabolites that contribute to the evolutionary success of F. fujikuroi as a rice pathogen. PMID:23825955
Identification and Characterization of the Pyridomycin Biosynthetic Gene Cluster of Streptomyces pyridomyceticus NRRL B-2517

PubMed Central

Huang, Tingting; Wang, Yemin; Yin, Jun; Du, Yanhua; Tao, Meifeng; Xu, Jing; Chen, Wenqing; Lin, Shuangjun; Deng, Zixin

2011-01-01

Pyridomycin is a structurally unique antimycobacterial cyclodepsipeptide containing rare 3-(3-pyridyl)-L-alanine and 2-hydroxy-3-methylpent-2-enoic acid moieties. The biosynthetic gene cluster for pyridomycin has been cloned and identified from Streptomyces pyridomyceticus NRRL B-2517. Sequence analysis of a 42.5-kb DNA region revealed 26 putative open reading frames, including two nonribosomal peptide synthetase (NRPS) genes and a polyketide synthase gene. A special feature is the presence of a polyketide synthase-type ketoreductase domain embedded in an NRPS. Furthermore, we showed that PyrA functioned as an NRPS adenylation domain that activates 3-hydroxypicolinic acid and transfers it to a discrete peptidyl carrier protein, PyrU, which functions as a loading module that initiates pyridomycin biosynthesis in vivo and in vitro. PyrA could also activate other aromatic acids, generating three pyridomycin analogues in vivo. PMID:21454714
AcsF Catalyzes the ATP-dependent Insertion of Nickel into the Ni,Ni-[4Fe4S] Cluster of Acetyl-CoA Synthase

PubMed Central

Gregg, Christina M.; Goetzl, Sebastian; Jeoung, Jae-Hun

2016-01-01

Acetyl-CoA synthase (ACS) catalyzes the reversible condensation of CO, CoA, and a methyl-cation to form acetyl-CoA at a unique Ni,Ni-[4Fe4S] cluster (the A-cluster). However, it was unknown which proteins support the assembly of the A-cluster. We analyzed the product of a gene from the cluster containing the ACS gene, cooC2 from Carboxydothermus hydrogenoformans, named AcsFCh, and showed that it acts as a maturation factor of ACS. AcsFCh and inactive ACS form a stable 2:1 complex that binds two nickel ions with higher affinity than the individual components. The nickel-bound ACS-AcsFCh complex remains inactive until MgATP is added, thereby converting inactive to active ACS. AcsFCh is a MinD-type ATPase and belongs to the CooC protein family, which can be divided into homologous subgroups. We propose that proteins of one subgroup are responsible for assembling the Ni,Ni-[4Fe4S] cluster of ACS, whereas proteins of a second subgroup mature the [Ni4Fe4S] cluster of carbon monoxide dehydrogenases. PMID:27382049
The unique fold and lability of the [2Fe-2S] clusters of NEET proteins mediate their key functions in health and disease.

PubMed

Karmi, Ola; Marjault, Henri-Baptiste; Pesce, Luca; Carloni, Paolo; Onuchic, José N; Jennings, Patricia A; Mittler, Ron; Nechushtai, Rachel

2018-02-12

NEET proteins comprise a new class of [2Fe-2S] cluster proteins. In humans, three genes encode NEET proteins: cisd1 encodes mitoNEET (mNT), cisd2 encodes the Nutrient-deprivation autophagy factor-1 (NAF-1) and cisd3 encodes MiNT (Miner2). These recently discovered proteins play key roles in many processes related to normal metabolism and disease. Indeed, NEET proteins are involved in iron, Fe-S, and reactive oxygen homeostasis in cells and play an important role in regulating apoptosis and autophagy. mNT and NAF-1 are homodimeric and reside on the outer mitochondrial membrane. NAF-1 also resides in the mitochondria-associated membranes (MAM) of the ER and in the ER itself. MiNT is a monomer with distinct asymmetry in the molecular surfaces surrounding the clusters. Unlike its paralogs mNT and NAF-1, it resides within the mitochondria. NAF-1 and mNT share similar backbone folds with the plant homodimeric NEET protein (At-NEET), while MiNT's backbone fold resembles that of a bacterial MiNT protein. Despite the variation in amino acid composition among these proteins, all NEET proteins retained their unique CDGSH domain harboring their unique 3Cys:1His [2Fe-2S] cluster coordination through evolution. The coordinating exposed His was shown to confer lability on the NEET proteins' [2Fe-2S] clusters. In this minireview, we discuss the NEET fold and its structural elements. Special attention is given to the unique lability of the NEETs' [2Fe-2S] cluster and its implications for the NEET proteins' cellular and systemic function in health and disease.
Biosynthesis of the acetyl‐CoA carboxylase‐inhibiting antibiotic, andrimid in Serratia is regulated by Hfq and the LysR‐type transcriptional regulator, AdmX

PubMed Central

Nogellova, Veronika; Morel, Bertrand; Krell, Tino

2016-01-01

Summary Infections due to multidrug‐resistant bacteria represent a major global health challenge. To combat this problem, new antibiotics are urgently needed and some plant‐associated bacteria are a promising source. The rhizobacterium Serratia plymuthica A153 produces several bioactive secondary metabolites, including the anti‐oomycete and antifungal haterumalide, oocydin A and the broad spectrum polyamine antibiotic, zeamine. In this study, we show that A153 produces a second broad spectrum antibiotic, andrimid. Using genome sequencing, comparative genomics and mutagenesis, we defined new genes involved in andrimid (adm) biosynthesis. Both the expression of the adm gene cluster and regulation of andrimid synthesis were investigated. The biosynthetic cluster is operonic and its expression is modulated by various environmental cues, including temperature and carbon source. Analysis of the genome context of the adm operon revealed a gene encoding a predicted LysR‐type regulator, AdmX, apparently unique to Serratia strains. Mutagenesis and gene expression assays demonstrated that AdmX is a transcriptional activator of the adm gene cluster. At the post‐transcriptional level, the expression of the adm cluster is positively regulated by the RNA chaperone, Hfq, in an RpoS‐independent manner. Our results highlight the complexity of andrimid biosynthesis – an antibiotic with potential clinical and agricultural utility. PMID:26914969
Computational genetic neuroanatomy of the developing mouse brain: dimensionality reduction, visualization, and clustering

PubMed Central

2013-01-01

Background The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. Results In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Conclusions Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship. PMID:23845024
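
The idea of measuring how consistent clusters are with neuroanatomy can be illustrated with cluster purity. The abstract does not name the consistency measure used, so purity is swapped in here purely as an assumption; the function `purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def purity(cluster_labels, anatomy_labels):
    """Fraction of samples whose anatomical label is the majority label of their cluster."""
    by_cluster = defaultdict(list)
    for cluster, region in zip(cluster_labels, anatomy_labels):
        by_cluster[cluster].append(region)
    # For each cluster, count the samples carrying its most frequent region label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(cluster_labels)

# Toy example: six voxels, two clusters, two anatomical regions.
clusters = [0, 0, 0, 1, 1, 1]
anatomy = ["cortex", "cortex", "thalamus", "thalamus", "thalamus", "thalamus"]
score = purity(clusters, anatomy)
```

Computing the same score on clusters from the original space and from the reduced space would give one simple way to compare the two, in the spirit of the comparison the abstract reports.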
Natural history of Ashkenazi intelligence.

PubMed

Cochran, Gregory; Hardy, Jason; Harpending, Henry

2006-09-01

This paper elaborates the hypothesis that the unique demography and sociology of Ashkenazim in medieval Europe selected for intelligence. Ashkenazi literacy, economic specialization, and closure to inward gene flow led to a social environment in which there was high fitness payoff to intelligence, specifically verbal and mathematical intelligence but not spatial ability. As with any regime of strong directional selection on a quantitative trait, genetic variants that were otherwise fitness reducing rose in frequency. In particular we propose that the well-known clusters of Ashkenazi genetic diseases, the sphingolipid cluster and the DNA repair cluster in particular, increase intelligence in heterozygotes. Other Ashkenazi disorders are known to increase intelligence. Although these disorders have been attributed to a bottleneck in Ashkenazi history and consequent genetic drift, there is no evidence of any bottleneck. Gene frequencies at a large number of autosomal loci show that if there was a bottleneck then subsequent gene flow from Europeans must have been very large, obliterating the effects of any bottleneck. The clustering of the disorders in only a few pathways and the presence at elevated frequency of more than one deleterious allele at many of them could not have been produced by drift. Instead these are signatures of strong and recent natural selection.
Enigmatic, ultrasmall, uncultivated Archaea

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Brett J.; Comolli, Luis; Dick, Gregory J.

Metagenomics has provided access to genomes of as yet uncultivated microorganisms in natural environments, yet there are gaps in our knowledge, particularly for Archaea that occur at relatively low abundance and in extreme environments. Ultrasmall cells (<500 nm in diameter) from lineages without cultivated representatives that branch near the crenarchaeal/euryarchaeal divide have been detected in a variety of acidic ecosystems. We reconstructed composite, near-complete ~1-Mb genomes for three lineages, referred to as ARMAN (archaeal Richmond Mine acidophilic nanoorganisms), from environmental samples and a biofilm filtrate. Genes of two lineages are among the smallest yet described, enabling a 10% higher coding density than found in genomes of the same size, and there are noncontiguous genes. No biological function could be inferred for up to 45% of genes and no more than 63% of the predicted proteins could be assigned to a revised set of archaeal clusters of orthologous groups. Some core metabolic genes are more common in Crenarchaeota than Euryarchaeota, up to 21% of genes have the highest sequence identity to bacterial genes, and 12 belong to clusters of orthologous groups that were previously exclusive to bacteria. A small subset of 3D cryo-electron tomographic reconstructions clearly show penetration of the ARMAN cell wall and cytoplasmic membranes by protuberances extended from cells of the archaeal order Thermoplasmatales. Interspecies interactions, the presence of a unique internal tubular organelle [Comolli, et al. (2009) ISME J 3:159-167], and many genes previously only affiliated with Crenarchaea or Bacteria indicate extensive unique physiology in organisms that branched close to the time that Cren- and Euryarchaeotal lineages diverged.
Structural Characterization and Evolutionary Relationship of High-Molecular-Weight Glutenin Subunit Genes in Roegneria nakaii and Roegneria alashanica.

PubMed

Zhang, Lujun; Li, Zhixin; Fan, Renchun; Wei, Bo; Zhang, Xiangqi

2016-07-19

The Roegneria of Triticeae is a large genus including about 130 allopolyploid species. Little is known about its high-molecular-weight glutenin subunits (HMW-GSs). Here, we report six novel HMW-GS genes from R. nakaii and R. alashanica. Sequencing indicated that Rny1, Rny3, and Ray1 possessed intact open reading frames (ORFs), whereas Rny2, Rny4, and Ray2 harbored in-frame stop codons. All six genes possessed a primary structure similar to that of known HMW-GSs, while showing some unique characteristics. Their coding regions were significantly shorter than those of Glu-1 genes in wheat. The amino acid sequences revealed that all six genes were intermediate towards the y-type. The phylogenetic analysis showed that the HMW-GSs from species with St, StY, or StH genome(s) clustered in an independent clade, distinct from the typical x- and y-type clusters. Thus, the Glu-1 locus in R. nakaii and R. alashanica is a very primitive glutenin locus across evolution. The six genes were phylogenetically split into two groups that clustered in different clades, and each of the two clades included the HMW-GSs from species with St (diploid and tetraploid species), StY, and StH genomes. Hence, it is concluded that the six Roegneria HMW-GS genes are from two St genomes undergoing slight differentiation.
Shark IgW C region diversification through RNA processing and isotype switching.

PubMed

Zhang, Cecilia; Du Pasquier, Louis; Hsu, Ellen

2013-09-15

Sharks and skates represent the earliest vertebrates with an adaptive immune system based on lymphocyte Ag receptors generated by V(D)J recombination. Shark B cells express two classical Igs, IgM and IgW, encoded by an early, alternative gene organization consisting of numerous autonomous miniloci, where the individual gene cluster carries a few rearranging gene segments and one C region, μ or ω. We have characterized eight distinct Ig miniloci encoding the nurse shark ω H chain. Each cluster consists of VH, D, and JH segments and six to eight C domain exons. Two interspersed secretory exons, in addition to the 3'-most C exon with tailpiece, provide the gene cluster with the ability to generate at least six secreted isoforms that differ as to polypeptide length and C domain combination. All clusters appear to be functional, as judged by the capability for rearrangement and absence of defects in the deduced amino acid sequence. We previously showed that IgW VDJ can perform isotype switching to μ C regions; in this study, we found that switching also occurs between ω clusters. Thus, C region diversification for any IgW VDJ can take place at the DNA level by switching to other ω or μ C regions, as well as by RNA processing to generate different C isoforms. The wide array of pathogens recognized by Abs requires different disposal pathways, and our findings demonstrate complex and unique pathways for C effector function diversity that evolved independently in cartilaginous fishes.
A recently transferred cluster of bacterial genes in Trichomonas vaginalis - lateral gene transfer and the fate of acquired genes

PubMed Central

2014-01-01

Background Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain. A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome. Results A unique 34 kbp long fragment with 27 clustered genes (TvLF) of prokaryote origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR based approach we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative to the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species, suggests that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation. Conclusions We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes. PMID:24898731
Recursive expectation-maximization clustering: A method for identifying buffering mechanisms composed of phenomic modules

NASA Astrophysics Data System (ADS)

Guo, Jingyu; Tian, Dehua; McKinney, Brett A.; Hartman, John L.

2010-06-01

Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome and organism level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-cell eukaryotic organism, S. cerevisiae. We have developed a methodology, quantitative high throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. 
Such modules are reflective of buffering mechanisms, i.e., genes that play a related role in the maintenance of physiological homeostasis. To develop the method, 297 gene deletion strains were selected based on gene-drug interactions with hydroxyurea, an inhibitor of ribonucleotide reductase enzyme activity, which is critical for DNA synthesis. To partition the gene functions, these 297 deletion strains were challenged with growth inhibitory drugs known to target different genes and cellular pathways. Q-HTCP-derived growth curves were used to quantify all gene interactions, and the data were used to test the performance of REMc. Fundamental advantages of REMc include objective assessment of total number of clusters and assignment to each cluster a log-likelihood value, which can be considered an indicator of statistical quality of clusters. To assess the biological quality of clusters, we developed a method called gene ontology information divergence z-score (GOid_z). GOid_z summarizes total enrichment of GO attributes within individual clusters. Using these and other criteria, we compared the performance of REMc to hierarchical and K-means clustering. The main conclusion is that REMc provides distinct efficiencies for mining Q-HTCP data. It facilitates identification of phenomic modules, which contribute to buffering mechanisms that underlie cellular homeostasis and the regulation of phenotypic expression.
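As an illustration of the expectation-maximization step that REMc builds on, the following is a minimal sketch of plain EM for a two-component, one-dimensional Gaussian mixture. It is not the authors' REMc implementation: the recursion over clusters, the Q-HTCP data format, and the GOid_z score are not reproduced, and the function names and toy values are hypothetical. It shows only how EM assigns items to clusters and yields a log-likelihood value of the kind the abstract describes as an indicator of cluster quality.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(data, weights, means, variances):
    """E-step: posterior probability of each component for each data point."""
    resp = []
    for x in data:
        p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    return resp

def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    lo, hi = min(data), max(data)
    means = [lo, hi]                            # start the components at the extremes
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        resp = responsibilities(data, weights, means, variances)
        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)       # floor keeps the density finite
    loglik = sum(
        math.log(sum(weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)))
        for x in data
    )
    resp = responsibilities(data, weights, means, variances)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, loglik, labels

# Hypothetical interaction scores for six strains (toy values, not study data)
scores = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9]
means, loglik, labels = em_two_gaussians(scores)
```

In this sketch the log-likelihood is computed for the whole mixture; a recursive scheme would rerun the same fit within each resulting cluster until further splitting is no longer supported.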
Phylogenetic comparisons of a coastal bacterioplankton community with its counterparts in open ocean and freshwater systems.

PubMed

Rappé; Vergin; Giovannoni

2000-09-01

In order to extend previous comparisons between coastal marine bacterioplankton communities and their open ocean and freshwater counterparts, here we summarize and provide new data on a clone library of 105 SSU rRNA genes recovered from seawater collected over the western continental shelf of the USA in the Pacific Ocean. Comparisons to previously published data revealed that this coastal bacterioplankton clone library was dominated by SSU rRNA gene phylotypes originally described from surface waters of the open ocean, but also revealed unique SSU rRNA gene lineages of beta Proteobacteria related to those found in clone libraries from freshwater habitats. beta Proteobacteria lineages common to coastal and freshwater samples included members of a clade of obligately methylotrophic bacteria, SSU rRNA genes affiliated with Xylophilus ampelinus, and a clade related to the genus Duganella. In addition, SSU rRNA genes were recovered from such previously recognized marine bacterioplankton SSU rRNA gene clone clusters as the SAR86, SAR11, and SAR116 clusters within the class Proteobacteria, the Roseobacter clade of the alpha subclass of the Proteobacteria, the marine group A/SAR406 cluster, and the marine Actinobacteria clade. Overall, these results support and extend previous observations concerning the global distribution of several marine planktonic prokaryote SSU rRNA gene phylotypes, but also show that coastal bacterioplankton communities contain SSU rRNA gene lineages (and presumably bacterioplankton) shown previously to be prevalent in freshwater habitats.
Transcriptome analyses of the Giardia lamblia life cycle

PubMed Central

Birkeland, Shanda R.; Preheim, Sarah P.; Davids, Barbara J.; Cipriano, Michael J.; Palm, Daniel; Reiner, David S.; Svärd, Staffan G.; Gillin, Frances D.; McArthur, Andrew G.

2010-01-01

We quantified mRNA abundance from 10 stages in the Giardia lamblia life cycle in vitro using Serial Analysis of Gene Expression (SAGE). 163 abundant transcripts were expressed constitutively. 71 transcripts were upregulated specifically during excystation and 42 during encystation. Nonetheless, the transcriptomes of cysts and trophozoites showed major differences. SAGE detected co-expressed clusters of 284 transcripts differentially expressed in cysts and excyzoites and 287 transcripts in vegetative trophozoites and encysting cells. All clusters included known genes and pathways as well as proteins unique to Giardia or diplomonads. SAGE analysis of the Giardia life cycle identified a number of kinases, phosphatases, and DNA replication proteins involved in excystation and encystation, which could be important for examining the roles of cell signaling in giardial differentiation. Overall, these data pave the way for directed gene discovery and a better understanding of the biology of Giardia lamblia. PMID:20570699
Molecular phylogeny of the higher and lower taxonomy of the Fusarium genus and differences in the evolutionary histories of multiple genes

PubMed Central

2011-01-01

Background Species of the Fusarium genus are important fungi that are associated with health hazards in humans and animals. The taxonomy of this genus has been a subject of controversy for many years. Although many researchers have applied molecular phylogenetic analysis to examine the taxonomy of Fusarium species, their phylogenetic relationships remain unclear because there have been only a few comprehensive phylogenetic analyses of the Fusarium genus and suitable nucleotide and amino acid substitution rates are lacking. A previous study with whole genome comparison among Fusarium species revealed the possibility that each gene in Fusarium genomes has a unique evolutionary history, and such genes may make it difficult to reconstruct the phylogenetic tree of Fusarium. There is a need not only to check substitution rates of genes but also to evaluate the evolution of each gene precisely. Results We performed phylogenetic analyses based on the nucleotide sequences of the rDNA cluster region (rDNA cluster), and the β-tubulin gene (β-tub), the elongation factor 1α gene (EF-1α), and the aminoadipate reductase gene (lys2). Although incongruence of the tree topologies between lys2 and the other genes was detected, all genes supported the classification of Fusarium species into 7 major clades, I to VII. To obtain a reliable phylogeny for Fusarium species, we excluded the lys2 sequences from our dataset, and re-constructed a maximum likelihood (ML) tree based on the combined data of the rDNA cluster, β-tub, and EF-1α. Our ML tree indicated some interesting relationships in the higher and lower taxa of Fusarium species and related genera. Moreover, we observed a novel evolutionary history of lys2. We suggest that the unique tree topologies of lys2 are not due to an analytical artefact, but due to differences in the evolutionary history of genomes caused by positive selection of particular lineages. 
Conclusion This study presents a reliable species tree of the higher and lower taxonomy in the lineage of the Fusarium genus. Our ML tree clearly indicated 7 major clades within the Fusarium genus. Furthermore, this study reported differences in the evolutionary histories among multiple genes within this genus for the first time. PMID:22047111

Draft genome sequence of marine-derived Streptomyces sp. TP-A0598, a producer of anti-MRSA antibiotic lydicamycins.

PubMed

Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro

2015-01-01

Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, a structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners, TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned to COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned as responsible for lydicamycin biosynthesis and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. These genome sequence data will facilitate probing the potential of secondary metabolism in marine-derived Streptomyces.
Whole-genome sequencing of Aspergillus tubingensis G131 and overview of its secondary metabolism potential.

PubMed

Choque, Elodie; Klopp, Christophe; Valiere, Sophie; Raynal, José; Mathieu, Florence

2018-03-15

Black Aspergilli represent one of the most important fungal resources of primary and secondary metabolites for the biotechnology industry. Having several sequenced black Aspergilli genomes should allow targeting the production of certain metabolites with bioactive properties. In this study, we report the draft genome of a black Aspergillus, A. tubingensis G131, isolated from a French Mediterranean vineyard. This 35 Mb genome includes 10,994 predicted genes. A genome-based discovery approach identifies 80 secondary metabolite biosynthetic gene clusters. Genomic sequences of these clusters were compared by BLAST against 3 chosen black Aspergilli genomes: A. tubingensis CBS 134.48, A. niger CBS 513.88 and A. kawachii IFO 4308. This comparison highlights different levels of cluster conservation between the four strains. It also allows identifying seven unique clusters in A. tubingensis G131. Moreover, the putative secondary metabolite clusters for asperazine and naphtho-gamma-pyrone production were proposed based on this genomic analysis. Key biosynthetic genes required for the production of 2 mycotoxins, ochratoxin A and fumonisin, are absent from this draft genome. Even though intergenic sequences of these mycotoxin biosynthetic pathways are present, they could not lead to the production of those mycotoxins by A. tubingensis G131. Functional and bioinformatics analyses of the A. tubingensis G131 genome highlight its potential for metabolite production, in particular for TAN-1612, asperazine and naphtho-gamma-pyrones, which present antioxidant, anticancer or antibiotic properties.
A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis

PubMed Central

2013-01-01

Background Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. PMID:24373308
The pap Operon of Avian Pathogenic Escherichia coli Strain O1:K1 Is Located on a Novel Pathogenicity Island

PubMed Central

Kariyawasam, Subhashinie; Johnson, Timothy J.; Nolan, Lisa K.

2006-01-01

We have identified a 56-kb pathogenicity island (PAI) in avian pathogenic Escherichia coli strain O1:K1 (APEC-O1). This PAI, termed PAI IAPEC-O1, is integrated adjacent to the 3′ end of the pheV tRNA gene. It carries putative virulence genes of APEC (pap operon), other E. coli genes (tia and ireA), and a 1.5-kb region unique to APEC-O1. The kps gene cluster required for the biosynthesis of polysialic acid capsule was mapped to a location immediately downstream of this PAI. PMID:16369033
In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer.

PubMed

Abu-Jamous, Basel; Buffa, Francesca M; Harris, Adrian L; Nandi, Asoke K

2017-06-15

Hypoxia is a characteristic of breast tumours indicating poor prognosis. Based on the assumption that those genes which are up-regulated under hypoxia in cell-lines are expected to be predictors of poor prognosis in clinical data, many signatures of poor prognosis were identified. However, it was observed that cell line data do not always concur with clinical data, and therefore conclusions from cell line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia related contexts are available, integrative approaches which investigate these datasets collectively, while not ignoring clinical data, are required. We analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions collectively by employing the unique capabilities of the method, UNCLES, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This has been demonstrated by comparison with the state-of-the-art iCluster method. From this collection of genome-wide datasets, which include 15,588 genes, UNCLES identified a relatively high number of genes (>1000 overall) which are consistently co-regulated over all of the datasets, some of which are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated, clusters were identified; the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are found consistently positively correlated with hypoxia response genes, unlike the observation in cell lines. 
Moreover, the ability to predict bad prognosis by a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes appears to be comparable to, and perhaps greater than, that of known hypoxia signatures. We present a clustering approach suitable to integrate data from diverse experimental set-ups. Its application to breast cancer cell line datasets reveals new hypoxia-regulated signatures of genes which behave differently when in vitro (cell-line) data is compared with in vivo (clinical) data, and are of a prognostic value comparable to or exceeding the state-of-the-art hypoxia signatures.
Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants

PubMed Central

Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil

2017-01-01

Abstract Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprised of a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicates that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology. PMID:28011721
Rf8-mediated T-urf13 transcript accumulation coincides with a pentatricopeptide repeat cluster on maize chromosome 2L

USDA-ARS's Scientific Manuscript database

Cytoplasmic male sterility (CMS) is a maternally inherited inability to produce functional pollen. In T-cytoplasm maize, CMS results from the action of the URF13 mitochondrial pore-forming protein, encoded by the unique T-urf13 mitochondrial gene. Full or partial restoration of fertility to T-cyto...
Evolutionary insights from Erwinia amylovora genomics.

PubMed

Smits, Theo H M; Rezzonico, Fabio; Duffy, Brion

2011-08-20

Evolutionary genomics is coming into focus with the recent availability of complete sequences for many bacterial species. A hypothesis on the evolution of virulence factors in the plant pathogen Erwinia amylovora, the causative agent of fire blight, was generated using comparative genomics with the genomes of E. amylovora, Erwinia pyrifoliae and Erwinia tasmaniensis. Putative virulence factors were mapped to the proposed genealogy of the genus Erwinia that is based on phylogenetic and genomic data. Ancestral origin of several virulence factors was identified, including levan biosynthesis, sorbitol metabolism, three T3SS and two T6SS. Other factors appeared to have been acquired after divergence of pathogenic species, including a second flagellar gene and two glycosyltransferases involved in amylovoran biosynthesis. E. amylovora singletons include 3 unique T3SS effectors that may explain differential virulence/host ranges. E. amylovora also has a unique T1SS export system, and a unique third T6SS gene cluster. Genetic analysis revealed signatures of foreign DNA suggesting that horizontal gene transfer is responsible for some of these differential features between the three species. Copyright © 2010 Elsevier B.V. All rights reserved.
IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

PubMed

Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

2015-07-14

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. 
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.
Spread of carbapenem-resistant international clones of Acinetobacter baumannii in Turkey and Azerbaijan: a collaborative study.

PubMed

Ahmed, S S; Alp, E; Ulu-Kilic, A; Dinc, G; Aktas, Z; Ada, B; Bagirova, F; Baran, I; Ersoy, Y; Esen, S; Guven, T G; Hopman, J; Hosoglu, S; Koksal, F; Parlak, E; Yalcin, A N; Yilmaz, G; Voss, A; Melchers, W

2016-09-01

Epidemic clones of Acinetobacter baumannii, described as European clones I, II, and III, are associated with hospital epidemics throughout the world. We aimed to determine the molecular characteristics and genetic diversity between European clones I, II, and III from Turkey and Azerbaijan. In this study, a total of 112 bloodstream isolates of carbapenem-resistant Acinetobacter spp. were collected from 11 hospitals across Turkey and Azerbaijan. The identification of Acinetobacter spp. using conventional and sensitivity tests was performed by standard criteria. Multiplex polymerase chain reaction (PCR) was used to detect OXA carbapenemase-encoding genes (bla OXA-23-like, bla OXA-24-like, bla OXA-51-like, and bla OXA-58-like). Pulsed-field gel electrophoresis (PFGE) typing was used to investigate genetic diversity. The bla OXA-51-like gene was present in all 112 isolates, 75 (67 %) carried bla OXA-23-like, 7 (6.2 %) carried bla OXA-58-like genes, and 5 (4.5 %) carried bla OXA-24-like genes. With a 90 % similarity cut-off value, 15 clones and eight unique isolates were identified. The largest clone was cluster D, with six subtypes. Isolates from clusters D and I were widely spread in seven different geographical regions throughout Turkey. However, F cluster was found in the northern and eastern regions of Turkey. EU clone I was grouped within J cluster with three isolates found in Antalya, Istanbul, and Erzurum. EU clone II was grouped in the U cluster with 15 isolates and found in Kayseri and Diyarbakır. The bla OXA-24-like gene in carbapenemases was identified rarely in Turkey and has been reported for the first time from Azerbaijan. Furthermore, this is the first multicenter study in Turkey and Azerbaijan to identify several major clusters belonging to European clones I and II of A. baumannii.
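The grouping of isolates at a similarity cut-off can be sketched as follows. This is a minimal illustration that assumes single-linkage grouping over a table of pairwise similarities; the abstract does not state which similarity coefficient, linkage method, or software was used, and the isolate names and similarity values below are hypothetical. Groups with two or more members correspond to what the abstract calls clones, and singletons to unique isolates.

```python
def group_by_similarity(names, similarity, cutoff=0.90):
    """Single-linkage grouping: isolates joined by any pair at or above the cutoff."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Pairs missing from the table are treated as unrelated (similarity 0)
            if similarity.get(frozenset((a, b)), 0.0) >= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    clones = [g for g in groups.values() if len(g) > 1]
    unique = [g[0] for g in groups.values() if len(g) == 1]
    return clones, unique

# Hypothetical banding-pattern similarities between four isolates
isolates = ["iso1", "iso2", "iso3", "iso4"]
pairwise = {
    frozenset(("iso1", "iso2")): 0.95,
    frozenset(("iso2", "iso3")): 0.92,
    frozenset(("iso1", "iso3")): 0.85,
    frozenset(("iso1", "iso4")): 0.40,
}
clones, unique = group_by_similarity(isolates, pairwise)
```

With single linkage, iso1 and iso3 end up in the same group through iso2 even though their direct similarity is below the cut-off; an average-linkage method could split them, which is why the choice of linkage is flagged as an assumption here.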
Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

PubMed Central

Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

2011-01-01

We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used the 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other sequenced Aspergilli. Assembly of 454 reads generated a non-redundant sequence of 39.5 Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of the α-amylase gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them; the differences included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, explaining the lack of CPA production in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486
A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis

DOE PAGES

Pimentel, Harold; Parra, Marilyn; Gee, Sherry L.; ...

2015-11-03

Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentally dynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ~50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclear-localized. Splice site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. Finally, we conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.
A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pimentel, Harold; Parra, Marilyn; Gee, Sherry L.

Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentally dynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ~50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclear-localized. Splice site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. Finally, we conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.
Gene Expression Profiles in Paired Gingival Biopsies from Periodontitis-Affected and Healthy Tissues Revealed by Massively Parallel Sequencing

PubMed Central

Båge, Tove; Lagervall, Maria; Jansson, Leif; Lundeberg, Joakim; Yucel-Lindberg, Tülay

2012-01-01

Periodontitis is a chronic inflammatory disease affecting the soft tissue and bone that surrounds the teeth. Despite extensive research, distinctive genes responsible for the disease have not been identified. The objective of this study was to elucidate transcriptome changes in periodontitis, by investigating gene expression profiles in gingival tissue obtained from periodontitis-affected and healthy gingiva from the same patient, using RNA-sequencing. Gingival biopsies were obtained from a disease-affected and a healthy site from each of 10 individuals diagnosed with periodontitis. Enrichment analysis performed among uniquely expressed genes for the periodontitis-affected and healthy tissues revealed several regulated pathways indicative of inflammation for the periodontitis-affected condition. Hierarchical clustering of the sequenced biopsies demonstrated clustering according to the degree of inflammation, as observed histologically in the biopsies, rather than clustering at the individual level. Among the top 50 upregulated genes in periodontitis-affected tissues, we investigated two genes which have not previously been demonstrated to be involved in periodontitis. These included interferon regulatory factor 4 and chemokine (C-C motif) ligand 18, which were also expressed at the protein level in gingival biopsies from patients with periodontitis. In conclusion, this study provides a first step towards a quantitative comprehensive insight into the transcriptome changes in periodontitis. We demonstrate for the first time site-specific local variation in gene expression profiles of periodontitis-affected and healthy tissues obtained from patients with periodontitis, using RNA-seq. Further, we have identified novel genes expressed in periodontitis tissues, which may constitute potential therapeutic targets for future treatment strategies of periodontitis. PMID:23029519
Differentially expressed genes of Coptotermes formosanus (Isoptera: Rhinotermitidae) challenged by chemical insecticides.

PubMed

Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou

2013-08-01

Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects that damage wooden structures. Current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the genes differentially expressed in response to chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites exposed to a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron) for 24 h, using the RNA ligase-mediated rapid amplification of cDNA ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.
Comparative genomic analysis of clinical and environmental Vibrio vulnificus isolates revealed biotype 3 evolutionary relationships.

PubMed

Koton, Yael; Gordon, Michal; Chalifa-Caspi, Vered; Bisharat, Naiel

2014-01-01

In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59 and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C) and environmental (E); all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins) were present in all human pathogenic strains (both biotype 3 and non-biotype 3) and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS) proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and formed a genetically distinct group within the E-cluster. The unique epidemiological circumstances facilitated disease outbreak and brought this genotype to the attention of the scientific community.
The genetic basis for the biosynthesis of the pharmaceutically important class of epoxyketone proteasome inhibitors

PubMed Central

Schorn, Michelle; Zettler, Judith; Noel, Joseph P.; Dorrestein, Pieter C.; Moore, Bradley S.; Kaysser, Leonard

2013-01-01

The epoxyketone proteasome inhibitors are an established class of therapeutic agents for the treatment of cancer. Their unique α′,β′-epoxyketone pharmacophore allows binding to the catalytic β-subunits of the proteasome with extraordinary specificity. Here we report the characterization of the first gene clusters for the biosynthesis of natural peptidyl-epoxyketones. The clusters for epoxomicin, the lead compound for the anti-cancer drug Kyprolis™, and for eponemycin were identified in the actinobacterial producer strains ATCC 53904 and Streptomyces hygroscopicus ATCC 53709, respectively, using a modified protocol for Ion Torrent PGM genome sequencing. Both gene clusters code for a hybrid non-ribosomal peptide synthetase/polyketide synthase multifunctional enzyme complex and homologous redox enzymes. Epoxomicin and eponemycin were heterologously produced in Streptomyces albus J1046 via whole pathway expression. Moreover, we employed mass spectral molecular networking for a new comparative metabolomics approach in a heterologous system and discovered a number of putative epoxyketone derivatives. With this study we have definitively linked epoxyketone proteasome inhibitors and their biosynthesis genes for the first time in any organism, which will now allow for their detailed biochemical investigation. PMID:24168704
Antibiotic Susceptibility and Molecular Diversity of Bacillus anthracis Strains in Chad: Detection of a New Phylogenetic Subgroup

PubMed Central

Maho, Angaya; Rossano, Alexandra; Hächler, Herbert; Holzer, Anita; Schelling, Esther; Zinsstag, Jakob; Hassane, Mahamat H.; Toguebaye, Bhen S.; Akakpo, Ayayi J.; Van Ert, Matthew; Keim, Paul; Kenefic, Leo; Frey, Joachim; Perreten, Vincent

2006-01-01

We genotyped 15 Bacillus anthracis isolates from Chad, Africa, using multiple-locus variable-number tandem repeat analysis and three additional direct-repeat markers. We identified two unique genotypes that represent a novel genetic lineage in the A cluster. Chadian isolates were susceptible to 11 antibiotics and free of 94 antibiotic resistance genes. PMID:16954291
Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

PubMed

Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

2018-02-14

Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information, but that information needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.
An RNA-Seq Transcriptome Analysis of Orthophosphate-Deficient White Lupin Reveals Novel Insights into Phosphorus Acclimation in Plants

PubMed Central

O’Rourke, Jamie A.; Yang, S. Samuel; Miller, Susan S.; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W.; Vance, Carroll P.

2013-01-01

Phosphorus, in its orthophosphate form (Pi), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to Pi deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in Pi-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to Pi supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to Pi deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to Pi deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the Pi status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in Pi deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to Pi deficiency. PMID:23197803
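The two numeric filters quoted in this abstract (reads per kilobase per million reads ≥ 3 for "transcriptionally active", and a 2-fold or greater change with P ≤ 0.05 for "differentially expressed") can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the abstract does not say how the fold change or the P value was computed, so the P value is simply taken as an input.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads (standard definition)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def is_active(read_count, length_bp, total_mapped_reads, threshold=3.0):
    """Apply the abstract's activity cutoff (RPKM >= 3)."""
    return rpkm(read_count, length_bp, total_mapped_reads) >= threshold


def is_differential(rpkm_a, rpkm_b, p_value, min_fold=2.0, max_p=0.05):
    """Apply the abstract's cutoffs: >= 2-fold change in either direction and P <= 0.05.

    Assumption: fold change is the ratio of the larger to the smaller RPKM;
    a sequence detected in only one condition is treated as an unbounded fold change.
    """
    low, high = sorted((rpkm_a, rpkm_b))
    if high == 0:
        return False  # not detected in either condition
    fold = float("inf") if low == 0 else high / low
    return fold >= min_fold and p_value <= max_p
```

For example, a 1,155 bp sequence (the reported average length) with 1,155 reads in a library of one million mapped reads has an RPKM of 1000 and passes the activity cutoff.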

An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants.

PubMed

O'Rourke, Jamie A; Yang, S Samuel; Miller, Susan S; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W; Vance, Carroll P

2013-02-01

Phosphorus, in its orthophosphate form (P(i)), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to P(i) deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in P(i)-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to P(i) supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to P(i) deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to P(i) deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the P(i) status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in P(i) deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to P(i) deficiency.
Mapping in an apple (Malus x domestica) F1 segregating population based on physical clustering of differentially expressed genes.

PubMed

Jensen, Philip J; Fazio, Gennaro; Altman, Naomi; Praul, Craig; McNellis, Timothy W

2014-04-04

Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. 
Gene expression profiling and trait-associated transcript analysis using an apple F1 population readily identified genes physically linked to powdery mildew disease resistance and woolly apple aphid resistance loci. This result was especially useful in apple, where extreme levels of heterozygosity make the development of reliable DNA markers quite difficult. The results suggest that this approach could prove effective in crops with complicated genetics, or for which few genomic information resources are available.
A cluster of coregulated genes determines TGF-β–induced regulatory T-cell (Treg) dysfunction in NOD mice

PubMed Central

D'Alise, Anna Morena; Ergun, Ayla; Hill, Jonathan A.; Mathis, Diane; Benoist, Christophe

2011-01-01

Foxp3+ regulatory T cells (Tregs) originate in the thymus, but the Treg phenotype can also be induced in peripheral lymphoid organs or in vitro by stimulation of conventional CD4+ T cells with IL-2 and TGF-β. There have been divergent reports on the suppressive capacity of these TGF-Treg cells. We find that TGF-Tregs derived from diabetes-prone NOD mice, although expressing normal Foxp3 levels, are uniquely defective in suppressive activity, whereas TGF-Tregs from control strains (B6g7) or ex vivo Tregs from NOD mice all function normally. Most Treg-typical transcripts were shared by NOD or B6g7 TGF-Tregs, except for a small group of differentially expressed genes, including genes relevant for suppressive activity (Lrrc32, Ctla4, and Cd73). Many of these transcripts form a coregulated cluster in a broader analysis of T-cell differentiation. The defect does not map to idd3 or idd5 regions. Whereas Treg cells from NOD mice are normal in spleen and lymph nodes, the NOD defect is observed in locations that have been tied to pathogenesis of diabetes (small intestine lamina propria and pancreatic lymph node). Thus, a genetic defect uniquely affects a specific Treg subpopulation in NOD mice, in a manner consistent with a role in determining diabetes susceptibility. PMID:21543717
A cluster of coregulated genes determines TGF-beta-induced regulatory T-cell (Treg) dysfunction in NOD mice.

PubMed

D'Alise, Anna Morena; Ergun, Ayla; Hill, Jonathan A; Mathis, Diane; Benoist, Christophe

2011-05-24

Foxp3(+) regulatory T cells (Tregs) originate in the thymus, but the Treg phenotype can also be induced in peripheral lymphoid organs or in vitro by stimulation of conventional CD4(+) T cells with IL-2 and TGF-β. There have been divergent reports on the suppressive capacity of these TGF-Treg cells. We find that TGF-Tregs derived from diabetes-prone NOD mice, although expressing normal Foxp3 levels, are uniquely defective in suppressive activity, whereas TGF-Tregs from control strains (B6g7) or ex vivo Tregs from NOD mice all function normally. Most Treg-typical transcripts were shared by NOD or B6g7 TGF-Tregs, except for a small group of differentially expressed genes, including genes relevant for suppressive activity (Lrrc32, Ctla4, and Cd73). Many of these transcripts form a coregulated cluster in a broader analysis of T-cell differentiation. The defect does not map to idd3 or idd5 regions. Whereas Treg cells from NOD mice are normal in spleen and lymph nodes, the NOD defect is observed in locations that have been tied to pathogenesis of diabetes (small intestine lamina propria and pancreatic lymph node). Thus, a genetic defect uniquely affects a specific Treg subpopulation in NOD mice, in a manner consistent with a role in determining diabetes susceptibility.
The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.

2006-05-19

We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.
Cloning, Sequencing, and Functional Analysis of an Iterative Type I Polyketide Synthase Gene Cluster for Biosynthesis of the Antitumor Chlorinated Polyenone Neocarzilin in “Streptomyces carzinostaticus”

PubMed Central

Otsuka, Miyuki; Ichinose, Koji; Fujii, Isao; Ebizuka, Yutaka

2004-01-01

Neocarzilins (NCZs) are antitumor chlorinated polyenones produced by “Streptomyces carzinostaticus” var. F-41. The gene cluster responsible for the biosynthesis of NCZs was cloned and characterized. DNA sequence analysis of a 33-kb region revealed a cluster of 14 open reading frames (ORFs), three of which (ORF4, ORF5, and ORF6) encode type I polyketide synthase (PKS), which consists of four modules. Unusual features of the modular organization are the lack of an obvious acyltransferase domain on modules 2 and 4 and the presence of longer interdomain regions, more than 200 amino acids in length, on each module. Involvement of the PKS genes in NCZ biosynthesis was demonstrated by heterologous expression of the cluster in Streptomyces coelicolor CH999, which produced the apparent NCZ biosynthetic intermediates dechloroneocarzilin A and dechloroneocarzilin B. Disruption of ORF5 resulted in a failure of NCZ production, providing further evidence that the cluster is essential for NCZ biosynthesis. Mechanistic consideration of NCZ formation indicates the iterative use of at least one module of the PKS, which subsequently releases its product by decarboxylation to generate an NCZ skeleton, possibly catalyzed by a type II thioesterase encoded by ORF7. This is a novel type I PKS system of bacterial origin for the biosynthesis of a reduced polyketide chain. Additionally, the protein encoded by ORF3, located upstream of the PKS genes, closely resembles the FADH2-dependent halogenases involved in the formation of halometabolites. The ORF3 protein could be responsible for the halogenation of NCZs, presenting a unique example of a halogenase involved in the biosynthesis of an aliphatic halometabolite. PMID:15328113
Accumulation of slightly deleterious mutations in the mitochondrial genome: a hallmark of animal domestication.

PubMed

Hughes, Austin L

2013-02-15

The hypothesis that domestication leads to a relaxation of purifying selection on mitochondrial (mt) genomes was tested by comparative analysis of mt genes from dog, pig, chicken, and silkworm. The three vertebrate species showed mt genome phylogenies in which domestic and wild isolates were intermingled, whereas the domestic silkworm (Bombyx mori) formed a distinct cluster nested within its closest wild relative (Bombyx mandarina). In spite of these differences in phylogenetic pattern, significantly greater proportions of nonsynonymous SNPs than of synonymous SNPs were unique to the domestic populations of all four species. Likewise, in all four species, significantly greater proportions of RNA-encoding SNPs than of synonymous SNPs were unique to the domestic populations. Thus, domestic populations were characterized by an excess of unique polymorphisms in two categories generally subject to purifying selection: nonsynonymous sites and RNA-encoding sites. Many of these unique polymorphisms thus seem likely to be slightly deleterious; the latter hypothesis was supported by the generally lower gene diversities of polymorphisms unique to domestic populations in comparison to those of polymorphisms shared by domestic and wild populations. Copyright © 2012 Elsevier B.V. All rights reserved.
A Unique Procedure to Identify Cell Surface Markers Through a Spherical Self-Organizing Map Applied to DNA Microarray Analysis.

PubMed

Sugii, Yuh; Kasai, Tomonari; Ikeda, Masashi; Vaidyanath, Arun; Kumon, Kazuki; Mizutani, Akifumi; Seno, Akimasa; Tokutaka, Heizo; Kudoh, Takayuki; Seno, Masaharu

2016-01-01

To identify cell-specific markers, we designed a DNA microarray platform with oligonucleotide probes for human membrane-anchored proteins. Human glioma cell lines were analyzed using microarray and compared with normal and fetal brain tissues. For the microarray analysis, we employed a spherical self-organizing map, which is a clustering method suitable for the conversion of multidimensional data into two-dimensional data and displays the relationship on a spherical surface. Based on the gene expression profile, the cell surface characteristics were successfully mirrored onto the spherical surface, thereby distinguishing normal brain tissue from the disease model based on the strength of gene expression. The clustered glioma-specific genes were further analyzed by polymerase chain reaction procedure and immunocytochemical staining of glioma cells. Our platform and the following procedure were successfully demonstrated to categorize the genes coding for cell surface proteins that are specific to glioma cells. Our assessment demonstrates that a spherical self-organizing map is a valuable tool for distinguishing cell surface markers and can be employed in marker discovery studies for the treatment of cancer.
Discovery, biosynthesis, and rational engineering of novel enterocin and wailupemycin polyketide analogues.

PubMed

Kalaitzis, John A

2013-01-01

The marine actinomycete Streptomyces maritimus produces a structurally diverse set of unusual polyketide natural products including the major metabolite enterocin. Investigations of enterocin biosynthesis revealed that the unique carbon skeleton is derived from an aromatic polyketide pathway which is genetically coded by the 21.3 kb enc gene cluster in S. maritimus. Characterization of the enc biosynthesis gene cluster and subsequent manipulation of it via heterologous expression and/or mutagenesis enabled the discovery of other enc-based metabolites that were produced in only very minor amounts in the wild type. Also described are techniques used to harness the enterocin biosynthetic machinery in order to generate unnatural enc-derived polyketide analogues. This review focuses upon the molecular methods used in combination with classical natural products detection and isolation techniques to access minor metabolites of the S. maritimus secondary metabolome.
Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization.

PubMed

Yang, Haixuan; Seoighe, Cathal

2016-01-01

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. By the nonnegativity constraint, NMF provides a decomposition of the data matrix into two matrices that have been used for clustering analysis. However, the decomposition is not unique. This allows different clustering results to be obtained, resulting in different interpretations of the decomposition. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms in the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery, under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm showed the best performance, although the maximum norm has not previously been used for NMF. Matlab codes are freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
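The non-uniqueness mentioned in this abstract can be made concrete with a small sketch: any positive rescaling of the columns of W, compensated in the rows of H, leaves the product W H unchanged but can change which row of H is largest for a given sample. The code below is an illustration under stated assumptions, not the authors' pNMF code: it uses plain multiplicative updates for the factorization, applies the maximum norm to the columns of W (the abstract does not say which factor is normalized), and assigns each sample to the component with the largest entry in H. All function names are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative genes-by-samples matrix V into W (genes x k) and H (k x samples)
    using multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def max_norm_rescale(W, H):
    """Scale each column of W to maximum 1 and compensate in H, so W @ H is unchanged.
    Normalizing W rather than H is an assumption made for this sketch."""
    scale = W.max(axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # leave all-zero components untouched
    return W / scale, H * scale[:, None]


def cluster_samples(H):
    """Assign each sample (column of H) to the component with the largest coefficient."""
    return H.argmax(axis=0)
```

A typical call would be `W, H = nmf(V, k)` followed by `cluster_samples(max_norm_rescale(W, H)[1])`; swapping in a different norm inside `max_norm_rescale` is the kind of choice whose effect the abstract describes evaluating.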
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

PubMed

Feltus, F Alex; Ficklin, Stephen P; Gibson, Scott M; Smith, Melissa C

2013-06-05

In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
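The pre-clustering idea described above (partition the samples with k-means, build one co-expression network per partition, then pool the edges) can be sketched as follows. This is a toy illustration rather than the published workflow: a fixed absolute Pearson correlation cutoff stands in for the Random Matrix Theory thresholding used in the study, and the function names, the cutoff of 0.9, and the minimum group size are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def coexpression_edges(expr, cutoff):
    """Gene pairs (i < j) whose absolute Pearson correlation across samples meets the cutoff.
    expr is genes x samples. A fixed cutoff is a stand-in for RMT thresholding."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(expr)          # constant genes give NaN, which never passes the cutoff
        keep = np.triu(np.abs(r) >= cutoff, k=1)
    rows, cols = np.where(keep)
    return {(int(i), int(j)) for i, j in zip(rows, cols)}


def layered_network(expr, k, cutoff=0.9, min_samples=3):
    """Partition samples with k-means, build one network per partition, and pool the edges."""
    labels = kmeans(expr.T, k)
    edges = set()
    for group in range(k):
        subset = expr[:, labels == group]
        if subset.shape[1] >= min_samples:
            edges |= coexpression_edges(subset, cutoff)
    return edges
```

The design point is that a gene pair correlated positively under one condition and negatively under another can show no correlation when all samples are pooled, so `coexpression_edges` on the full matrix may miss a pair that `layered_network` recovers within each sample group.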
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

PubMed Central

2013-01-01

Background In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
Conclusions Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. PMID:23738693
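The pre-clustering idea in this abstract can be illustrated with a minimal sketch: partition the samples with k-means, build one correlation network per sample group, and take the union of the resulting gene pairs. Everything below is an assumption made for illustration. The function names, the toy expression matrix, and the fixed correlation cutoff are hypothetical; in particular, the fixed cutoff is only a stand-in for the Random Matrix Theory thresholding that the authors apply to each network independently.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with evenly spaced initial centroids."""
    n = len(points)
    step = (n - 1) / max(k - 1, 1)
    centroids = [list(points[round(i * step)]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coexpression_edges(expr, sample_idx, cutoff):
    """Gene pairs whose |r| over the given samples reaches the cutoff."""
    edges = set()
    for g1, g2 in combinations(range(len(expr)), 2):
        x = [expr[g1][s] for s in sample_idx]
        y = [expr[g2][s] for s in sample_idx]
        if abs(pearson(x, y)) >= cutoff:
            edges.add((g1, g2))
    return edges


def layered_network(expr, k, cutoff):
    """Union of per-cluster networks after k-means partitioning of samples."""
    n_samples = len(expr[0])
    samples = [[row[s] for row in expr] for s in range(n_samples)]
    labels = kmeans(samples, k)
    union = set()
    for c in range(k):
        idx = [s for s, lab in enumerate(labels) if lab == c]
        if len(idx) >= 3:  # too few samples give meaningless correlations
            union |= coexpression_edges(expr, idx, cutoff)
    return union


# Hypothetical toy matrix: rows are genes, columns are samples.
# Samples 0-3 and 4-7 mimic two different biological conditions.
expr = [
    [1, 2, 3, 4, 11, 12, 13, 14],
    [1, 2, 3, 4, 4, 3, 2, 1],
    [0, 1, 0, 1, 10, 11, 10, 11],
]
layered = layered_network(expr, k=2, cutoff=0.9)
global_net = coexpression_edges(expr, range(8), cutoff=0.9)
```

In this toy matrix the pair of genes 0 and 1 is strongly correlated inside each sample group but uncorrelated when all samples are pooled, so it appears only in the layered result. Real data would need a principled per-network threshold rather than the fixed cutoff assumed here.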
FnrL and Three Dnr Regulators Are Used for the Metabolic Adaptation to Low Oxygen Tension in Dinoroseobacter shibae

PubMed Central

Ebert, Matthias; Laaß, Sebastian; Thürmer, Andrea; Roselius, Louisa; Eckweiler, Denitsa; Daniel, Rolf; Härtig, Elisabeth; Jahn, Dieter

2017-01-01

The heterotrophic marine bacterium Dinoroseobacter shibae utilizes aerobic respiration and anaerobic denitrification supplemented with aerobic anoxygenic photosynthesis for energy generation. The aerobic to anaerobic transition is controlled by four Fnr/Crp family regulators in a unique cascade-type regulatory network. FnrL is utilizing an oxygen-sensitive Fe-S cluster for oxygen sensing. Active FnrL is inducing most operons encoding the denitrification machinery and the corresponding heme biosynthesis. Activation of gene expression of the high oxygen affinity cbb3-type and repression of the low affinity aa3-type cytochrome c oxidase is mediated by FnrL. Five regulator genes including dnrE and dnrF are directly controlled by FnrL. Multiple genes of the universal stress protein (USP) and cold shock response are further FnrL targets. DnrD, most likely sensing NO via a heme cofactor, co-induces genes of denitrification, heme biosynthesis, and the regulator genes dnrE and dnrF. DnrE is controlling genes for a putative Na+/H+ antiporter, indicating a potential role of a Na+ gradient under anaerobic conditions. The formation of the electron donating primary dehydrogenases is coordinated by FnrL and DnrE. Many plasmid encoded genes were DnrE regulated. DnrF is controlling directly two regulator genes including the Fe-S cluster biosynthesis regulator iscR, genes of the electron transport chain and the glutathione metabolism. The genes for nitrate reductase and CO dehydrogenase are repressed by DnrD and DnrF. Both regulators in concert with FnrL are inducing the photosynthesis genes. One of the major denitrification operon control regions, the intergenic region between nirS and nosR2, contains one Fnr/Dnr binding site. Using regulator gene mutant strains, lacZ-reporter gene fusions in combination with promoter mutagenesis, the function of the single Fnr/Dnr binding site for FnrL-, DnrD-, and partly DnrF-dependent nirS and nosR2 transcriptional activation was shown. 
Overall, the unique regulatory network of the marine bacterium D. shibae for the transition from aerobic to anaerobic growth composed of four Crp/Fnr family regulators was elucidated. PMID:28473807
[Diversity of beta-proteobacterial ammonia-oxidizing bacteria and ammonia-oxidizing archaea in shrimp farm sediment].

PubMed

Gao, Lihai; Lin, Weitie

2011-01-01

To study the diversity of ammonia-oxidizing bacteria (AOB) and ammonia-oxidizing archaea (AOA) in shrimp farm sediment, total microbial DNA was directly extracted from the sediment. Clone libraries of amoA genes were constructed with beta-proteobacterial AOB- and AOA-specific primers. The libraries were screened by PCR-restriction fragment length polymorphism (RFLP) analysis, and clones with unique RFLP patterns were sequenced. Phylogenetic analyses of the amoA gene fragments showed that all AOB sequences from shrimp farm sediment were affiliated with Nitrosomonas (61.54%) or Nitrosomonas-like (38.46%) species and grouped into the Nitrosomonas communis cluster, the Nitrosomonas sp. Nm148 cluster, and the Nitrosomonas oligotropha cluster. All AOA sequences belonged to the kingdom Crenarchaeota, except for one operational taxonomic unit (OTU) sequence that was unclassified Archaea and fell within cluster S (soil origin). The AOB and AOA species compositions comprised 13 OTUs and 9 OTUs, respectively. The clone coverage of bacterial and archaeal amoA genes was 73.47% and 90.43%, respectively. The Shannon-Wiener index, Evenness index, Simpson index and Richness index of AOB were higher than those of AOA. These findings represent the first detailed examination of archaeal amoA diversity in shrimp farm sediment and demonstrate that diverse communities of Crenarchaeota capable of ammonia oxidation are present within shrimp farm sediment, where they may be actively involved in nitrification.
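As a rough illustration of the diversity statistics named in this abstract, the sketch below computes them from a list of per-OTU clone counts. The abstract does not state which formula variants were used, so the choices here are assumptions: Shannon-Wiener with natural logarithms, the Gini-Simpson form 1 - sum(p^2), Pielou's evenness, richness as the plain OTU count, and Good's estimator for clone coverage. The function name and the toy counts are hypothetical and are not data from the study.

```python
from math import log


def diversity_indices(counts):
    """Common diversity statistics from per-OTU clone counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    richness = len(counts)                      # number of observed OTUs
    shannon = -sum(p * log(p) for p in props)   # Shannon-Wiener H'
    simpson = 1.0 - sum(p * p for p in props)   # Gini-Simpson form (assumed)
    evenness = shannon / log(richness) if richness > 1 else 0.0  # Pielou's J
    singletons = sum(1 for c in counts if c == 1)
    coverage = 1.0 - singletons / total         # Good's coverage (assumed)
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
        "coverage": coverage,
    }


# Made-up counts for four OTUs, used only to exercise the function.
toy_counts = [4, 3, 2, 1]
result = diversity_indices(toy_counts)
```

With these definitions, a library dominated by a few OTUs scores lower on Shannon-Wiener, Simpson and evenness than a library of the same richness with evenly distributed clones.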
Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants.

PubMed

Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil; Kim, Ryan W

2017-02-01

Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprised of a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity

PubMed Central

Selengut, Jeremy D.; Harkins, Derek M.; Patra, Kailash P.; Moreno, Angelo; Lehmann, Jason S.; Purushe, Janaki; Sanka, Ravi; Torres, Michael; Webster, Nicholas J.; Vinetz, Joseph M.; Matthias, Michael A.

2012-01-01

The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics. PMID:23145189
Up-regulation of HOXB cluster genes are epigenetically regulated in tamoxifen-resistant MCF7 breast cancer cells.

PubMed

Yang, Seoyeon; Lee, Ji-Yeon; Hur, Ho; Oh, Ji Hoon; Kim, Myoung Hee

2018-05-28

Tamoxifen (TAM) is commonly used to treat estrogen receptor (ER)-positive breast cancer. Despite the remarkable benefits, resistance to TAM presents a serious therapeutic challenge. Since several HOX transcription factors have been proposed as strong candidates in the development of resistance to TAM therapy in breast cancer, we generated an in vitro model of acquired TAM resistance using ER-positive MCF7 breast cancer cells (MCF7-TAMR), and analyzed the expression pattern and epigenetic states of HOX genes. HOXB cluster genes were uniquely up-regulated in MCF7-TAMR cells. Survival analysis of in silico data showed the correlation of high expression of HOXB genes with poor response to TAM in ER-positive breast cancer patients treated with TAM. Gain- and loss-of-function experiments showed that the overexpression of multiple HOXB genes in MCF7 renders cancer cells more resistant to TAM, whereas the knockdown restores TAM sensitivity. Furthermore, activation of HOXB genes in MCF7-TAMR was associated with histone modifications, particularly the gain of H3K9ac. These findings imply that the activation of HOXB genes mediates the development of TAM resistance and represents a target for development of new strategies to prevent or reverse TAM resistance.
Specialized adaptation of a lactic acid bacterium to the milk environment: the comparative genomics of Streptococcus thermophilus LMD-9

PubMed Central

2011-01-01

Background Streptococcus thermophilus represents the only species among the streptococci that has “Generally Recognized As Safe” status and that plays an economically important role in the fermentation of yogurt and cheeses. We conducted comparative genome analysis of S. thermophilus LMD-9 to identify unique gene features as well as features that contribute to its adaptation to the dairy environment. In addition, we investigated the transcriptome response of LMD-9 during growth in milk in the presence of Lactobacillus delbrueckii ssp. bulgaricus, a companion culture in yogurt fermentation, and during lytic bacteriophage infection. Results The S. thermophilus LMD-9 genome is comprised of a 1.8 Mbp circular chromosome (39.1% GC; 1,834 predicted open reading frames) and two small cryptic plasmids. Genome comparison with the previously sequenced LMG 18311 and CNRZ1066 strains revealed 114 kb of LMD-9 specific chromosomal region, including genes that encode a histidine biosynthetic pathway, a cell surface proteinase, various host defense mechanisms and a phage remnant. Interestingly, also unique to LMD-9 are genes encoding a putative mucus-binding protein, a peptide transporter, and exopolysaccharide biosynthetic proteins that have close orthologs in human intestinal microorganisms. LMD-9 harbors a large number of pseudogenes (13% of ORFeome), indicating that like LMG 18311 and CNRZ1066, LMD-9 has also undergone major reductive evolution, with the loss of carbohydrate metabolic genes and virulence genes found in their streptococcal counterparts. Functional genome distribution analysis of ORFeomes among streptococci showed that all three S. thermophilus strains formed a distinct functional cluster, further establishing their specialized adaptation to the nutrient-rich milk niche. An upregulation of CRISPR1 expression in LMD-9 during lytic bacteriophage DT1 infection suggests its protective role against phage invasion. When co-cultured with L. bulgaricus, LMD-9 overexpressed genes involved in amino acid transport and metabolism as well as DNA replication. Conclusions The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications. The presence of a unique exopolysaccharide gene cluster and cell surface protein orthologs commonly associated with probiotic functionality revealed potential probiotic applications of LMD-9. PMID:21995282
Specialized adaptation of a lactic acid bacterium to the milk environment: the comparative genomics of Streptococcus thermophilus LMD-9.

PubMed

Goh, Yong Jun; Goin, Caitlin; O'Flaherty, Sarah; Altermann, Eric; Hutkins, Robert

2011-08-30

Streptococcus thermophilus represents the only species among the streptococci that has "Generally Recognized As Safe" status and that plays an economically important role in the fermentation of yogurt and cheeses. We conducted comparative genome analysis of S. thermophilus LMD-9 to identify unique gene features as well as features that contribute to its adaptation to the dairy environment. In addition, we investigated the transcriptome response of LMD-9 during growth in milk in the presence of Lactobacillus delbrueckii ssp. bulgaricus, a companion culture in yogurt fermentation, and during lytic bacteriophage infection. The S. thermophilus LMD-9 genome is comprised of a 1.8 Mbp circular chromosome (39.1% GC; 1,834 predicted open reading frames) and two small cryptic plasmids. Genome comparison with the previously sequenced LMG 18311 and CNRZ1066 strains revealed 114 kb of LMD-9 specific chromosomal region, including genes that encode a histidine biosynthetic pathway, a cell surface proteinase, various host defense mechanisms and a phage remnant. Interestingly, also unique to LMD-9 are genes encoding a putative mucus-binding protein, a peptide transporter, and exopolysaccharide biosynthetic proteins that have close orthologs in human intestinal microorganisms. LMD-9 harbors a large number of pseudogenes (13% of ORFeome), indicating that like LMG 18311 and CNRZ1066, LMD-9 has also undergone major reductive evolution, with the loss of carbohydrate metabolic genes and virulence genes found in their streptococcal counterparts. Functional genome distribution analysis of ORFeomes among streptococci showed that all three S. thermophilus strains formed a distinct functional cluster, further establishing their specialized adaptation to the nutrient-rich milk niche. An upregulation of CRISPR1 expression in LMD-9 during lytic bacteriophage DT1 infection suggests its protective role against phage invasion. When co-cultured with L. bulgaricus, LMD-9 overexpressed genes involved in amino acid transport and metabolism as well as DNA replication. The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications. The presence of a unique exopolysaccharide gene cluster and cell surface protein orthologs commonly associated with probiotic functionality revealed potential probiotic applications of LMD-9.
Differentiating Botulinum Neurotoxin-Producing Clostridia with a Simple, Multiplex PCR Assay.

PubMed

Williamson, Charles H D; Vazquez, Adam J; Hill, Karen; Smith, Theresa J; Nottingham, Roxanne; Stone, Nathan E; Sobek, Colin J; Cocking, Jill H; Fernández, Rafael A; Caballero, Patricia A; Leiser, Owen P; Keim, Paul; Sahl, Jason W

2017-09-15

Diverse members of the genus Clostridium produce botulinum neurotoxins (BoNTs), which cause a flaccid paralysis known as botulism. While multiple species of clostridia produce BoNTs, the majority of human botulism cases have been attributed to Clostridium botulinum groups I and II. Recent comparative genomic studies have demonstrated the genomic diversity within these BoNT-producing species. This report introduces a multiplex PCR assay for differentiating members of C. botulinum group I, C. sporogenes, and two major subgroups within C. botulinum group II. Coding region sequences unique to each of the four species/subgroups were identified by in silico analyses of thousands of genome assemblies, and PCR primers were designed to amplify each marker. The resulting multiplex PCR assay correctly assigned 41 tested isolates to the appropriate species or subgroup. A separate PCR assay to determine the presence of the ntnh gene (a gene associated with the botulinum neurotoxin gene cluster) was developed and validated. The ntnh gene PCR assay provides information about the presence or absence of the botulinum neurotoxin gene cluster and the type of gene cluster present (ha positive [ha+] or orfX+). The increased availability of whole-genome sequence data and comparative genomic tools enabled the design of these assays, which provide valuable information for characterizing BoNT-producing clostridia. The PCR assays are rapid, inexpensive tests that can be applied to a variety of sample types to assign isolates to species/subgroups and to detect clostridia with botulinum neurotoxin gene (bont) clusters. IMPORTANCE Diverse clostridia produce the botulinum neurotoxin, one of the most potent known neurotoxins. In this study, a multiplex PCR assay was developed to differentiate clostridia that are most commonly isolated in connection with human botulism cases: C. botulinum group I, C. sporogenes, and two major subgroups within C. botulinum group II. Since BoNT-producing and nontoxigenic isolates can be found in each species, a PCR assay to determine the presence of the ntnh gene, which is a universally present component of bont gene clusters, and to provide information about the type (ha+ or orfX+) of bont gene cluster present in a sample was also developed. The PCR assays provide simple, rapid, and inexpensive tools for screening uncharacterized isolates from clinical or environmental samples. The information provided by these assays can inform epidemiological studies, aid with identifying mixtures of isolates and unknown isolates in culture collections, and confirm the presence of bacteria of interest. Copyright © 2017 Williamson et al.

The complete genome sequence of the African buffalo (Syncerus caffer).

PubMed

Glanzmann, Brigitte; Möller, Marlo; le Roex, Nikki; Tromp, Gerard; Hoal, Eileen G; van Helden, Paul D

2016-12-07

The African buffalo (Syncerus caffer) is an important role player in the savannah ecosystem. It has become a species of relevance because of its role as a wildlife maintenance host for an array of infectious and zoonotic diseases, including corridor disease, foot-and-mouth disease and bovine tuberculosis. To date, no complete genome sequence for S. caffer had been available for study and the genomes of other species such as the domestic cow (Bos taurus) had been used as a proxy for any genetics analysis conducted on this species. Here, the high coverage genome sequence of the African buffalo (S. caffer) is presented. A total of 19,765 genes were predicted and 19,296 genes could be successfully annotated to S. caffer while 469 genes remained unannotated. Moreover, in order to extend a detailed annotation of S. caffer, gene clusters were constructed using twelve additional mammalian genomes. The S. caffer genome contains 10,988 gene clusters, of which 62 are shared exclusively between B. taurus and S. caffer. This study provides a unique genomic perspective for S. caffer, allowing for the identification of novel variants that may play a role in the natural history and physiological adaptations.
Characterization of the flgG operon of Rhodobacter sphaeroides WS8 and its role in flagellum biosynthesis.

PubMed

González-Pedrajo, Bertha; de la Mora, Javier; Ballado, Teresa; Camarena, Laura; Dreyfus, Georges

2002-11-13

In this work, we show evidence regarding the functionality of a large cluster of flagellar genes in Rhodobacter sphaeroides. The genes of this cluster, flgGHIJKL and orf-1, are mainly involved in the formation of the basal body, and flgK and flgL encode the hook-associated proteins HAP1 and HAP3. In general, these genes showed good similarity to those reported for Salmonella enterica. However, flgJ and flgK showed particular features that make them unique among the flagellar sequences already reported. flgJ is only a third of the size reported for flgJ from Salmonella, whereas flgK is about three times larger than any other flgK sequence previously known. Our results indicate that both genes are functional, and their products are essential for flagellar assembly. In contrast, the interruption of orf-1 did not affect motility, suggesting that this sequence, if functional, is not indispensable for flagellar assembly. Finally, we present genetic evidence suggesting that the flgGHIJKL genes are expressed as a single transcriptional unit depending on the sigma-54 factor.
Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

PubMed Central

Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

2007-01-01

Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession numbers EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable in a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
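The counts reported in this abstract are internally consistent, which a short arithmetic check makes explicit. The sketch below uses only numbers quoted in the abstract; the helper name `accession_count` is hypothetical, and treating each accession range as contiguous and inclusive is an assumption.

```python
def accession_count(first, last):
    """Number of accessions in an inclusive, contiguous numeric range."""
    return last - first + 1


# Accession ranges quoted in the abstract (numeric parts only).
eh_range = accession_count(669363, 672369)   # EH669363 to EH672369
el_range = accession_count(515444, 515580)   # EL515444 to EL515580

good_ests = 3144
contigs, ests_in_contigs, singletons = 448, 1361, 1783

# Unique sequences are the contigs plus the ESTs that stayed unassembled.
unique_sequences = contigs + singletons

# Every good EST is either assembled into a contig or left as a singleton.
accounted_ests = ests_in_contigs + singletons

# Share of unique sequences with a high-similarity GenBank nr match.
similar_fraction = 743 / unique_sequences

print(eh_range + el_range, unique_sequences, accounted_ests,
      f"{similar_fraction:.1%}")
```

Under these assumptions the two accession ranges together cover exactly the 3144 good quality ESTs, and the 1361 assembled ESTs plus 1783 singletons account for the same total.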
Genome-scale analysis of aberrant DNA methylation in colorectal cancer

PubMed Central

Hinoue, Toshinori; Weisenberger, Daniel J.; Lange, Christopher P.E.; Shen, Hui; Byun, Hyang-Min; Van Den Berg, David; Malik, Simeen; Pan, Fei; Noushmehr, Houtan; van Dijk, Cornelis M.; Tollenaar, Rob A.E.M.; Laird, Peter W.

2012-01-01

Colorectal cancer (CRC) is a heterogeneous disease in which unique subtypes are characterized by distinct genetic and epigenetic alterations. Here we performed comprehensive genome-scale DNA methylation profiling of 125 colorectal tumors and 29 adjacent normal tissues. We identified four DNA methylation–based subgroups of CRC using model-based cluster analyses. Each subtype shows characteristic genetic and clinical features, indicating that they represent biologically distinct subgroups. A CIMP-high (CIMP-H) subgroup, which exhibits an exceptionally high frequency of cancer-specific DNA hypermethylation, is strongly associated with MLH1 DNA hypermethylation and the BRAFV600E mutation. A CIMP-low (CIMP-L) subgroup is enriched for KRAS mutations and characterized by DNA hypermethylation of a subset of CIMP-H-associated markers rather than a unique group of CpG islands. Non-CIMP tumors are separated into two distinct clusters. One non-CIMP subgroup is distinguished by a significantly higher frequency of TP53 mutations and frequent occurrence in the distal colon, while the tumors that belong to the fourth group exhibit a low frequency of both cancer-specific DNA hypermethylation and gene mutations and are significantly enriched for rectal tumors. Furthermore, we identified 112 genes that were down-regulated more than twofold in CIMP-H tumors together with promoter DNA hypermethylation. These represent ∼7% of genes that acquired promoter DNA methylation in CIMP-H tumors. Intriguingly, 48/112 genes were also transcriptionally down-regulated in non-CIMP subgroups, but this was not attributable to promoter DNA hypermethylation. Together, we identified four distinct DNA methylation subgroups of CRC and provided novel insight regarding the role of CIMP-specific DNA hypermethylation in gene silencing. PMID:21659424
Transcriptional Coupling of Neighboring Genes and Gene Expression Noise: Evidence that Gene Orientation and Noncoding Transcripts Are Modulators of Noise

PubMed Central

Wang, Guang-Zhong; Lercher, Martin J.; Hurst, Laurence D.

2011-01-01

Abstract How is noise in gene expression modulated? Do mechanisms of noise control impact genome organization? In yeast, the expression of one gene can affect that of a very close neighbor. As the effect is highly regionalized, we hypothesize that genes in different orientations will have differing degrees of coupled expression and, in turn, different noise levels. Divergently organized gene pairs, in particular those with bidirectional promoters, have close promoters, maximizing the likelihood that expression of one gene affects the neighbor. With more distant promoters, the same is less likely to hold for gene pairs in nondivergent orientation. Stochastic models suggest that coupled chromatin dynamics will typically result in low abundance-corrected noise (ACN). Transcription of noncoding RNA (ncRNA) from a bidirectional promoter, we thus hypothesize to be a noise-reduction, expression-priming, mechanism. The hypothesis correctly predicts that protein-coding genes with a bidirectional promoter, including those with a ncRNA partner, have lower ACN than other genes and divergent gene pairs uniquely have correlated ACN. Moreover, as predicted, ACN increases with the distance between promoters. The model also correctly predicts ncRNA transcripts to be often divergently transcribed from genes that a priori would be under selection for low noise (essential genes, protein complex genes) and that the latter genes should commonly reside in divergent orientation. Likewise, that genes with bidirectional promoters are rare subtelomerically, cluster together, and are enriched in essential gene clusters is expected and observed. We conclude that gene orientation and transcription of ncRNAs are candidate modulators of noise. PMID:21402863
Evolution of Streptococcus pneumoniae and Its Close Commensal Relatives

PubMed Central

Kilian, Mogens; Poulsen, Knud; Blomqvist, Trinelise; Håvarstein, Leiv S.; Bek-Thomsen, Malene; Tettelin, Hervé; Sørensen, Uffe B. S.

2008-01-01

Streptococcus pneumoniae is a member of the Mitis group of streptococci which, according to 16S rRNA-sequence based phylogenetic reconstruction, includes 12 species. While other species of this group are considered prototypes of commensal bacteria, S. pneumoniae is among the most frequent microbial killers worldwide. Population genetic analysis of 118 strains, supported by demonstration of a distinct cell wall carbohydrate structure and competence pheromone sequence signature, shows that S. pneumoniae is one of several hundred evolutionary lineages forming a cluster separate from Streptococcus oralis and Streptococcus infantis. The remaining lineages of this distinct cluster are commensals previously collectively referred to as Streptococcus mitis and each represent separate species by traditional taxonomic standard. Virulence genes including the operon for capsule polysaccharide synthesis and genes encoding IgA1 protease, pneumolysin, and autolysin were randomly distributed among S. mitis lineages. Estimates of the evolutionary age of the lineages, the identical location of remnants of virulence genes in the genomes of commensal strains, the pattern of genome reductions, and the proportion of unique genes and their origin support the model that the entire cluster of S. pneumoniae, S. pseudopneumoniae, and S. mitis lineages evolved from pneumococcus-like bacteria presumably pathogenic to the common immediate ancestor of hominoids. During their adaptation to a commensal life style, most of the lineages gradually lost the majority of genes determining virulence and became genetically distinct due to sexual isolation in their respective hosts. PMID:18628950
Comparative analysis of the Rotarix™ vaccine strain and G1P[8] rotaviruses detected before and after vaccine introduction in Belgium.

PubMed

Zeller, Mark; Heylen, Elisabeth; Tamim, Sana; McAllen, John K; Kirkness, Ewen F; Akopov, Asmik; De Coster, Sarah; Van Ranst, Marc; Matthijnssens, Jelle

2017-01-01

G1P[8] rotaviruses are responsible for the majority of human rotavirus infections worldwide. The effect of universal mass vaccination with rotavirus vaccines on circulating G1P[8] rotaviruses is still poorly understood. Therefore we analyzed the complete genomes of the Rotarix™ vaccine strain, and 70 G1P[8] rotaviruses, detected between 1999 and 2010 in Belgium (36 before and 34 after vaccine introduction) to investigate the impact of rotavirus vaccine introduction on circulating G1P[8] strains. All rotaviruses possessed a complete Wa-like genotype constellation, but frequent intra-genogroup reassortments were observed as well as multiple different cluster constellations circulating in a single season. In addition, identical cluster constellations were found to circulate persistently over multiple seasons. The Rotarix™ vaccine strain possessed a unique cluster constellation that was not present in currently circulating G1P[8] strains. At the nucleotide level, the VP6, VP2 and NSP2 gene segments of Rotarix™ were relatively distantly related to any Belgian G1P[8] strain, but other gene segments of Rotarix™ were found in clusters also containing circulating Belgian strains. At the amino acid level, the genetic distance between Rotarix™ and circulating Belgian strains was considerably lower, except for NSP1. When we compared the Belgian G1P[8] strains collected before and after vaccine introduction a reduction in the proportion of strains that were found in the same cluster as the Rotarix™ vaccine strain was observed for most gene segments. The reduction in the proportion of strains belonging to the same cluster may be the result of the vaccine introduction, although natural fluctuations cannot be ruled out.
A Phosphorylated Cytoplasmic Autoantigen, GW182, Associates with a Unique Population of Human mRNAs within Novel Cytoplasmic Speckles

PubMed Central

Eystathioy, Theophany; Chan, Edward K. L.; Tenenbaum, Scott A.; Keene, Jack D.; Griffith, Kevin; Fritzler, Marvin J.

2002-01-01

A novel human cellular structure has been identified that contains a unique autoimmune antigen and multiple messenger RNAs. This complex was discovered using an autoimmune serum from a patient with motor and sensory neuropathy and contains a protein of 182 kDa. The gene and cDNA encoding the protein indicated an open reading frame with glycine-tryptophan (GW) repeats and a single RNA recognition motif. Both the patient's serum and a rabbit serum raised against the recombinant GW protein costained discrete cytoplasmic speckles designated as GW bodies (GWBs) that do not overlap with the Golgi complex, endosomes, lysosomes, or peroxisomes. The mRNAs associated with GW182 represent a clustered set of transcripts that are presumed to reside within the GW complexes. We propose that the GW ribonucleoprotein complex is involved in the posttranscriptional regulation of gene expression by sequestering a specific subset of gene transcripts involved in cell growth and homeostasis. PMID:11950943
Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dhillon, Braham; Feau, Nicolas; Aerts, Andrea L.

Some of the most damaging tree diseases are caused by pathogens that induce cankers, a stem deformation often lethal. To investigate the cause of this adaptation, we sequenced the genomes of poplar pathogens that do and do not cause cankers. We found a unique cluster of genes that produce secondary metabolites and are co-activated when the canker pathogen is grown on poplar wood and leaves. The gene genealogy is discordant with the species phylogeny, showing a signature of horizontal transfer from fungi associated with wood decay. Furthermore, genes encoding hemicellulose-degrading enzymes are up-regulated on poplar wood chips, with some having been acquired horizontally. In conclusion, we propose that adaptation to colonize poplar woody stems is the result of acquisition of these genes.
Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen

DOE PAGES

Dhillon, Braham; Feau, Nicolas; Aerts, Andrea L.; ...

2015-03-02

Some of the most damaging tree diseases are caused by pathogens that induce cankers, a stem deformation often lethal. To investigate the cause of this adaptation, we sequenced the genomes of poplar pathogens that do and do not cause cankers. We found a unique cluster of genes that produce secondary metabolites and are co-activated when the canker pathogen is grown on poplar wood and leaves. The gene genealogy is discordant with the species phylogeny, showing a signature of horizontal transfer from fungi associated with wood decay. Furthermore, genes encoding hemicellulose-degrading enzymes are up-regulated on poplar wood chips, with some having been acquired horizontally. In conclusion, we propose that adaptation to colonize poplar woody stems is the result of acquisition of these genes.
Comparative interrogation of the developing xylem transcriptomes of two wood-forming species: Populus trichocarpa and Eucalyptus grandis.

PubMed

Hefer, Charles A; Mizrachi, Eshchar; Myburg, Alexander A; Douglas, Carl J; Mansfield, Shawn D

2015-06-01

Wood formation is a complex developmental process governed by genetic and environmental stimuli. Populus and Eucalyptus are fast-growing, high-yielding tree genera that represent ecologically and economically important species suitable for generating significant lignocellulosic biomass. Comparative analysis of the developing xylem and leaf transcriptomes of Populus trichocarpa and Eucalyptus grandis together with phylogenetic analyses identified clusters of homologous genes preferentially expressed during xylem formation in both species. A conserved set of 336 single gene pairs showed highly similar xylem preferential expression patterns, as well as evidence of high functional constraint. Individual members of multi-gene orthologous clusters known to be involved in secondary cell wall biosynthesis also showed conserved xylem expression profiles. However, species-specific expression as well as opposite (xylem versus leaf) expression patterns observed for a subset of genes suggest subtle differences in the transcriptional regulation important for xylem development in each species. Using sequence similarity and gene expression status, we identified functional homologs likely to be involved in xylem developmental and biosynthetic processes in Populus and Eucalyptus. Our study suggests that, while genes involved in secondary cell wall biosynthesis show high levels of gene expression conservation, differential regulation of some xylem development genes may give rise to unique xylem properties. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation

PubMed Central

Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.

2014-01-01

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
IMG-ABC. A knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites

DOE PAGES

Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...

2015-07-14

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits.
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.
Paenilamicin: structure and biosynthesis of a hybrid nonribosomal peptide/polyketide antibiotic from the bee pathogen Paenibacillus larvae.

PubMed

Müller, Sebastian; Garcia-Gonzalez, Eva; Mainz, Andi; Hertlein, Gillian; Heid, Nina C; Mösker, Eva; van den Elst, Hans; Overkleeft, Herman S; Genersch, Elke; Süssmuth, Roderich D

2014-09-26

The spore-forming bacterium Paenibacillus larvae is the causative agent of American Foulbrood (AFB), a fatal disease of honey bees that occurs worldwide. Previously, we identified a complex hybrid nonribosomal peptide/polyketide synthesis (NRPS/PKS) gene cluster in the genome of P. larvae. Herein, we present the isolation and structure elucidation of the antibacterial and antifungal products of this gene cluster, termed paenilamicins. The unique structures of the paenilamicins give deep insight into the underlying complex hybrid NRPS/PKS biosynthetic machinery. Bee larval co-infection assays reveal that the paenilamicins are employed by P. larvae in fighting ecological niche competitors and are not directly involved in killing the bee larvae. Their antibacterial and antifungal activities qualify the paenilamicins as attractive candidates for drug development. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel gene cluster allows preferential utilization of fucosylated milk oligosaccharides in Bifidobacterium longum subsp. longum SC596

PubMed Central

Garrido, Daniel; Ruiz-Moyano, Santiago; Kirmiz, Nina; Davis, Jasmine C.; Totten, Sarah M.; Lemay, Danielle G.; Ugalde, Juan A.; German, J. Bruce; Lebrilla, Carlito B.; Mills, David A.

2016-01-01

The infant intestinal microbiota is often colonized by two subspecies of Bifidobacterium longum: subsp. infantis (B. infantis) and subsp. longum (B. longum). Competitive growth of B. infantis in the neonate intestine has been linked to the utilization of human milk oligosaccharides (HMO). However, little is known about how B. longum consumes HMO. In this study, infant-borne B. longum strains exhibited varying HMO growth phenotypes. While all strains efficiently utilized lacto-N-tetraose, certain strains additionally metabolized fucosylated HMO. B. longum SC596 grew vigorously on HMO, and glycoprofiling revealed a preference for consumption of fucosylated HMO. Transcriptomes of SC596 during early-stage growth on HMO were more similar to growth on fucosyllactose, transiting later to a pattern similar to growth on neutral HMO. B. longum SC596 contains a novel gene cluster devoted to the utilization of fucosylated HMO, including genes for import of fucosylated molecules, fucose metabolism and two α-fucosidases. This cluster showed a modular induction during early growth on HMO and fucosyllactose. This work clarifies the genomic and physiological variation of infant-borne B. longum to HMO consumption, which resembles B. infantis. The capability to preferentially consume fucosylated HMO suggests a competitive advantage for these unique B. longum strains in the breast-fed infant gut. PMID:27756904
Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae).

PubMed

Eckert, Andrew J; van Heerwaarden, Joost; Wegrzyn, Jill L; Nelson, C Dana; Ross-Ibarra, Jeffrey; González-Martínez, Santíago C; Neale, David B

2010-07-01

Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as F(ST) outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes.
A novel gene cluster allows preferential utilization of fucosylated milk oligosaccharides in Bifidobacterium longum subsp. longum SC596.

PubMed

Garrido, Daniel; Ruiz-Moyano, Santiago; Kirmiz, Nina; Davis, Jasmine C; Totten, Sarah M; Lemay, Danielle G; Ugalde, Juan A; German, J Bruce; Lebrilla, Carlito B; Mills, David A

2016-10-19

The infant intestinal microbiota is often colonized by two subspecies of Bifidobacterium longum: subsp. infantis (B. infantis) and subsp. longum (B. longum). Competitive growth of B. infantis in the neonate intestine has been linked to the utilization of human milk oligosaccharides (HMO). However, little is known about how B. longum consumes HMO. In this study, infant-borne B. longum strains exhibited varying HMO growth phenotypes. While all strains efficiently utilized lacto-N-tetraose, certain strains additionally metabolized fucosylated HMO. B. longum SC596 grew vigorously on HMO, and glycoprofiling revealed a preference for consumption of fucosylated HMO. Transcriptomes of SC596 during early-stage growth on HMO were more similar to growth on fucosyllactose, transiting later to a pattern similar to growth on neutral HMO. B. longum SC596 contains a novel gene cluster devoted to the utilization of fucosylated HMO, including genes for import of fucosylated molecules, fucose metabolism and two α-fucosidases. This cluster showed a modular induction during early growth on HMO and fucosyllactose. This work clarifies the genomic and physiological variation of infant-borne B. longum to HMO consumption, which resembles B. infantis. The capability to preferentially consume fucosylated HMO suggests a competitive advantage for these unique B. longum strains in the breast-fed infant gut.
Chloroplast DNA sequence of the green alga Oedogonium cardiacum (Chlorophyceae): Unique genome architecture, derived characters shared with the Chaetophorales and novel genes acquired through horizontal transfer

PubMed Central

Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

2008-01-01

Background To gain insight into the branching order of the five main lineages currently recognized in the green algal class Chlorophyceae and to expand our understanding of chloroplast genome evolution, we have undertaken the sequencing of chloroplast DNA (cpDNA) from representative taxa. The complete cpDNA sequences previously reported for Chlamydomonas (Chlamydomonadales), Scenedesmus (Sphaeropleales), and Stigeoclonium (Chaetophorales) revealed tremendous variability in their architecture, the retention of only few ancestral gene clusters, and derived clusters shared by Chlamydomonas and Scenedesmus. Unexpectedly, our recent phylogenies inferred from these cpDNAs and the partial sequences of three other chlorophycean cpDNAs disclosed two major clades, one uniting the Chlamydomonadales and Sphaeropleales (CS clade) and the other uniting the Oedogoniales, Chaetophorales and Chaetopeltidales (OCC clade). Although molecular signatures provided strong support for this dichotomy and for the branching of the Oedogoniales as the earliest-diverging lineage of the OCC clade, more data are required to validate these phylogenies. We describe here the complete cpDNA sequence of Oedogonium cardiacum (Oedogoniales). Results Like its three chlorophycean homologues, the 196,547-bp Oedogonium chloroplast genome displays a distinctive architecture. This genome is one of the most compact among photosynthetic chlorophytes. It has an atypical quadripartite structure, is intron-rich (17 group I and 4 group II introns), and displays 99 different conserved genes and four long open reading frames (ORFs), three of which are clustered in the spacious inverted repeat of 35,493 bp. Intriguingly, two of these ORFs (int and dpoB) revealed high similarities to genes not usually found in cpDNA. At the gene content and gene order levels, the Oedogonium genome most closely resembles its Stigeoclonium counterpart. 
Characters shared by these chlorophyceans but missing in members of the CS clade include the retention of psaM, rpl32 and trnL(caa), the loss of petA, the disruption of three ancestral clusters and the presence of five derived gene clusters. Conclusion The Oedogonium chloroplast genome disclosed additional characters that bolster the evidence for a close alliance between the Oedogoniales and Chaetophorales. Our unprecedented finding of int and dpoB in this cpDNA provides a clear example that novel genes were acquired by the chloroplast genome through horizontal transfers, possibly from a mitochondrial genome donor. PMID:18558012
Clustering cancer gene expression data by projective clustering ensemble

PubMed Central

Yu, Xianxue; Yu, Guoxian

2017-01-01

Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples; thus various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to combine these two kinds of techniques so as to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
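The ensemble idea in the record above can be illustrated with a generic co-association (evidence accumulation) consensus. This is a minimal sketch and not the PCE method from the paper: the function names, the 0.5 threshold and the toy labelings are all assumptions made for illustration, and the base clusterings are simply given rather than computed on feature subspaces.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(labelings)
    return [[value / runs for value in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group samples that are linked by co-association above the threshold
    (connected components of the thresholded matrix)."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical base clusterings of five samples, e.g. from three different
# feature subspaces. Label values are arbitrary within each run.
base_clusterings = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]

matrix = co_association(base_clusterings)
consensus = consensus_clusters(matrix)
print(consensus)
```

Sample 2 is grouped with samples 0 and 1 in two of the three runs, so the majority vote places it in their consensus cluster even though one base clustering disagrees.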
Mapping in an apple (Malus x domestica) F1 segregating population based on physical clustering of differentially expressed genes

PubMed Central

2014-01-01

Background Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Results Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. 
Conclusions Gene expression profiling and trait-associated transcript analysis using an apple F1 population readily identified genes physically linked to powdery mildew disease resistance and woolly apple aphid resistance loci. This result was especially useful in apple, where extreme levels of heterozygosity make the development of reliable DNA markers quite difficult. The results suggest that this approach could prove effective in crops with complicated genetics, or for which few genomic information resources are available. PMID:24708064

Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. 
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
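The counts quoted in the record above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: the variable names are made up for illustration, and the annotation counts are approximations recovered from the rounded percentages, not figures reported by the authors.

```python
# Counts quoted in the abstract above.
total_ests = 9874
contigs = 1652
singletons = 3539

# Contigs and singletons together make up the set of unique sequences.
unique_sequences = contigs + singletons

# Approximate counts implied by the reported (rounded) percentages.
nr_hits = round(unique_sequences * 84.4 / 100)
swissprot_hits = round(unique_sequences * 57.3 / 100)

print(unique_sequences)  # 5191, matching the abstract
print(nr_hits, swissprot_hits)
```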
Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

PubMed

Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

2017-07-11

A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.
Insights into a divergent phenazine biosynthetic pathway governed by a plasmid-born esmeraldin gene cluster.

PubMed

Rui, Zhe; Ye, Min; Wang, Shuoguo; Fujikawa, Kaori; Akerele, Bankole; Aung, May; Floss, Heinz G; Zhang, Wenjun; Yu, Tin-Wein

2012-09-21

Phenazine-type metabolites arise from either phenazine-1-carboxylic acid (PCA) or phenazine-1,6-dicarboxylic acid (PDC). Although the biosynthesis of PCA has been studied extensively, PDC assembly remains unclear. Esmeraldins and saphenamycin, the PDC originated products, are antimicrobial and antitumor metabolites isolated from Streptomyces antibioticus Tü 2706. Herein, the esmeraldin biosynthetic gene cluster was identified on a dispensable giant plasmid. Twenty-four putative esm genes were characterized by bioinformatics, mutagenesis, genetic complementation, and functional protein expressions. Unlike enzymes involved in PCA biosynthesis, EsmA1 and EsmA2 together decisively promoted the PDC yield. The resulting PDC underwent a series of conversions to give 6-acetylphenazine-1-carboxylic acid, saphenic acid, and saphenamycin through a unique one-carbon extension by EsmB1-B5, a keto reduction by EsmC, and an esterification by EsmD1-D3, the atypical polyketide synthases, respectively. Two transcriptional regulators, EsmT1 and EsmT2, are required for esmeraldin production. Copyright © 2012 Elsevier Ltd. All rights reserved.
Identification and comparative analysis of the epidermal differentiation complex in snakes

PubMed Central

Brigit Holthaus, Karin; Mlitz, Veronika; Strasser, Bettina; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

2017-01-01

The epidermis of snakes efficiently protects against dehydration and mechanical stress. However, only few proteins of the epidermal barrier to the environment have so far been identified in snakes. Here, we determined the organization of the Epidermal Differentiation Complex (EDC), a cluster of genes encoding protein constituents of cornified epidermal structures, in snakes and compared it to the EDCs of other squamates and non-squamate reptiles. The EDC of snakes displays shared synteny with that of the green anole lizard, including the presence of a cluster of corneous beta-protein (CBP)/beta-keratin genes. We found that a unique CBP comprising 4 putative beta-sheets and multiple cysteine-rich EDC proteins are conserved in all snakes and other squamates investigated. Comparative genomics of squamates suggests that the evolution of snakes was associated with a gene duplication generating two isoforms of the S100 fused-type protein, scaffoldin, the origin of distinct snake-specific EDC genes, and the loss of other genes that were present in the EDC of the last common ancestor of snakes and lizards. Taken together, our results provide new insights into the evolution of the skin in squamates and a basis for the characterization of the molecular composition of the epidermis in snakes. PMID:28345630
Identifying microbially mediated transformations of DOC across season and tide from simultaneous changes in whole community gene expression and in mass spectra generated by Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICR-MS)

NASA Astrophysics Data System (ADS)

Ballantyne, F.; Medeiros, P. M.; Moran, M. A.; Song, C.; Whitman, W. B.; Washington, B.; Yu, M.; Lee, J.

2017-12-01

Despite the advent of methods enabling high resolution characterization of metabolic activity and of organic matter, linking microbial metabolism to organic matter transformations remains a challenge. By sequencing metatranscriptomes and using Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICR-MS) to characterize organic matter (OM) at the beginning and at the end of incubations of estuarine water across tide and season, we sought to link observed changes in OM composition to microbial metabolism. We used linear models and K-means clustering to identify clusters of genes that responded coherently across season, which accounted for most of the variability in gene expression, over tidal regime, which explained the majority of the remaining variation, and over time during the 24 hour incubations. We used an approach from the field of signal processing, which to our knowledge has not been used to analyze FTICR-MS data, to identify formulae of compounds that changed in concentration during the incubations. This approach, based on the discrete wavelet transform (DWT), allowed us to overcome some of the challenges associated with analyzing FTICR-MS data: variable ionization of organic compounds, signal suppression by high concentration compounds, and uncertainty about how to normalize changes across spectra. We were able to link clusters of metabolic and transporter genes to changes in OM composition, and uniquely identify genes based on their cross correlation with changes in FTICR mass spectra. Our approach for analyzing FTICR-MS data enables more robust inference about OM transformations, and linking high resolution changes in gene expression and in OM data during incubations represents an important step toward formulating models of microbial metabolism relevant for predicting biogeochemically relevant C fluxes.
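As a rough illustration of the discrete wavelet transform mentioned in the record above, the sketch below applies one level of the Haar transform to the difference between two toy spectra. The abstract does not say which wavelet or how many decomposition levels were used, so the Haar choice, the function names and the intensity values are all assumptions made for illustration.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original signal from both coefficient sets."""
    root2 = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / root2)
        signal.append((a - d) / root2)
    return signal


# Hypothetical peak intensities at four adjacent m/z bins, at the start and
# at the end of an incubation.
spectrum_start = [4.0, 4.0, 1.0, 3.0]
spectrum_end = [4.0, 4.0, 3.0, 1.0]
change = [after - before for before, after in zip(spectrum_start, spectrum_end)]

approx, detail = haar_dwt(change)
rebuilt = haar_idwt(approx, detail)
print(approx, detail)
```

In this toy case the unchanged pair of bins yields zero coefficients, while the pair in which intensity shifted from one bin to its neighbour yields a nonzero detail coefficient, which is the kind of localized change a wavelet decomposition is meant to isolate.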
Mandibulofacial Dysostosis in a Patient with a de novo 2;17 Translocation that Disrupts the HOXD Gene Cluster

PubMed Central

Stevenson, David A.; Bleyl, Steven B.; Maxwell, Teresa; Brothman, Arthur R.; South, Sarah T.

2011-01-01

Treacher Collins syndrome is the prototypical mandibulofacial dysostosis syndrome, but other mandibulofacial dysostosis syndromes have been described. We report an infant with mandibulofacial dysostosis and an apparently balanced de novo 2;17 translocation. She presented with severe lower eyelid colobomas requiring skin grafting, malar and mandibular hypoplasia, bilateral microtia with external auditory canal atresia, dysplastic ossicles, hearing loss, bilateral choanal stenosis, cleft palate without cleft lip, several oral frenula of the upper lip/gum, and micrognathia requiring tracheostomy. Her limbs were normal. Chromosome analysis at the 600-band level showed a 46,XX,t(2;17)(q24.3;q23) karyotype. Sequencing of the entire TCOF1 coding region did not show evidence of a sequence variation. High-resolution genomic microarray analysis did not identify a cryptic imbalance. FISH mapping refined the breakpoints to 2q31.1 and 17q24.3–25.1 and showed the 2q31.1 breakpoint likely affects the HOXD gene cluster. Several atypical findings and lack of an identifiable TCOF1 mutation suggest that this child has a provisionally unique mandibulofacial dysostosis syndrome. The apparently balanced de novo translocation provides candidate loci for atypical and TCOF1 mutation negative cases of Treacher Collins syndrome. Based on the agreement of our findings with one previous case of mandibulofacial dysostosis with a 2q31.1 translocation, we hypothesize that misexpression of genes in the HOXD gene cluster produced the described phenotype in this patient. PMID:17431905
Mandibulofacial dysostosis in a patient with a de novo 2;17 translocation that disrupts the HOXD gene cluster.

PubMed

Stevenson, David A; Bleyl, Steven B; Maxwell, Teresa; Brothman, Arthur R; South, Sarah T

2007-05-15

Treacher Collins syndrome (TCS) is the prototypical mandibulofacial dysostosis syndrome, but other mandibulofacial dysostosis syndromes have been described. We report an infant with mandibulofacial dysostosis and an apparently balanced de novo 2;17 translocation. She presented with severe lower eyelid colobomas requiring skin grafting, malar and mandibular hypoplasia, bilateral microtia with external auditory canal atresia, dysplastic ossicles, hearing loss, bilateral choanal stenosis, cleft palate without cleft lip, several oral frenula of the upper lip/gum, and micrognathia requiring tracheostomy. Her limbs were normal. Chromosome analysis at the 600-band level showed a 46,XX,t(2;17)(q24.3;q23) karyotype. Sequencing of the entire TCOF1 coding region did not show evidence of a sequence variation. High-resolution genomic microarray analysis did not identify a cryptic imbalance. FISH mapping refined the breakpoints to 2q31.1 and 17q24.3-25.1 and showed the 2q31.1 breakpoint likely affects the HOXD gene cluster. Several atypical findings and lack of an identifiable TCOF1 mutation suggest that this child has a provisionally unique mandibulofacial dysostosis syndrome. The apparently balanced de novo translocation provides candidate loci for atypical and TCOF1 mutation negative cases of TCS. Based on the agreement of our findings with one previous case of mandibulofacial dysostosis with a 2q31.1 translocation, we hypothesize that misexpression of genes in the HOXD gene cluster produced the described phenotype in this patient.
TGF-β1-Induced Epithelial–Mesenchymal Transition Promotes Monocyte/Macrophage Properties in Breast Cancer Cells

PubMed Central

Johansson, Joel; Tabor, Vedrana; Wikell, Anna; Jalkanen, Sirpa; Fuxe, Jonas

2015-01-01

Breast cancer progression toward metastatic disease is linked to re-activation of epithelial–mesenchymal transition (EMT), a latent developmental process. Breast cancer cells undergoing EMT lose epithelial characteristics and gain the capacity to invade the surrounding tissue and migrate away from the primary tumor. However, less is known about the possible role of EMT in providing cancer cells with properties that allow them to traffic to distant sites. Given the fact that pro-metastatic cancer cells share a unique capacity with immune cells to traffic in-and-out of blood and lymphatic vessels we hypothesized that tumor cells undergoing EMT may acquire properties of immune cells. To study this, we performed gene-profiling analysis of mouse mammary EpRas tumor cells that had been allowed to adopt an EMT program after long-term treatment with TGF-β1 for 2 weeks. As expected, EMT cells acquired traits of mesenchymal cell differentiation and migration. However, in addition, we found another cluster of induced genes, which was specifically enriched in monocyte-derived macrophages, mast cells, and myeloid dendritic cells, but less in other types of immune cells. Further studies revealed that this monocyte/macrophage gene cluster was enriched in human breast cancer cell lines displaying an EMT or a Basal B profile, and in human breast tumors with EMT and undifferentiated (ER−/PR−) characteristics. The results identify an EMT-induced monocyte/macrophage gene cluster, which may play a role in breast cancer cell dissemination and metastasis. PMID:25674539
Methylobacterium genome sequences: a reference blueprint to investigate microbial metabolism of C1 compounds from natural and industrial sources.

PubMed

Vuilleumier, Stéphane; Chistoserdova, Ludmila; Lee, Ming-Chun; Bringel, Françoise; Lajus, Aurélie; Zhou, Yang; Gourion, Benjamin; Barbe, Valérie; Chang, Jean; Cruveiller, Stéphane; Dossat, Carole; Gillett, Will; Gruffaz, Christelle; Haugen, Eric; Hourcade, Edith; Levy, Ruth; Mangenot, Sophie; Muller, Emilie; Nadalig, Thierry; Pagni, Marco; Penny, Christian; Peyraud, Rémi; Robinson, David G; Roche, David; Rouy, Zoé; Saenampechek, Channakhone; Salvignol, Grégory; Vallenet, David; Wu, Zaining; Marx, Christopher J; Vorholt, Julia A; Olson, Maynard V; Kaul, Rajinder; Weissenbach, Jean; Médigue, Claudine; Lidstrom, Mary E

2009-01-01

Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name "island integration determinant" (iid). 
These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles.
CXCL4 induces a unique transcriptome in monocyte-derived macrophages

PubMed Central

Gleissner, Christian A.; Shaked, Iftach; Little, Kristina M.; Ley, Klaus

2012-01-01

In atherosclerotic arteries, blood monocytes differentiate to macrophages in the presence of growth factors like macrophage colony-stimulation factor (MCSF) and chemokines like platelet factor 4 (CXCL4). To compare the gene expression signature of CXCL4-induced macrophages with MCSF-induced macrophages or macrophages polarized with IFN-γ/LPS (M1) or IL-4 (M2), we cultured primary human peripheral blood monocytes for six days. mRNA expression was measured by Affymetrix gene chips and differences were analyzed by Local Pooled Error test, Profile of Complex Functionality and Gene Set Enrichment Analysis. 375 genes were differentially expressed between MCSF- and CXCL4-induced macrophages, 206 of them overexpressed in CXCL4 macrophages coding for genes implicated in the inflammatory/immune response, antigen processing/presentation, and lipid metabolism. CXCL4-induced macrophages overexpressed some M1 and M2 genes and the corresponding cytokines at the protein level, however, their transcriptome clustered with neither M1 nor M2 transcriptomes. They almost completely lost the ability to phagocytose zymosan beads. Genes linked to atherosclerosis were not consistently up- or downregulated. Scavenger receptors showed lower and cholesterol efflux transporters higher expression in CXCL4- than MCSF-induced macrophages, resulting in lower LDL content. We conclude that CXCL4 induces a unique macrophage transcriptome distinct from known macrophage types, defining a new macrophage differentiation that we propose to call M4. PMID:20335529
Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

PubMed

Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

2012-07-15

Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
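The neighbor-correlation idea in this abstract can be sketched as follows: compute the Pearson correlation coefficient between the expression profiles of genes that are adjacent on the chromosome. The toy profiles and the helper name `pearson` are hypothetical, and the sketch does not reproduce the comparison against random gene pairs or the FDR procedure, whose details the abstract does not give.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

# Toy expression profiles (four conditions), listed in chromosomal order.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene A
    [2.0, 4.0, 6.0, 8.0],  # gene B, rises together with gene A
    [4.0, 3.0, 2.0, 1.0],  # gene C, falls as gene B rises
    [1.0, 3.0, 2.0, 4.0],  # gene D
]

# Correlation of each gene with its immediate downstream neighbor.
neighbor_r = [pearson(profiles[i], profiles[i + 1])
              for i in range(len(profiles) - 1)]
```

A run of adjacent genes whose neighbor correlations are all high would be a candidate co-expressed cluster; whether it is significant depends on the background distribution from random pairs.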
Development and validation of a gene expression oligo microarray for the gilthead sea bream (Sparus aurata).

PubMed

Ferraresso, Serena; Vitulo, Nicola; Mininni, Alba N; Romualdi, Chiara; Cardazzo, Barbara; Negrisolo, Enrico; Reinhardt, Richard; Canario, Adelino V M; Patarnello, Tomaso; Bargelloni, Luca

2008-12-03

Aquaculture represents the most sustainable alternative of seafood supply to substitute for the declining marine fisheries, but severe production bottlenecks remain to be solved. The application of genomic technologies offers much promise to rapidly increase our knowledge on biological processes in farmed species and overcome such bottlenecks. Here we present an integrated platform for mRNA expression profiling in the gilthead sea bream (Sparus aurata), a marine teleost of great importance for aquaculture. A public database was constructed, consisting of 19,734 unique clusters (3,563 contigs and 16,171 singletons). Functional annotation was obtained for 8,021 clusters. Over 4,000 sequences were also associated with a GO entry. Two 60mer probes were designed for each gene and in-situ synthesized on glass slides using Agilent SurePrint technology. Platform reproducibility and accuracy were assessed on two early stages of sea bream development (one-day-old and four-day-old larvae). Correlation between technical replicates was always > 0.99, with strong positive correlation between paired probes. A two class SAM test identified 1,050 differentially expressed genes between the two developmental stages. Functional analysis suggested that down-regulated transcripts (407) in older larvae are mostly essential/housekeeping genes, whereas tissue-specific genes are up-regulated in parallel with the formation of key organs (eye, digestive system). Cross-validation of microarray data was carried out using quantitative RT-PCR (qRT-PCR) on 11 target genes, selected to reflect the whole range of fold-change and both up-regulated and down-regulated genes. A statistically significant positive correlation was obtained comparing expression levels for each target gene across all biological replicates.
Good concordance between qRT-PCR and microarray data was observed between 2- and 7-fold change, while fold-change compression in the microarray was present for differences greater than 10-fold in the qRT-PCR. A highly reliable oligo-microarray platform was developed and validated for the gilthead sea bream despite the presently limited knowledge of the species transcriptome. Because of the flexible design this array will be able to accommodate additional probes as soon as novel unique transcripts are available.
Draft Genome Sequence of Eggplant (Solanum melongena L.): the Representative Solanum Species Indigenous to the Old World

PubMed Central

Hirakawa, Hideki; Shirasawa, Kenta; Miyatake, Koji; Nunome, Tsukasa; Negoro, Satomi; Ohyama, Akio; Yamaguchi, Hirotaka; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi; Fukuoka, Hiroyuki

2014-01-01

Unlike other important Solanaceae crops such as tomato, potato, chili pepper, and tobacco, all of which originated in South America and are cultivated worldwide, eggplant (Solanum melongena L.) is indigenous to the Old World and in this respect it is phylogenetically unique. To broaden our knowledge of the genomic nature of solanaceous plants further, we dissected the eggplant genome and built a draft genome dataset with 33,873 scaffolds termed SME_r2.5.1 that covers 833.1 Mb, ca. 74% of the eggplant genome. Approximately 90% of the gene space was estimated to be covered by SME_r2.5.1 and 85,446 genes were predicted in the genome. Clustering analysis of the predicted genes of eggplant along with the genes of three other solanaceous plants as well as Arabidopsis thaliana revealed that, of the 35,000 clusters generated, 4,018 were exclusively composed of eggplant genes that would perhaps confer eggplant-specific traits. Between eggplant and tomato, 16,573 pairs of genes were deduced to be orthologous, and 9,489 eggplant scaffolds could be mapped onto the tomato genome. Furthermore, 56 conserved synteny blocks were identified between the two species. The detailed comparative analysis of the eggplant and tomato genomes will facilitate our understanding of the genomic architecture of solanaceous plants, which will contribute to cultivation and further utilization of these crops. PMID:25233906
Discovery and characterization of miRNA genes in atlantic salmon (Salmo salar) by use of a deep sequencing approach

PubMed Central

2013-01-01

Background MicroRNAs (miRNAs) are an abundant class of endogenous small RNA molecules that downregulate gene expression at the posttranscriptional level. They play important roles in multiple biological processes by regulating genes that control developmental timing, growth, stem cell division and apoptosis by binding to the mRNA of target genes. Despite the position Atlantic salmon (Salmo salar) has as an economically important domesticated animal, there has been little research on miRNAs in this species. Knowledge about miRNAs and their target genes may be used to control health and to improve performance of economically important traits. However, before their biological function can be unravelled they must be identified and annotated. The aims of this study were to identify and characterize miRNA genes in Atlantic salmon by deep sequencing analysis of small RNA libraries from nine different tissues. Results A total of 180 distinct mature miRNAs belonging to 106 families of evolutionarily conserved miRNAs, and 13 distinct novel mature miRNAs were discovered and characterized. The mature miRNAs corresponded to 521 putative precursor sequences located at unique genome locations. About 40% of these precursors were part of gene clusters, and the majority of the Salmo salar gene clusters discovered were conserved across species. Comparison of expression levels in samples from different tissues applying DESeq indicated that there were tissue specific expression differences in three conserved and one novel miRNA. Ssa-miR 736 was detected in heart tissue only, while two other clustered miRNAs (ssa-miR 212 and 132) seem to be at a higher expression level in brain tissue. These observations correlate well with their expected functions as regulators of signal pathways in cardiac and neuronal cells, respectively. Ssa-miR 8163 is one of the novel miRNAs discovered and its function remains unknown.
However, differential expression analysis using DESeq suggests that this miRNA is enriched in liver tissue and the precursor was mapped to intron 7 of the transferrin gene. Conclusions The identification and annotation of evolutionarily conserved and novel Salmo salar miRNAs as well as the characterization of miRNA gene clusters provide biological knowledge that will greatly facilitate further functional studies on miRNAs in this species. PMID:23865519
Genome Sequence of the Bacterium Streptomyces davawensis JCM 4913 and Heterologous Production of the Unique Antibiotic Roseoflavin

PubMed Central

Jankowitsch, Frank; Schwarz, Julia; Rückert, Christian; Gust, Bertolt; Szczepanowski, Rafael; Blom, Jochen; Pelzer, Stefan; Kalinowski, Jörn

2012-01-01

Streptomyces davawensis JCM 4913 synthesizes the antibiotic roseoflavin, a structural riboflavin (vitamin B2) analog. Here, we report the 9,466,619-bp linear chromosome of S. davawensis JCM 4913 and an 89,331-bp linear plasmid. The sequence has an average G+C content of 70.58% and contains six rRNA operons (16S-23S-5S) and 69 tRNA genes. The 8,616 predicted protein-coding sequences include 32 clusters coding for secondary metabolites, several of which are unique to S. davawensis. The chromosome contains long terminal inverted repeats of 33,255 bp each and atypical telomeres. Sequence analysis with regard to riboflavin biosynthesis revealed three different patterns of gene organization in Streptomyces species. Heterologous expression of a set of genes present on a subgenomic fragment of S. davawensis resulted in the production of roseoflavin by the host Streptomyces coelicolor M1152. Phylogenetic analysis revealed that S. davawensis is a close relative of Streptomyces cinnabarinus, and much to our surprise, we found that the latter bacterium is a roseoflavin producer as well. PMID:23043000
Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data.

PubMed

Ezer, Daphne; Moignard, Victoria; Göttgens, Berthold; Adryan, Boris

2016-08-01

Many genes are expressed in bursts, which can contribute to cell-to-cell heterogeneity. It is now possible to measure this heterogeneity with high throughput single cell gene expression assays (single cell qPCR and RNA-seq). These experimental approaches generate gene expression distributions which can be used to estimate the kinetic parameters of gene expression bursting, namely the rate that genes turn on, the rate that genes turn off, and the rate of transcription. We construct a complete pipeline for the analysis of single cell qPCR data that uses the mathematics behind bursty expression to develop more accurate and robust algorithms for analyzing the origin of heterogeneity in experimental samples, specifically an algorithm for clustering cells by their bursting behavior (Simulated Annealing for Bursty Expression Clustering, SABEC) and a statistical tool for comparing the kinetic parameters of bursty expression across populations of cells (Estimation of Parameter changes in Kinetics, EPiK). We applied these methods to hematopoiesis, including a new single cell dataset in which transcription factors (TFs) involved in the earliest branchpoint of blood differentiation were individually up- and down-regulated. We could identify two unique sub-populations within a seemingly homogenous group of hematopoietic stem cells. In addition, we could predict regulatory mechanisms controlling the expression levels of eighteen key hematopoietic transcription factors throughout differentiation. Detailed information about gene regulatory mechanisms can therefore be obtained simply from high throughput single cell gene expression data, which should be widely applicable given the rapid expansion of single cell genomics.
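One simple, commonly used summary of expression heterogeneity is the Fano factor (variance divided by mean) of per-cell counts: a Poisson process gives a value of 1, and values well above 1 are consistent with bursty expression. The sketch below is only an illustration of that general idea with made-up counts; it is not the SABEC or EPiK algorithm described in the abstract, and the function name `fano_factor` is an assumption.

```python
import statistics

def fano_factor(counts):
    """Population variance divided by mean of per-cell expression counts."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    return statistics.pvariance(counts) / mean

# Hypothetical transcript counts for one gene across eight cells.
bursty = [0.0, 0.0, 0.0, 10.0, 0.0, 12.0, 0.0, 0.0]  # most cells off, a few high
steady = [2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0]    # similar level in every cell

bursty_fano = fano_factor(bursty)
steady_fano = fano_factor(steady)
```

Estimating the actual on rate, off rate, and transcription rate requires fitting a kinetic model to the full distribution, which this summary statistic does not attempt.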
A distinct and divergent lineage of genomic island-associated Type IV Secretion Systems in Legionella.

PubMed

Wee, Bryan A; Woolfit, Megan; Beatson, Scott A; Petty, Nicola K

2013-01-01

Legionella encodes multiple classes of Type IV Secretion Systems (T4SSs), including the Dot/Icm protein secretion system that is essential for intracellular multiplication in amoebal and human hosts. Other T4SSs not essential for virulence are thought to facilitate the acquisition of niche-specific adaptation genes including the numerous effector genes that are a hallmark of this genus. Previously, we identified two novel gene clusters in the draft genome of Legionella pneumophila strain 130b that encode homologues of a subtype of T4SS, the genomic island-associated T4SS (GI-T4SS), usually associated with integrative and conjugative elements (ICE). In this study, we performed genomic analyses of 14 homologous GI-T4SS clusters found in eight publicly available Legionella genomes and show that this cluster is unusually well conserved in a region of high plasticity. Phylogenetic analyses show that Legionella GI-T4SSs are substantially divergent from other members of this subtype of T4SS and represent a novel clade of GI-T4SSs only found in this genus. The GI-T4SS was found to be under purifying selection, suggesting it is functional and may play an important role in the evolution and adaptation of Legionella. Like other GI-T4SSs, the Legionella clusters are also associated with ICEs, but lack the typical integration and replication modules of related ICEs. The absence of complete replication and DNA pre-processing modules, together with the presence of Legionella-specific regulatory elements, suggest the Legionella GI-T4SS-associated ICE is unique and may employ novel mechanisms of regulation, maintenance and excision. The Legionella GI-T4SS cluster was found to be associated with several cargo genes, including numerous antibiotic resistance and virulence factors, which may confer a fitness benefit to the organism. 
The in-silico characterisation of this new T4SS furthers our understanding of the diversity of secretion systems involved in the frequent horizontal gene transfers that allow Legionella to adapt to and exploit diverse environmental niches.
A Distinct and Divergent Lineage of Genomic Island-Associated Type IV Secretion Systems in Legionella

PubMed Central

Wee, Bryan A.; Woolfit, Megan; Beatson, Scott A.; Petty, Nicola K.

2013-01-01

Legionella encodes multiple classes of Type IV Secretion Systems (T4SSs), including the Dot/Icm protein secretion system that is essential for intracellular multiplication in amoebal and human hosts. Other T4SSs not essential for virulence are thought to facilitate the acquisition of niche-specific adaptation genes including the numerous effector genes that are a hallmark of this genus. Previously, we identified two novel gene clusters in the draft genome of Legionella pneumophila strain 130b that encode homologues of a subtype of T4SS, the genomic island-associated T4SS (GI-T4SS), usually associated with integrative and conjugative elements (ICE). In this study, we performed genomic analyses of 14 homologous GI-T4SS clusters found in eight publicly available Legionella genomes and show that this cluster is unusually well conserved in a region of high plasticity. Phylogenetic analyses show that Legionella GI-T4SSs are substantially divergent from other members of this subtype of T4SS and represent a novel clade of GI-T4SSs only found in this genus. The GI-T4SS was found to be under purifying selection, suggesting it is functional and may play an important role in the evolution and adaptation of Legionella. Like other GI-T4SSs, the Legionella clusters are also associated with ICEs, but lack the typical integration and replication modules of related ICEs. The absence of complete replication and DNA pre-processing modules, together with the presence of Legionella-specific regulatory elements, suggest the Legionella GI-T4SS-associated ICE is unique and may employ novel mechanisms of regulation, maintenance and excision. The Legionella GI-T4SS cluster was found to be associated with several cargo genes, including numerous antibiotic resistance and virulence factors, which may confer a fitness benefit to the organism. 
The in-silico characterisation of this new T4SS furthers our understanding of the diversity of secretion systems involved in the frequent horizontal gene transfers that allow Legionella to adapt to and exploit diverse environmental niches. PMID:24358157
Patterns of Population Structure and Environmental Associations to Aridity Across the Range of Loblolly Pine (Pinus taeda L., Pinaceae)

PubMed Central

Eckert, Andrew J.; van Heerwaarden, Joost; Wegrzyn, Jill L.; Nelson, C. Dana; Ross-Ibarra, Jeffrey; González-Martínez, Santíago C.; Neale, David. B.

2010-01-01

Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as FST outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes. PMID:20439779
Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.

PubMed

Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin

2005-01-01

DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful in pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on shape-based assumptions or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method to cluster genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than to the points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. It shows that this method is superior to other traditional methods in identifying functionally related gene groups.

Evolutionary conservation of sequence and secondary structures in CRISPR repeats

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in ~40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.
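Grouping repeats into clusters by sequence similarity can be sketched with a simple single-linkage scheme: two repeats join the same cluster when their fraction of identical positions reaches a threshold. The toy sequences, the 0.8 threshold, the equal-length (ungapped) comparison, and the function names are all assumptions for illustration; the abstract does not describe the clustering procedure the authors actually used.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def single_linkage(seqs, threshold):
    """Group sequences so that each one is linked to at least one other
    member of its cluster with identity >= threshold."""
    clusters = []
    for s in seqs:
        linked, rest = [], []
        for c in clusters:
            if any(identity(s, m) >= threshold for m in c):
                linked.append(c)
            else:
                rest.append(c)
        # Merge the new sequence with every cluster it links to.
        merged = [s] + [m for c in linked for m in c]
        clusters = rest + [merged]
    return clusters

# Hypothetical 8-nt "repeats"; real CRISPR repeats are longer.
repeats = ["GTTTCAGA", "GTTTCAGT", "CCAAGGTT", "CCAAGGTA"]
clusters = single_linkage(repeats, threshold=0.8)
```

With these toy inputs the two GTTTCAG-type repeats form one cluster and the two CCAAGGT-type repeats form another.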
Significant Natural Product Biosynthetic Potential of Actinorhizal Symbionts of the Genus Frankia, as Revealed by Comparative Genomic and Proteomic Analyses

PubMed Central

Udwary, Daniel W.; Gontang, Erin A.; Jones, Adam C.; Jones, Carla S.; Schultz, Andrew W.; Winter, Jaclyn M.; Yang, Jane Y.; Beauchemin, Nicholas; Capson, Todd L.; Clark, Benjamin R.; Esquenazi, Eduardo; Eustáquio, Alessandra S.; Freel, Kelle; Gerwick, Lena; Gerwick, William H.; Gonzalez, David; Liu, Wei-Ting; Malloy, Karla L.; Maloney, Katherine N.; Nett, Markus; Nunnery, Joshawna K.; Penn, Kevin; Prieto-Davo, Alejandra; Simmons, Thomas L.; Weitz, Sara; Wilson, Micheal C.; Tisa, Louis S.; Dorrestein, Pieter C.; Moore, Bradley S.

2011-01-01

Bacteria of the genus Frankia are mycelium-forming actinomycetes that are found as nitrogen-fixing facultative symbionts of actinorhizal plants. Although soil-dwelling actinomycetes are well-known producers of bioactive compounds, the genus Frankia has largely gone uninvestigated for this potential. Bioinformatic analysis of the genome sequences of Frankia strains ACN14a, CcI3, and EAN1pec revealed an unexpected number of secondary metabolic biosynthesis gene clusters. Our analysis led to the identification of at least 65 biosynthetic gene clusters, the vast majority of which appear to be unique and for which products have not been observed or characterized. More than 25 secondary metabolite structures or structure fragments were predicted, and these are expected to include cyclic peptides, siderophores, pigments, signaling molecules, and specialized lipids. Outside the hopanoid gene locus, no cluster could be convincingly demonstrated to be responsible for the few secondary metabolites previously isolated from other Frankia strains. Few clusters were shared among the three species, demonstrating species-specific biosynthetic diversity. Proteomic analysis of Frankia sp. strains CcI3 and EAN1pec showed that significant and diverse secondary metabolic activity was expressed in laboratory cultures. In addition, several prominent signals in the mass range of peptide natural products were observed in Frankia sp. CcI3 by intact-cell matrix-assisted laser desorption-ionization mass spectrometry (MALDI-MS). This work supports the value of bioinformatic investigation in natural products biosynthesis using genomic information and presents a clear roadmap for natural products discovery in the Frankia genus. PMID:21498757
Abnormal DNA methylation may contribute to the progression of osteosarcoma.

PubMed

Chen, Xiao-Gang; Ma, Liang; Xu, Jia-Xin

2018-01-01

The identification of optimal methylation biomarkers to achieve maximum diagnostic ability remains a challenge. The present study aimed to elucidate the potential molecular mechanisms underlying osteosarcoma (OS) using DNA methylation analysis. Based on the GSE36002 dataset obtained from the Gene Expression Omnibus database, differentially methylated genes were extracted between patients with OS and controls using t‑tests. Subsequently, hierarchical clustering was performed to segregate the samples into two distinct clusters, OS and normal. Gene Ontology (GO) and pathway enrichment analyses for differentially methylated genes were performed using the Database for Annotation, Visualization and Integrated Discovery tool. A protein‑protein interaction (PPI) network was established, followed by hub gene identification. Using the cut‑off threshold of ≥0.2 average β‑value difference, 3,725 unique CpGs (2,862 genes) were identified to be differentially methylated between the OS and normal groups. Among these 2,862 genes, 510 genes were differentially hypermethylated and 2,352 were differentially hypomethylated. The differentially hypermethylated genes were primarily involved in 20 GO terms, and the top 3 terms were associated with potassium ion transport. For differentially hypomethylated genes, GO functions principally included passive transmembrane transporter activity, channel activity and metal ion transmembrane transporter activity. In addition, a total of 10 significant pathways were enriched by differentially hypomethylated genes; notably, neuroactive ligand‑receptor interaction was the most significant pathway. Based on a connectivity degree >90, 7 hub genes were selected from the PPI network, including neuromedin U (NMU; degree=103) and NMU receptor 1 (NMUR1; degree=103). 
Functional terms (potassium ion transport, transmembrane transporter activity, and neuroactive ligand‑receptor interaction) and hub genes (NMU and NMUR1) may serve as potential targets for the treatment and diagnosis of OS.
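The β-value cut-off described in the abstract above can be illustrated with a minimal sketch. It assumes per-CpG β-values are already available as plain Python lists; the function name `classify_cpgs` and the CpG identifiers in the usage note are hypothetical, and the t-test step the authors also used is deliberately left out. This is not the authors' pipeline.

```python
from statistics import mean

def classify_cpgs(beta_tumor, beta_normal, min_delta=0.2):
    """Return {cpg_id: 'hyper' | 'hypo'} for CpGs passing the delta-beta cut-off.

    beta_tumor / beta_normal: dicts mapping a CpG id to a list of beta-values
    (one per sample). min_delta mirrors the >=0.2 threshold in the abstract.
    """
    calls = {}
    for cpg, tumor_values in beta_tumor.items():
        # Average beta-value difference between the two groups
        delta = mean(tumor_values) - mean(beta_normal[cpg])
        if abs(delta) >= min_delta:
            calls[cpg] = "hyper" if delta > 0 else "hypo"
    return calls
```

For example, a site with tumor β-values around 0.8 and normal β-values around 0.3 would be called hypermethylated, while a site differing by only 0.05 would be dropped.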
Analysis of dissimilatory sulfite reductase and 16S rRNA gene fragments from deep-sea hydrothermal sites of the Suiyo Seamount, Izu-Bonin Arc, Western Pacific.

PubMed

Nakagawa, Tatsunori; Ishibashi, Jun-Ichiro; Maruyama, Akihiko; Yamanaka, Toshiro; Morimoto, Yusuke; Kimura, Hiroyuki; Urabe, Tetsuro; Fukui, Manabu

2004-01-01

This study describes the occurrence of unique dissimilatory sulfite reductase (DSR) genes at a depth of 1,380 m from the deep-sea hydrothermal vent field at the Suiyo Seamount, Izu-Bonin Arc, Western Pacific, Japan. The DSR genes were obtained from microbes that grew in a catheter-type in situ growth chamber deployed for 3 days on a vent and from the effluent water of drilled holes at 5 degrees C and natural vent fluids at 7 degrees C. DSR clones SUIYOdsr-A and SUIYOdsr-B were not closely related to cultivated species or environmental clones. Moreover, samples of microbial communities were examined by PCR-denaturing gradient gel electrophoresis (DGGE) analysis of the 16S rRNA gene. The sequence analysis of 16S rRNA gene fragments obtained from the vent catheter after a 3-day incubation revealed the occurrence of bacterial DGGE bands affiliated with the Aquificae and gamma- and epsilon-Proteobacteria as well as the occurrence of archaeal phylotypes affiliated with the Thermococcales and of a unique archaeon sequence that clustered with "Nanoarchaeota." The DGGE bands obtained from drilled holes and natural vent fluids from 7 to 300 degrees C were affiliated with the delta-Proteobacteria, genus Thiomicrospira, and Pelodictyon. The dominant DGGE bands retrieved from the effluent water of casing pipes at 3 and 4 degrees C were closely related to phylotypes obtained from the Arctic Ocean. Our results suggest the presence of microorganisms corresponding to a unique DSR lineage not detected previously from other geothermal environments.
Finding gene clusters for a replicated time course study

PubMed Central

2014-01-01

Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
Phylogenetic Analysis of H6 Influenza Viruses Isolated from Rosy-Billed Pochards (Netta peposaca) in Argentina Reveals the Presence of Different HA Gene Clusters

PubMed Central

Rimondi, Agustina; Xu, Kemin; Craig, Maria Isabel; Shao, Hongxia; Ferreyra, Hebe; Rago, Maria Virginia; Romano, Marcelo; Uhart, Marcela; Sutton, Troy; Ferrero, Andrea; Perez, Daniel R.; Pereda, Ariel

2011-01-01

Until recently, influenza A viruses from wild waterfowl in South America were rarely isolated and/or characterized. To explore the ecology of influenza A viruses in this region, a long-term surveillance program was established in 2006 for resident and migratory water birds in Argentina. We report the characterization of 5 avian influenza viruses of the H6 hemagglutinin (HA) subtype isolated from rosy-billed pochards (Netta peposaca). Three of these viruses were paired to an N2 NA subtype, while the other two were of the N8 subtype. Genetic and phylogenetic analyses of the internal gene segments revealed a close relationship with influenza viruses from South America, forming a unique clade and supporting the notion of independent evolution from influenza A viruses in other latitudes. The presence of NS alleles A and B was also identified. The HA and NA genes formed unique clades separate from North American and Eurasian viruses, with the exception of the HA gene of one isolate, which was more closely related to the North American lineage, suggesting possible interactions between viruses of North American and South American lineages. Animal studies suggested that these Argentine H6 viruses could replicate and transmit inefficiently in chickens, indicating limited adaptation to poultry. Our results highlight the importance of continued influenza virus surveillance in wild birds of South America, especially considering the unique evolution of these viruses. PMID:21976652
Multiconstrained gene clustering based on generalized projections

PubMed Central

2010-01-01

Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints into an optimal clustering solution remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
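The core idea of projecting onto convex sets can be sketched with a toy example. This is plain alternating projection, not the generalized projector of the paper. The two constraint sets are assumptions chosen for illustration: a soft membership vector for one gene must sum to 1, and each membership must lie within per-cluster bounds (for instance, a hypothetical annotation prior requiring membership of at least 0.5 in cluster 0).

```python
import numpy as np

def project_sum_to_one(x):
    # Euclidean projection onto the hyperplane {x : sum(x) = 1}
    return x - (x.sum() - 1.0) / x.size

def project_box(x, lo, hi):
    # Euclidean projection onto the box {x : lo <= x <= hi}
    return np.clip(x, lo, hi)

def pocs(x0, lo, hi, n_iter=500, tol=1e-10):
    """Alternate the two projections until the iterate stops moving."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_new = project_box(project_sum_to_one(x), lo, hi)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

When the two sets intersect, the iterates approach a point that satisfies both constraints. Each constraint is only ever applied through its own projection, which is the property the abstract highlights.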
Natural Product Biosynthetic Diversity and Comparative Genomics of the Cyanobacteria.

PubMed

Dittmann, Elke; Gugger, Muriel; Sivonen, Kaarina; Fewer, David P

2015-10-01

Cyanobacteria are an ancient lineage of slow-growing photosynthetic bacteria and a prolific source of natural products with intricate chemical structures and potent biological activities. The bulk of these natural products are known from just a handful of genera. Recent efforts have elucidated the mechanisms underpinning the biosynthesis of a diverse array of natural products from cyanobacteria. Many of the biosynthetic mechanisms are unique to cyanobacteria or rarely described from other organisms. Advances in genome sequence technology have precipitated a deluge of genome sequences for cyanobacteria. This makes it possible to link known natural products to biosynthetic gene clusters but also accelerates the discovery of new natural products through genome mining. These studies demonstrate that cyanobacteria encode a huge variety of cryptic gene clusters for the production of natural products, and the known chemical diversity is likely to be just a fraction of the true biosynthetic capabilities of this fascinating and ancient group of organisms. Copyright © 2015. Published by Elsevier Ltd.
Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

PubMed

Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

2014-08-01

Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi.

PubMed

Slot, Jason C; Rokas, Antonis

2011-01-25

Genes involved in intermediary and secondary metabolism in fungi are frequently physically linked or clustered. For example, in Aspergillus nidulans the entire pathway for the production of sterigmatocystin (ST), a highly toxic secondary metabolite and a precursor to the aflatoxins (AF), is located in a ∼54 kb, 23 gene cluster. We discovered that a complete ST gene cluster in Podospora anserina was horizontally transferred from Aspergillus. Phylogenetic analysis shows that most Podospora cluster genes are adjacent to or nested within Aspergillus cluster genes, although the two genera belong to different taxonomic classes. Furthermore, the Podospora cluster is highly conserved in content, sequence, and microsynteny with the Aspergillus ST/AF clusters and its intergenic regions contain 14 putative binding sites for AflR, the transcription factor required for activation of the ST/AF biosynthetic genes. Examination of ∼52,000 Podospora expressed sequence tags identified transcripts for 14 genes in the cluster, with several expressed at multiple life cycle stages. The presence of putative AflR-binding sites and the expression evidence for several cluster genes, coupled with the recent independent discovery of ST production in Podospora [1], suggest that this HGT event probably resulted in a functional cluster. Given the abundance of metabolic gene clusters in fungi, our finding that one of the largest known metabolic gene clusters moved intact between species suggests that such transfers might have significantly contributed to fungal metabolic diversity. Copyright © 2011 Elsevier Ltd. All rights reserved.
Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana

PubMed Central

Reimegård, Johan; Kundu, Snehangshu; Pendle, Ali; Irish, Vivian F.; Shaw, Peter

2017-01-01

Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation. PMID:28175342
Characterization of Staphylococcus and Corynebacterium Clusters in the Human Axillary Region

PubMed Central

Callewaert, Chris; Kerckhof, Frederiek-Maarten; Granitsiotis, Michael S.; Van Gele, Mireille; Van de Wiele, Tom; Boon, Nico

2013-01-01

The skin microbial community is regarded as essential for human health and well-being, but likewise plays an important role in the formation of body odor in, for instance, the axillae. Little molecular-based research has been done on the axillary microbiome. This study typified the axillary microbiome of a group of 53 healthy subjects. A profound view was obtained of the interpersonal, intrapersonal and temporal diversity of the human axillary microbiota. Denaturing gradient gel electrophoresis (DGGE) and next-generation sequencing of the 16S rRNA gene region were combined and used to complement each other. Two important clusters were characterized, in which Staphylococcus and Corynebacterium species were the abundant species. Females predominantly clustered within the Staphylococcus cluster (87%, n = 17), whereas males clustered more in the Corynebacterium cluster (39%, n = 36). The axillary microbiota was unique to each individual. Left-right asymmetry occurred in about half of the human population. For the first time, an elaborate study was performed on the dynamics of the axillary microbiome. A relatively stable axillary microbiome was noticed, although a few subjects evolved towards another stable community. The deodorant usage had a proportional linear influence on the species diversity of the axillary microbiome. PMID:23950955
Constrained clusters of gene expression profiles with pathological features.

PubMed

Sese, Jun; Kurokawa, Yukinori; Monden, Morito; Kato, Kikuya; Morishita, Shinichi

2004-11-22

Gene expression profiles should be useful in distinguishing variations in disease, since they reflect accurately the status of cells. The primary clustering of gene expression reveals the genotypes that are responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, since the first clustering process and the second classification step, in which the features are associated with clusters, are performed independently, the initial set of clusters may omit genes that are associated with pathologically meaningful features. Therefore, it is important to devise a way of identifying gene expression clusters that are associated with pathological features. We present the novel technique of 'itemset constrained clustering' (IC-Clustering), which computes the optimal cluster that maximizes the interclass variance of gene expression between groups, which are divided according to the restriction that only divisions that can be expressed using common features are allowed. This constraint automatically labels each cluster with a set of pathological features which characterize that cluster. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters, which could be annotated with various pathological features, such as 'tumor' and 'man', or 'except tumor' and 'normal liver function'. In contrast, the k-means method overlooked these clusters.
Histidine Kinase-Mediated Production and Autoassembly of Porphyromonas gingivalis Fimbriae

PubMed Central

Nishikawa, Kiyoshi; Duncan, Margaret J.

2010-01-01

Porphyromonas gingivalis, a Gram-negative oral anaerobe, is strongly associated with chronic adult periodontitis, and it utilizes FimA fimbriae to persistently colonize and evade host defenses in the periodontal crevice. The FimA-related gene cluster (the fim gene cluster) is positively regulated by the FimS-FimR two-component system. In this study, comparative analyses between fimbriate type strain ATCC 33277 and fimbria-deficient strain W83 revealed differences in their fimS loci, which encode FimS histidine kinase. Using a reciprocal gene exchange system, we established that FimS from W83 is malfunctional. Complementation analysis with chimeric fimS constructs revealed that W83 FimS has a defective kinase domain due to a truncated conserved G3 box motif that provides an ATP-binding pocket. The introduction of the functional fimS from 33277 restored the production, but not polymerization, of endogenous FimA subunits in W83. Further analyses with a fimA-exchanged W83 isogenic strain showed that even the fimbria-deficient W83 retains the ability to polymerize FimA from 33277, indicating the assembly of mature FimA by a primary structure-dependent mechanism. It also was shown that the substantial expression of 33277-type FimA fimbriae in the W83 derivative requires the introduction and expression of the functional 33277 fimS. These findings indicate that FimSR is the unique and universal regulatory system that activates the fim gene cluster in a fimA genotype-independent manner. PMID:20118268
Diametrical clustering for identifying anti-correlated gene clusters.

PubMed

Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

2003-09-01

Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, with each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
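The two alternating steps named in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes each gene profile is centered and scaled to unit norm, and that a gene is assigned to the prototype with the largest squared dot product, so that correlated and anti-correlated profiles land together. The function name, the random initialization and the empty-cluster handling are all assumptions.

```python
import numpy as np

def diametrical_clustering(X, k, n_iter=50, seed=0):
    """Cluster rows of X (genes x conditions) so that strongly correlated
    AND strongly anti-correlated profiles share a cluster."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumed preprocessing: center each profile and scale it to unit norm
    X = X - X.mean(axis=1, keepdims=True)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = rng.integers(0, k, size=X.shape[0])
    prototypes = np.zeros((k, X.shape[1]))
    for _ in range(n_iter):
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                # Assumed fallback: reseed an empty cluster with a random gene
                members = X[rng.integers(0, X.shape[0], size=1)]
            # (ii) prototype = dominant right singular vector of the cluster
            _, _, vt = np.linalg.svd(members, full_matrices=False)
            prototypes[j] = vt[0]
        # (i) re-partition: the squared dot product ignores the sign
        new_labels = np.argmax((X @ prototypes.T) ** 2, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, prototypes
```

Squaring the similarity is what separates this from a correlation-based k-means: a profile and its mirror image score identically against a prototype.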
Gene Cluster Responsible for Secretion of and Immunity to Multiple Bacteriocins, the NKR-5-3 Enterocins

PubMed Central

Ishibashi, Naoki; Himeno, Kohei; Masuda, Yoshimitsu; Perez, Rodney Honrada; Iwatani, Shun; Wilaipun, Pongtep; Leelawatcharamas, Vichien; Nakayama, Jiro; Sonomoto, Kenji

2014-01-01

Enterococcus faecium NKR-5-3, isolated from Thai fermented fish, is characterized by the unique ability to produce five bacteriocins, namely, enterocins NKR-5-3A, -B, -C, -D, and -Z (Ent53A, Ent53B, Ent53C, Ent53D, and Ent53Z). Genetic analysis with a genome library revealed that the bacteriocin structural genes (enkA [ent53A], enkC [ent53C], enkD [ent53D], and enkZ [ent53Z]) that encode these peptides (except for Ent53B) are located in close proximity to each other. This NKR-5-3ACDZ (Ent53ACDZ) enterocin gene cluster (approximately 13 kb long) includes certain bacteriocin biosynthetic genes such as an ABC transporter gene (enkT), two immunity genes (enkIaz and enkIc), a response regulator (enkR), and a histidine protein kinase (enkK). Heterologous-expression studies of enkT and ΔenkT mutant strains showed that enkT is responsible for the secretion of Ent53A, Ent53C, Ent53D, and Ent53Z, suggesting that EnkT is a wide-range ABC transporter that contributes to the effective production of these bacteriocins. In addition, EnkIaz and EnkIc were found to confer self-immunity to the respective bacteriocins. Furthermore, bacteriocin induction assays performed with the ΔenkRK mutant strain showed that EnkR and EnkK are regulatory proteins responsible for bacteriocin production and that, together with Ent53D, they constitute a three-component regulatory system. Thus, the Ent53ACDZ gene cluster is essential for the biosynthesis and regulation of NKR-5-3 enterocins, and this is, to our knowledge, the first report that demonstrates the secretion of multiple bacteriocins by an ABC transporter. PMID:25149515
Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone.
The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.
Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

DOE PAGES

Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

2016-01-01

The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone.
The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.
The genetic and molecular basis for sunscreen biosynthesis in cyanobacteria

PubMed Central

Balskus, Emily P.; Walsh, Christopher T.

2011-01-01

UV-A and UV-B radiation are harmful to living systems, causing damage to biological macromolecules. An important strategy for dealing with UV exposure is the biosynthesis of small molecule sunscreens. Among such metabolites, the mycosporine and mycosporine-like amino acids (MAAs) are remarkable for their wide phylogenetic distribution and their unique chemical structures. Here we report the identification of a MAA biosynthetic gene cluster in a cyanobacterium and the discovery of analogous pathways in other sequenced organisms. We have expressed the cluster in a heterologous bacterial host and characterized all four biosynthetic enzymes in vitro. In addition to clarifying the origin of the MAAs, these efforts have revealed two unprecedented enzymatic strategies for imine formation. PMID:20813918
Breakup of a homeobox cluster after genome duplication in teleosts

PubMed Central

Mulley, John F.; Chiu, Chi-hua; Holland, Peter W. H.

2006-01-01

Several families of homeobox genes are arranged in genomic clusters in metazoan genomes, including the Hox, ParaHox, NK, Rhox, and Iroquois gene clusters. The selective pressures responsible for maintenance of these gene clusters are poorly understood. The ParaHox gene cluster is evolutionarily conserved between amphioxus and human but is fragmented in teleost fishes. We show that two basal ray-finned fish, Polypterus and Amia, each possess an intact ParaHox cluster; this implies that the selective pressure maintaining clustering was lost after whole-genome duplication in teleosts. Cluster breakup is because of gene loss, not transposition or inversion, and the total number of ParaHox genes is the same in teleosts, human, mouse, and frog. We propose that this homeobox gene cluster is held together in chordates by the existence of interdigitated control regions that could be separated after locus duplication in the teleost fish. PMID:16801555

Receptor-like genes in the major resistance locus of lettuce are subject to divergent selection.

PubMed Central

Meyers, B C; Shen, K A; Rohani, P; Gaut, B S; Michelmore, R W

1998-01-01

Disease resistance genes in plants are often found in complex multigene families. The largest known cluster of disease resistance specificities in lettuce contains the RGC2 family of genes. We compared the sequences of nine full-length genomic copies of RGC2 representing the diversity in the cluster to determine the structure of genes within this family and to examine the evolution of its members. The transcribed regions range from at least 7.0 to 13.1 kb, and the cDNAs contain deduced open reading frames of approximately 5.5 kb. The predicted RGC2 proteins contain a nucleotide binding site and irregular leucine-rich repeats (LRRs) that are characteristic of resistance genes cloned from other species. Unique features of the RGC2 gene products include a bipartite LRR region with >40 repeats. At least eight members of this family are transcribed. The level of sequence diversity between family members varied in different regions of the gene. The ratio of nonsynonymous (Ka) to synonymous (Ks) nucleotide substitutions was lowest in the region encoding the nucleotide binding site, which is the presumed effector domain of the protein. The LRR-encoding region showed an alternating pattern of conservation and hypervariability. This alternating pattern of variation was also found in all comparisons within families of resistance genes cloned from other species. The Ka/Ks ratios indicate that diversifying selection has resulted in increased variation at these codons. The patterns of variation support the predicted structure of LRR regions with solvent-exposed hypervariable residues that are potentially involved in binding pathogen-derived ligands. PMID:9811792
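As a rough illustration of how a Ka/Ks ratio is usually read (a ratio above 1 suggests diversifying selection, a ratio below 1 suggests purifying selection, and a ratio near 1 is consistent with neutral evolution), the sketch below classifies a region from its two substitution rates. The function name, the tolerance band, and the example rates are hypothetical and are not taken from the RGC2 study.

```python
def classify_selection(ka, ks, tol=0.05):
    """Return (Ka/Ks, label) for one region.

    ka, ks: nonsynonymous and synonymous substitution rates (assumed inputs).
    tol: hypothetical tolerance around 1 that is treated as 'neutral'.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1 + tol:
        return ratio, "diversifying"
    if ratio < 1 - tol:
        return ratio, "purifying"
    return ratio, "neutral"


# Illustrative (made-up) rates for two regions of a gene.
print(classify_selection(0.30, 0.10))  # ratio well above 1
print(classify_selection(0.02, 0.20))  # ratio well below 1
```

In practice the rates themselves come from a codon-aware estimator; this sketch only covers the final interpretation step.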
A Functionally Conserved Gene Regulatory Network Module Governing Olfactory Neuron Diversity.

PubMed

Li, Qingyun; Barish, Scott; Okuwa, Sumie; Maciejewski, Abigail; Brandt, Alicia T; Reinhold, Dominik; Jones, Corbin D; Volkan, Pelin Cayirlioglu

2016-01-01

Sensory neuron diversity is required for organisms to decipher complex environmental cues. In Drosophila, the olfactory environment is detected by 50 different olfactory receptor neuron (ORN) classes that are clustered in combinations within distinct sensilla subtypes. Each sensilla subtype houses stereotypically clustered 1-4 ORN identities that arise through asymmetric divisions from a single multipotent sensory organ precursor (SOP). How each class of SOPs acquires a unique differentiation potential that accounts for ORN diversity is unknown. Previously, we reported a critical component of SOP diversification program, Rotund (Rn), increases ORN diversity by generating novel developmental trajectories from existing precursors within each independent sensilla type lineages. Here, we show that Rn, along with BarH1/H2 (Bar), Bric-à-brac (Bab), Apterous (Ap) and Dachshund (Dac), constitutes a transcription factor (TF) network that patterns the developing olfactory tissue. This network was previously shown to pattern the segmentation of the leg, which suggests that this network is functionally conserved. In antennal imaginal discs, precursors with diverse ORN differentiation potentials are selected from concentric rings defined by unique combinations of these TFs along the proximodistal axis of the developing antennal disc. The combinatorial code that demarcates each precursor field is set up by cross-regulatory interactions among different factors within the network. Modifications of this network lead to predictable changes in the diversity of sensilla subtypes and ORN pools. In light of our data, we propose a molecular map that defines each unique SOP fate. Our results highlight the importance of the early prepatterning gene regulatory network as a modulator of SOP and terminally differentiated ORN diversity. Finally, our model illustrates how conserved developmental strategies are used to generate neuronal diversity.
CXC chemokine ligand 4 induces a unique transcriptome in monocyte-derived macrophages.

PubMed

Gleissner, Christian A; Shaked, Iftach; Little, Kristina M; Ley, Klaus

2010-05-01

In atherosclerotic arteries, blood monocytes differentiate to macrophages in the presence of growth factors, such as macrophage colony-stimulating factor (M-CSF), and chemokines, such as platelet factor 4 (CXCL4). To compare the gene expression signature of CXCL4-induced macrophages with M-CSF-induced macrophages or macrophages polarized with IFN-gamma/LPS (M1) or IL-4 (M2), we cultured primary human peripheral blood monocytes for 6 d. mRNA expression was measured by Affymetrix gene chips, and differences were analyzed by local pooled error test, profile of complex functionality, and gene set enrichment analysis. Three hundred seventy-five genes were differentially expressed between M-CSF- and CXCL4-induced macrophages; 206 of them overexpressed in CXCL4 macrophages coding for genes implicated in the inflammatory/immune response, Ag processing and presentation, and lipid metabolism. CXCL4-induced macrophages overexpressed some M1 and M2 genes and the corresponding cytokines at the protein level; however, their transcriptome clustered with neither M1 nor M2 transcriptomes. They almost completely lost the ability to phagocytose zymosan beads. Genes linked to atherosclerosis were not consistently upregulated or downregulated. Scavenger receptors showed lower and cholesterol efflux transporters showed higher expression in CXCL4- than M-CSF-induced macrophages, resulting in lower low-density lipoprotein content. We conclude that CXCL4 induces a unique macrophage transcriptome distinct from known macrophage types, defining a new macrophage differentiation that we propose to call M4.
A Cluster of Cuticle Protein Genes of Drosophila Melanogaster at 65a: Sequence, Structure and Evolution

PubMed Central

Charles, J. P.; Chihara, C.; Nejad, S.; Riddiford, L. M.

1997-01-01

A 36-kb genomic DNA segment of the Drosophila melanogaster genome containing 12 clustered cuticle genes has been mapped and partially sequenced. The cluster maps at 65A 5-6 on the left arm of the third chromosome, in agreement with the previously determined location of a putative cluster encompassing the genes for the third instar larval cuticle proteins LCP5, LCP6 and LCP8. This cluster is the largest cuticle gene cluster discovered to date and shows a number of surprising features that explain in part the genetic complexity of the LCP5, LCP6 and LCP8 loci. The genes encoding LCP5 and LCP8 are multiple copy genes and the presence of extensive similarity in their coding regions gives the first evidence for gene conversion in cuticle genes. In addition, five genes in the cluster are intronless. Four of these five have arisen by retroposition. The other genes in the cluster have a single intron located at an unusual location for insect cuticle genes. PMID:9383064
Genome-scale CRISPR-Cas9 knockout screening in human cells.

PubMed

Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelson, Tarjei; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng

2014-01-03

The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12, as well as novel hits NF2, CUL3, TADA2B, and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.
Comparative Genomics Identifies Epidermal Proteins Associated with the Evolution of the Turtle Shell

PubMed Central

Holthaus, Karin Brigit; Strasser, Bettina; Sipos, Wolfgang; Schmidt, Heiko A.; Mlitz, Veronika; Sukseree, Supawadee; Weissenbacher, Anton; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

2016-01-01

The evolution of reptiles, birds, and mammals was associated with the origin of unique integumentary structures. Studies on lizards, chicken, and humans have suggested that the evolution of major structural proteins of the outermost, cornified layers of the epidermis was driven by the diversification of a gene cluster called Epidermal Differentiation Complex (EDC). Turtles have evolved unique defense mechanisms that depend on mechanically resilient modifications of the epidermis. To investigate whether the evolution of the integument in these reptiles was associated with specific adaptations of the sequences and expression patterns of EDC-related genes, we utilized newly available genome sequences to determine the epidermal differentiation gene complement of turtles. The EDC of the western painted turtle (Chrysemys picta bellii) comprises more than 100 genes, including at least 48 genes that encode proteins referred to as beta-keratins or corneous beta-proteins. Several EDC proteins have evolved cysteine/proline contents beyond 50% of total amino acid residues. Comparative genomics suggests that distinct subfamilies of EDC genes have been expanded and partly translocated to loci outside of the EDC in turtles. Gene expression analysis in the European pond turtle (Emys orbicularis) showed that EDC genes are differentially expressed in the skin of the various body sites and that a subset of beta-keratin genes within the EDC as well as those located outside of the EDC are expressed predominantly in the shell. Our findings give strong support to the hypothesis that the evolutionary innovation of the turtle shell involved specific molecular adaptations of epidermal differentiation. PMID:26601937
Differential expression of Meis2, Mab21l2 and Tbx3 during limb development associated with diversification of limb morphology in mammals.

PubMed

Dai, Mengyao; Wang, Yao; Fang, Lu; Irwin, David M; Zhu, Tengteng; Zhang, Junpeng; Zhang, Shuyi; Wang, Zhe

2014-01-01

Bats are the only mammals capable of self-powered flight using wings. Differing from mouse or human limbs, four elongated digits within a broad wing membrane support the bat wing, and the foot of the bat has evolved a long calcar that spreads the interfemoral membrane. Our recent mRNA sequencing (mRNA-Seq) study found unique expression patterns for genes at the 5' end of the Hoxd gene cluster and for Tbx3 that are associated with digit elongation and wing membrane growth in bats. In this study, we focused on two additional genes, Meis2 and Mab21l2, identified from the mRNA-Seq data. Using whole-mount in situ hybridization (WISH) we validated the mRNA-Seq results for differences in the expression patterns of Meis2 and Mab21l2 between bat and mouse limbs, and further characterized the timing and location of the expression of these two genes. These analyses suggest that Meis2 may function in wing membrane growth and Mab21l2 may have a role in AP and DV axial patterning. In addition, we found that Tbx3 is uniquely expressed in the unique calcar structure found in the bat hindlimb, suggesting a role for this gene in calcar growth and elongation. Moreover, analysis of the coding sequences for Meis2, Mab21l2 and Tbx3 showed that Meis2 and Mab21l2 have high sequence identity, consistent with the functions of genes being conserved, but that Tbx3 showed accelerated evolution in bats. However, evidence for positive selection in Tbx3 was not found, which would suggest that the function of this gene has not been changed. Together, our findings support the hypothesis that the modulation of the spatiotemporal expression patterns of multiple functionally conserved genes controls limb morphology and drives morphological change in the diversification of mammalian limbs.
Differential Expression of Meis2, Mab21l2 and Tbx3 during Limb Development Associated with Diversification of Limb Morphology in Mammals

PubMed Central

Fang, Lu; Irwin, David M.; Zhu, Tengteng; Zhang, Junpeng; Zhang, Shuyi; Wang, Zhe

2014-01-01

Bats are the only mammals capable of self-powered flight using wings. Differing from mouse or human limbs, four elongated digits within a broad wing membrane support the bat wing, and the foot of the bat has evolved a long calcar that spreads the interfemoral membrane. Our recent mRNA sequencing (mRNA-Seq) study found unique expression patterns for genes at the 5′ end of the Hoxd gene cluster and for Tbx3 that are associated with digit elongation and wing membrane growth in bats. In this study, we focused on two additional genes, Meis2 and Mab21l2, identified from the mRNA-Seq data. Using whole-mount in situ hybridization (WISH) we validated the mRNA-Seq results for differences in the expression patterns of Meis2 and Mab21l2 between bat and mouse limbs, and further characterized the timing and location of the expression of these two genes. These analyses suggest that Meis2 may function in wing membrane growth and Mab21l2 may have a role in AP and DV axial patterning. In addition, we found that Tbx3 is uniquely expressed in the unique calcar structure found in the bat hindlimb, suggesting a role for this gene in calcar growth and elongation. Moreover, analysis of the coding sequences for Meis2, Mab21l2 and Tbx3 showed that Meis2 and Mab21l2 have high sequence identity, consistent with the functions of genes being conserved, but that Tbx3 showed accelerated evolution in bats. However, evidence for positive selection in Tbx3 was not found, which would suggest that the function of this gene has not been changed. Together, our findings support the hypothesis that the modulation of the spatiotemporal expression patterns of multiple functionally conserved genes controls limb morphology and drives morphological change in the diversification of mammalian limbs. PMID:25166052
Molecular profile of the unique species of traditional Chinese medicine, Chinese seahorse (Hippocampus kuda Bleeker).

PubMed

Zhang, Ning; Xu, Bin; Mou, Chunyan; Yang, Wenli; Wei, Jianwen; Lu, Liang; Zhu, Junjie; Du, Jingchun; Wu, Xiaokun; Ye, Lanting; Fu, Zhiyan; Lu, Yang; Lin, Jianghai; Sun, Zizi; Su, Jing; Dong, Meiling; Xu, Anlong

2003-08-28

A cDNA library of male Chinese seahorse (Hippocampus kuda Bleeker) was constructed to investigate the molecular profile of seahorse as one of the most famous traditional Chinese medicine materials, and to reveal immunological and physiological mechanisms of seahorse as one of the most primitive vertebrates at molecular level. A total of 3372 expressed sequence tags (ESTs) consisting of 1911 unique genes (345 clusters and 1566 singletons) were examined in the present study. Identification of the genes related to immune system, paternal brooding and physiological regulation provides not only valuable insights into the molecular mechanism of immune system in teleost fish but also plausible explanations for pharmacological activities of Chinese seahorse. Furthermore, the occurrence of high prevalent C-type lectins suggested that a lectin-complement pathway might exert a more dominant function in the innate immune system of teleost than mammal. Carbohydrate recognition domain (CRD) without a collagen-like region in the lectins of seahorse was likely an ancient characteristic of lectins similar to invertebrates.
Conserved syntenic clusters of protein coding genes are missing in birds.

PubMed

Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V

2014-01-01

Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
A comprehensive analysis of Helicobacter pylori plasticity zones reveals that they are integrating conjugative elements with intermediate integration specificity.

PubMed

Fischer, Wolfgang; Breithaupt, Ute; Kern, Beate; Smith, Stella I; Spicher, Carolin; Haas, Rainer

2014-04-27

The human gastric pathogen Helicobacter pylori is a paradigm for chronic bacterial infections. Its persistence in the stomach mucosa is facilitated by several mechanisms of immune evasion and immune modulation, but also by an unusual genetic variability which might account for the capability to adapt to changing environmental conditions during long-term colonization. This variability is reflected by the fact that almost each infected individual is colonized by a genetically unique strain. Strain-specific genes are dispersed throughout the genome, but clusters of genes organized as genomic islands may also collectively be present or absent. We have comparatively analysed such clusters, which are commonly termed plasticity zones, in a high number of H. pylori strains of varying geographical origin. We show that these regions contain fixed gene sets, rather than being true regions of genome plasticity, but two different types and several subtypes with partly diverging gene content can be distinguished. Their genetic diversity is incongruent with variations in the rest of the genome, suggesting that they are subject to horizontal gene transfer within H. pylori populations. We identified 40 distinct integration sites in 45 genome sequences, with a conserved heptanucleotide motif that seems to be the minimal requirement for integration. The significant number of possible integration sites, together with the requirement for a short conserved integration motif and the high level of gene conservation, indicates that these elements are best described as integrating conjugative elements (ICEs) with an intermediate integration site specificity.
Mrp--a new auxiliary gene essential for optimal expression of methicillin resistance in Staphylococcus aureus.

PubMed

Wu, S W; De Lencastre, H

1999-01-01

Screening of a library of Tn551 insertional mutants selected for reduction in the methicillin resistance level of the parental Staphylococcus aureus strain COL resulted in the isolation of mutant RUSA266 in which the minimal inhibitory concentration (MIC) of the parent was reduced from 1,600 to 1.5 micrograms/mL. Cloning and sequencing of the vicinity of the insertion site omega 726 identified an open reading frame (orf1365) encoding a very large polypeptide of more than 1,365 amino acids. A unique feature of the deduced amino acid sequence was the presence of multiple tandem repeats of 75 amino acids in the polypeptide, reminiscent of the structure of high-molecular-weight cell-surface proteins EF* and Emb identified in some streptococcal strains. Mutant RUSA266 with the inactivated gene, which we shall provisionally refer to as mrp (for multiple repeat polypeptide), produced a peptidoglycan with altered muropeptide composition, and both the reduced antibiotic resistance and the altered cell wall composition were co-transduced in back-crosses into the parental strain COL. Additional sequencing upstream of mrp has revealed that this gene was part of a five-gene cluster occupying a 9.2-kb region of the staphylococcal chromosome and was composed of glmM (directly upstream of mrp), two open reading frames orf310 and orf269 coding for two hypothetical proteins, and the gene encoding the staphylococcal arginase (arg). Transcriptional analysis demonstrated that the five genes in the cluster were transcribed together.
Identification of lethal cluster of genes in the yeast transcription network

NASA Astrophysics Data System (ADS)

Rho, K.; Jeong, H.; Kahng, B.

2006-05-01

Identification of essential or lethal genes would be one of the ultimate goals in drug designs. Here we introduce an in silico method to select the cluster with a high population of lethal genes, called lethal cluster, through microarray assay. We construct a gene transcription network based on the microarray expression level. Links are added one by one in the descending order of the Pearson correlation coefficients between two genes. As the link density p increases, two meaningful link densities pm and ps are observed. At pm, which is smaller than the percolation threshold, the number of disconnected clusters is maximum, and the lethal genes are highly concentrated in a certain cluster that needs to be identified. Thus the deletion of all genes in that cluster could efficiently lead to a lethal inviable mutant. This lethal cluster can be identified by an in silico method. As p increases further beyond the percolation threshold, the power law behavior in the degree distribution of a giant cluster appears at ps. We measure the degree of each gene at ps. With the information pertaining to the degrees of each gene at ps, we return to the point pm and calculate the mean degree of genes of each cluster. We find that the lethal cluster has the largest mean degree.
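The link-adding procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that a "cluster" is a connected component with at least two genes, that the link density p is the number of links added divided by the number of possible gene pairs, and that every expression profile is non-constant. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_count_curve(expr):
    """Add links in descending order of correlation.

    Returns (pairs, curve), where curve[t] is the number of clusters
    (components with >= 2 genes, an assumption) after t + 1 links.
    """
    n = len(expr)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda ij: pearson(expr[ij[0]], expr[ij[1]]),
                   reverse=True)
    parent, size = list(range(n)), [1] * n
    n_clusters, curve = 0, []
    for i, j in pairs:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            if size[ri] == 1 and size[rj] == 1:
                n_clusters += 1      # two isolated genes form a new cluster
            elif size[ri] > 1 and size[rj] > 1:
                n_clusters -= 1      # two existing clusters merge
            parent[rj] = ri
            size[ri] += size[rj]
        curve.append(n_clusters)
    return pairs, curve


def peak_link_density(curve):
    """Link density at which the cluster count is first maximal (p_m)."""
    return (curve.index(max(curve)) + 1) / len(curve)


# Toy expression matrix: genes 0-1 and genes 2-3 form two correlated pairs.
toy = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]]
pairs, curve = cluster_count_curve(toy)
print(curve, peak_link_density(curve))
```

Selecting the lethal cluster itself (the one with the largest mean degree at p_s) would need annotation of lethal genes and the degree measurement described in the abstract, which this sketch leaves out.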
Identification of embryonic pancreatic genes using Xenopus DNA microarrays.

PubMed

Hayata, Tadayoshi; Blitz, Ira L; Iwata, Nahoko; Cho, Ken W Y

2009-06-01

The pancreas is both an exocrine and endocrine endodermal organ involved in digestion and glucose homeostasis. During embryogenesis, the anlagen of the pancreas arise from dorsal and ventral evaginations of the foregut that later fuse to form a single organ. To better understand the molecular genetics of early pancreas development, we sought to isolate markers that are uniquely expressed in this tissue. Microarray analysis was performed comparing dissected pancreatic buds, liver buds, and the stomach region of tadpole stage Xenopus embryos. A total of 912 genes were found to be differentially expressed between these organs during early stages of organogenesis. K-means clustering analysis predicted 120 of these genes to be specifically enriched in the pancreas. Of these, we report on the novel expression patterns of 24 genes. Our analyses implicate the involvement of previously unsuspected signaling pathways during early pancreas development. Developmental Dynamics 238:1455-1466, 2009. (c) 2009 Wiley-Liss, Inc.
The complete chloroplast genome of an irreplaceable dietary and model crop, foxtail millet (Setaria italica).

PubMed

Wang, Shuo; Gao, Li-Zhi

2016-11-01

The complete chloroplast genome sequence of foxtail millet (Setaria italica), an important food and fodder crop in the family Poaceae, is first reported in this study. The genome consists of 135 516 bp containing a pair of inverted repeats (IRs) of 21 804 bp separated by a large single-copy (LSC) region and a small single-copy (SSC) region of 79 896 bp and 12 012 bp, respectively. Coding sequences constitute 58.8% of the genome harboring 111 unique genes, 71 of which are protein-coding genes, 4 are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated foxtail millet clustered with Panicum virgatum and Echinochloa crus-galli belonging to the tribe Paniceae of the subfamily Panicoideae. This newly determined chloroplast genome will provide valuable information for the future breeding programs of valuable cereal crops in the family Poaceae.
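The figures reported in this abstract can be cross-checked with simple arithmetic, assuming the usual quadripartite chloroplast layout in which the total length is the LSC plus the SSC plus two copies of the IR. The variable names below are illustrative only.

```python
# Region lengths in bp, as given in the abstract.
LSC = 79_896
SSC = 12_012
IR = 21_804  # one copy; the genome carries a pair

total_bp = LSC + SSC + 2 * IR
print(total_bp)  # compare with the reported genome size

# Gene counts as given in the abstract.
protein_coding, rrna, trna = 71, 4, 36
unique_genes = protein_coding + rrna + trna
print(unique_genes)  # compare with the reported number of unique genes
```

Both sums agree with the reported totals of 135 516 bp and 111 unique genes.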
Genetic interrelations in the actinomycin biosynthetic gene clusters of Streptomyces antibioticus IMRU 3720 and Streptomyces chrysomallus ATCC11523, producers of actinomycin X and actinomycin C

PubMed Central

Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich

2017-01-01

Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S. antibioticus. PMID:28435299
Genetic interrelations in the actinomycin biosynthetic gene clusters of Streptomyces antibioticus IMRU 3720 and Streptomyces chrysomallus ATCC11523, producers of actinomycin X and actinomycin C.

PubMed

Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich

2017-01-01

Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S. antibioticus.
Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

PubMed

Lukashin, A V; Fuchs, R

2001-05-01

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
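The general idea of clustering by simulated annealing can be sketched as below. This is a generic illustration under stated assumptions, not the algorithm of the paper: the cost function (sum of squared distances to cluster centroids), the single-gene reassignment move, the geometric cooling schedule, and the fixed number of clusters k are all choices made here for brevity, and every name is hypothetical. The paper's scheme for choosing the number of clusters is not reproduced.

```python
import math
import random


def energy(profiles, labels, k):
    """Sum of squared distances of each profile to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - m) ** 2 for p in members for v, m in zip(p, centroid))
    return total


def anneal_clusters(profiles, k, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best_labels, best_energy) found by a Metropolis search."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    current = energy(profiles, labels, k)
    best_labels, best = labels[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        g = rng.randrange(len(profiles))
        old = labels[g]
        labels[g] = rng.randrange(k)        # propose moving one gene
        proposed = energy(profiles, labels, k)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = proposed              # accept the move
            if current < best:
                best, best_labels = current, labels[:]
        else:
            labels[g] = old                 # reject and restore
    return best_labels, best


# Toy temporal profiles: two rising genes and two falling genes.
profiles = [[0, 1, 2], [0, 1, 2.2], [5, 4, 3], [5, 4, 3.2]]
labels, best = anneal_clusters(profiles, k=2)
print(labels, best)
```

Recomputing the full energy at every step keeps the sketch short; a practical version would update only the two affected clusters.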
Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

PubMed Central

Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

2015-01-01

The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694
A tripartite clustering analysis on microRNA, gene and disease model.

PubMed

Shen, Chengcheng; Liu, Ying

2012-02-01

Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expressions of these genes, valuable knowledge about the pathogenicity of miRNAs, involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members and pass the hidden relational information along the relation chain by considering multi-type members. Here we report a spectral co-clustering algorithm for k-partite graph to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that genes and diseases co-clustered with the miRNAs are in accordance with current research findings.

From hormones to secondary metabolism: the emergence of metabolic gene clusters in plants.

PubMed

Chu, Hoi Yee; Wegel, Eva; Osbourn, Anne

2011-04-01

Gene clusters for the synthesis of secondary metabolites are a common feature of microbial genomes. Well-known examples include clusters for the synthesis of antibiotics in actinomycetes, and also for the synthesis of antibiotics and toxins in filamentous fungi. Until recently it was thought that genes for plant metabolic pathways were not clustered, and this is certainly true in many cases; however, five plant secondary metabolic gene clusters have now been discovered, all of them implicated in synthesis of defence compounds. An obvious assumption might be that these eukaryotic gene clusters have arisen by horizontal gene transfer from microbes, but there is compelling evidence to indicate that this is not the case. This raises intriguing questions about how widespread such clusters are, what the significance of clustering is, why genes for some metabolic pathways are clustered and those for others are not, and how these clusters form. In answering these questions we may hope to learn more about mechanisms of genome plasticity and adaptive evolution in plants. It is noteworthy that for the five plant secondary metabolic gene clusters reported so far, the enzymes for the first committed steps all appear to have been recruited directly or indirectly from primary metabolic pathways involved in hormone synthesis. This may or may not turn out to be a common feature of plant secondary metabolic gene clusters as new clusters emerge. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.
Assembly and features of secondary metabolite biosynthetic gene clusters in Streptomyces ansochromogenes.

PubMed

Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

2013-07-01

A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.
Supervised group Lasso with applications to microarray data analysis

PubMed Central

Ma, Shuangge; Song, Xiao; Huang, Jian

2007-01-01

Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
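The leave-one-out evaluation mentioned in this abstract can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for the supervised group Lasso model; that substitution, and the function names `nearest_centroid_predict` and `leave_one_out_accuracy`, are assumptions made purely for illustration.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample.

    This classifier is a stand-in, not the supervised group Lasso itself.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(sample, centroids[lab])),
    )


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy is a less optimistic estimate than accuracy measured on the training data, which matters when samples are few and genes are many.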
Modularity of Plant Metabolic Gene Clusters: A Trio of Linked Genes That Are Collectively Required for Acylation of Triterpenes in Oat

PubMed Central

Mugford, Sam T.; Louveau, Thomas; Melton, Rachel; Qi, Xiaoquan; Bakht, Saleha; Hill, Lionel; Tsurushima, Tetsu; Honkanen, Suvi; Rosser, Susan J.; Lomonossoff, George P.; Osbourn, Anne

2013-01-01

Operon-like gene clusters are an emerging phenomenon in the field of plant natural products. The genes encoding some of the best-characterized plant secondary metabolite biosynthetic pathways are scattered across plant genomes. However, an increasing number of gene clusters encoding the synthesis of diverse natural products have recently been reported in plant genomes. These clusters have arisen through the neo-functionalization and relocation of existing genes within the genome, and not by horizontal gene transfer from microbes. The reasons for clustering are not yet clear, although this form of gene organization is likely to facilitate co-inheritance and co-regulation. Oats (Avena spp) synthesize antimicrobial triterpenoids (avenacins) that provide protection against disease. The synthesis of these compounds is encoded by a gene cluster. Here we show that a module of three adjacent genes within the wider biosynthetic gene cluster is required for avenacin acylation. Through the characterization of these genes and their encoded proteins we present a model of the subcellular organization of triterpenoid biosynthesis. PMID:23532069
Prokaryotic Gene Clusters: A Rich Toolbox for Synthetic Biology

PubMed Central

Fischbach, Michael; Voigt, Christopher A.

2014-01-01

Bacteria construct elaborate nanostructures, obtain nutrients and energy from diverse sources, synthesize complex molecules, and implement signal processing to react to their environment. These complex phenotypes require the coordinated action of multiple genes, which are often encoded in a contiguous region of the genome, referred to as a gene cluster. Gene clusters sometimes contain all of the genes necessary and sufficient for a particular function. As an evolutionary mechanism, gene clusters facilitate the horizontal transfer of the complete function between species. Here, we review recent work on a number of clusters whose functions are relevant to biotechnology. Engineering these clusters has been hindered by their regulatory complexity, the need to balance the expression of many genes, and a lack of tools to design and manipulate DNA at this scale. Advances in synthetic biology will enable the large-scale bottom-up engineering of the clusters to optimize their functions, wake up cryptic clusters, or to transfer them between organisms. Understanding and manipulating gene clusters will move towards an era of genome engineering, where multiple functions can be “mixed-and-matched” to create a designer organism. PMID:21154668
Analysis of Dissimilatory Sulfite Reductase and 16S rRNA Gene Fragments from Deep-Sea Hydrothermal Sites of the Suiyo Seamount, Izu-Bonin Arc, Western Pacific

PubMed Central

Nakagawa, Tatsunori; Ishibashi, Jun-Ichiro; Maruyama, Akihiko; Yamanaka, Toshiro; Morimoto, Yusuke; Kimura, Hiroyuki; Urabe, Tetsuro; Fukui, Manabu

2004-01-01

This study describes the occurrence of unique dissimilatory sulfite reductase (DSR) genes at a depth of 1,380 m from the deep-sea hydrothermal vent field at the Suiyo Seamount, Izu-Bonin Arc, Western Pacific, Japan. The DSR genes were obtained from microbes that grew in a catheter-type in situ growth chamber deployed for 3 days on a vent and from the effluent water of drilled holes at 5°C and natural vent fluids at 7°C. DSR clones SUIYOdsr-A and SUIYOdsr-B were not closely related to cultivated species or environmental clones. Moreover, samples of microbial communities were examined by PCR-denaturing gradient gel electrophoresis (DGGE) analysis of the 16S rRNA gene. The sequence analysis of 16S rRNA gene fragments obtained from the vent catheter after a 3-day incubation revealed the occurrence of bacterial DGGE bands affiliated with the Aquificae and γ- and ɛ-Proteobacteria as well as the occurrence of archaeal phylotypes affiliated with the Thermococcales and of a unique archaeon sequence that clustered with “Nanoarchaeota.” The DGGE bands obtained from drilled holes and natural vent fluids from 7 to 300°C were affiliated with the δ-Proteobacteria, genus Thiomicrospira, and Pelodictyon. The dominant DGGE bands retrieved from the effluent water of casing pipes at 3 and 4°C were closely related to phylotypes obtained from the Arctic Ocean. Our results suggest the presence of microorganisms corresponding to a unique DSR lineage not detected previously from other geothermal environments. PMID:14711668
Rapid divergence of histones in Hydrozoa (Cnidaria) and evolution of a novel histone involved in DNA damage response in hydra.

PubMed

Reddy, Puli Chandramouli; Ubhe, Suyog; Sirwani, Neha; Lohokare, Rasika; Galande, Sanjeev

2017-08-01

Histones are fundamental components of chromatin in all eukaryotes. Hydra, an emerging model system belonging to the basal metazoan phylum Cnidaria, provides an ideal platform to understand the evolution of core histone components at the base of eumetazoan phyla. Hydra exhibits peculiar properties such as tremendous regenerative capacity, lack of organismal senescence and rarity of malignancy. In light of the role of histone modifications and histone variants in these processes it is important to understand the nature of histones themselves and their variants in hydra. Here, we report identification of the complete repertoire of histone-coding genes in the Hydra magnipapillata genome. Hydra histones were classified based on their copy numbers, gene structure and other characteristic features. Genomic organization of canonical histone genes revealed the presence of H2A-H2B and H3-H4 paired clusters in high frequency and also a cluster with all core histones along with H1. Phylogenetic analysis of identified members of H2A and H2B histones suggested rapid expansion of these groups in Hydrozoa resulting in the appearance of unique subtypes. Amino acid sequence level comparisons of H2A and H2B forms with bilaterian counterparts suggest the possibility of a highly mobile nature of nucleosomes in hydra. Absolute quantitation of transcripts confirmed the high copy number of histones and supported the canonical nature of H2A. Furthermore, functional characterization of H2A.X.1 and a unique variant H2A.X.2 in the gastric region suggest their role in the maintenance of genome integrity and differentiation processes. These findings provide insights into the evolution of histones and their variants in hydra. Copyright © 2017 Elsevier GmbH. All rights reserved.
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

PubMed

Schulz, Tizian; Stoye, Jens; Doerr, Daniel

2018-05-08

Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
Finding approximate gene clusters with Gecko 3.

PubMed

Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

2016-11-16

Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NIF-type iron-sulfur cluster assembly system is duplicated and distributed in the mitochondria and cytosol of Mastigamoeba balamuthi.

PubMed

Nývltová, Eva; Šuták, Robert; Harant, Karel; Šedinová, Miroslava; Hrdy, Ivan; Paces, Jan; Vlček, Čestmír; Tachezy, Jan

2013-04-30

In most eukaryotes, the mitochondrion is the main organelle for the formation of iron-sulfur (FeS) clusters. This function is mediated through the iron-sulfur cluster assembly machinery, which was inherited from the α-proteobacterial ancestor of mitochondria. In Archamoebae, including pathogenic Entamoeba histolytica and free-living Mastigamoeba balamuthi, the complex iron-sulfur cluster machinery has been replaced by an ε-proteobacterial nitrogen fixation (NIF) system consisting of two components: NifS (cysteine desulfurase) and NifU (scaffold protein). However, the cellular localization of the NIF system and the involvement of mitochondria in archamoebal FeS assembly are controversial. Here, we show that the genes for both NIF components are duplicated within the M. balamuthi genome. One paralog of each protein contains an amino-terminal extension that targets proteins to mitochondria (NifS-M and NifU-M), and the second paralog lacks a targeting signal, thereby reflecting the cytosolic form of the NIF machinery (NifS-C and NifU-C). The dual localization of the NIF system corresponds to the presence of FeS proteins in both cellular compartments, including detectable hydrogenase activity in Mastigamoeba cytosol and mitochondria. In contrast, E. histolytica possesses only single genes encoding NifS and NifU, respectively, and there is no evidence for the presence of the NIF machinery in its reduced mitochondria. Thus, M. balamuthi is unique among eukaryotes in that its FeS cluster formation is mediated through two most likely independent NIF machineries present in two cellular compartments.
NIF-type iron-sulfur cluster assembly system is duplicated and distributed in the mitochondria and cytosol of Mastigamoeba balamuthi

PubMed Central

Nývltová, Eva; Šuták, Robert; Harant, Karel; Šedinová, Miroslava; Hrdý, Ivan; Pačes, Jan; Vlček, Čestmír; Tachezy, Jan

2013-01-01

In most eukaryotes, the mitochondrion is the main organelle for the formation of iron-sulfur (FeS) clusters. This function is mediated through the iron-sulfur cluster assembly machinery, which was inherited from the α-proteobacterial ancestor of mitochondria. In Archamoebae, including pathogenic Entamoeba histolytica and free-living Mastigamoeba balamuthi, the complex iron-sulfur cluster machinery has been replaced by an ε-proteobacterial nitrogen fixation (NIF) system consisting of two components: NifS (cysteine desulfurase) and NifU (scaffold protein). However, the cellular localization of the NIF system and the involvement of mitochondria in archamoebal FeS assembly are controversial. Here, we show that the genes for both NIF components are duplicated within the M. balamuthi genome. One paralog of each protein contains an amino-terminal extension that targets proteins to mitochondria (NifS-M and NifU-M), and the second paralog lacks a targeting signal, thereby reflecting the cytosolic form of the NIF machinery (NifS-C and NifU-C). The dual localization of the NIF system corresponds to the presence of FeS proteins in both cellular compartments, including detectable hydrogenase activity in Mastigamoeba cytosol and mitochondria. In contrast, E. histolytica possesses only single genes encoding NifS and NifU, respectively, and there is no evidence for the presence of the NIF machinery in its reduced mitochondria. Thus, M. balamuthi is unique among eukaryotes in that its FeS cluster formation is mediated through two most likely independent NIF machineries present in two cellular compartments. PMID:23589868
A method to identify differential expression profiles of time-course gene data with Fourier transformation.

PubMed

Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong

2013-10-18

Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The proposed method is general and can be potentially used to identify genes which have the same patterns or biological processes, and help in facing the present and forthcoming challenges of data analysis in functional genomics.
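The first step described in this abstract, representing each profile in a Fourier basis, can be sketched as below. This is a simplified illustration under stated assumptions: time points are taken to be equally spaced, and `harmonic_energy` is a hypothetical summary score, not the FDR-controlled, autocorrelation-aware screening statistic the authors describe.

```python
import cmath
import math


def fourier_coefficients(profile, n_harmonics):
    """Discrete Fourier coefficients c_0..c_K of an equally spaced profile."""
    n = len(profile)
    coeffs = []
    for k in range(n_harmonics + 1):
        c = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(profile)
        ) / n
        coeffs.append(c)
    return coeffs


def harmonic_energy(profile, n_harmonics=2):
    """Sum of squared magnitudes of the non-constant harmonics.

    A flat profile scores near zero; a profile that varies over time scores higher.
    """
    coeffs = fourier_coefficients(profile, n_harmonics)
    return sum(abs(c) ** 2 for c in coeffs[1:])
```

Working with a few low-order coefficients instead of the raw time points reduces each profile to a short vector, and those vectors are what a subsequent model-based clustering step would operate on.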
Methylobacterium Genome Sequences: A Reference Blueprint to Investigate Microbial Metabolism of C1 Compounds from Natural and Industrial Sources

PubMed Central

Lee, Ming-Chun; Bringel, Françoise; Lajus, Aurélie; Zhou, Yang; Gourion, Benjamin; Barbe, Valérie; Chang, Jean; Cruveiller, Stéphane; Dossat, Carole; Gillett, Will; Gruffaz, Christelle; Haugen, Eric; Hourcade, Edith; Levy, Ruth; Mangenot, Sophie; Muller, Emilie; Nadalig, Thierry; Pagni, Marco; Penny, Christian; Peyraud, Rémi; Robinson, David G.; Roche, David; Rouy, Zoé; Saenampechek, Channakhone; Salvignol, Grégory; Vallenet, David; Wu, Zaining; Marx, Christopher J.; Vorholt, Julia A.; Olson, Maynard V.; Kaul, Rajinder; Weissenbach, Jean; Médigue, Claudine; Lidstrom, Mary E.

2009-01-01

Background Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. Methodology/Principal Findings The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name “island integration determinant” (iid). 
Conclusion/Significance These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles. PMID:19440302
Functional clustering of time series gene expression data by Granger causality

PubMed Central

2012-01-01

Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425
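The Granger causality idea in this abstract can be made concrete with a minimal bivariate sketch: series x is said to Granger-cause y if lagged values of x improve the prediction of y beyond what lagged values of y achieve alone. The sketch assumes a single lag, mean-centred series and no intercept, and compares two least-squares fits with an F statistic. The study itself works between and within sets of series; the function names and the simulated data helper here are hypothetical.

```python
import random


def _rss_one(target, reg):
    """Residual sum of squares of a no-intercept fit on one regressor."""
    coef = sum(r * t for r, t in zip(reg, target)) / sum(r * r for r in reg)
    return sum((t - coef * r) ** 2 for r, t in zip(reg, target))


def _rss_two(target, reg1, reg2):
    """Residual sum of squares of a no-intercept fit on two regressors."""
    s11 = sum(a * a for a in reg1)
    s22 = sum(b * b for b in reg2)
    s12 = sum(a * b for a, b in zip(reg1, reg2))
    s1t = sum(a * t for a, t in zip(reg1, target))
    s2t = sum(b * t for b, t in zip(reg2, target))
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal equations
    b1 = (s1t * s22 - s12 * s2t) / det
    b2 = (s11 * s2t - s12 * s1t) / det
    return sum((t - b1 * a - b2 * b) ** 2 for a, b, t in zip(reg1, reg2, target))


def granger_f_statistic(cause, effect):
    """F statistic for 'cause Granger-causes effect' with lag 1 (illustrative)."""
    mc = sum(cause) / len(cause)
    me = sum(effect) / len(effect)
    c = [v - mc for v in cause]
    e = [v - me for v in effect]
    target, own_lag, cause_lag = e[1:], e[:-1], c[:-1]
    n = len(target)
    rss_restricted = _rss_one(target, own_lag)             # effect's own past only
    rss_full = _rss_two(target, own_lag, cause_lag)        # plus the cause's past
    return (rss_restricted - rss_full) / (rss_full / (n - 2))


def simulate_driven_pair(n=200, seed=1):
    """Hypothetical test data: y follows x with a one-step delay plus noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    return x, y
```

A large statistic in one direction and a small one in the other is the asymmetry that a causality-based clustering would use as its measure of proximity between genes.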
Genome‐scale diversity and niche adaptation analysis of Lactococcus lactis by comparative genome hybridization using multi‐strain arrays

PubMed Central

Siezen, Roland J.; Bayjanov, Jumamurat R.; Felis, Giovanna E.; van der Sijde, Marijke R.; Starrenburg, Marjo; Molenaar, Douwe; Wels, Michiel; van Hijum, Sacha A. F. T.; van Hylckama Vlieg, Johan E. T.

2011-01-01

Summary Lactococcus lactis produces lactic acid and is widely used in the manufacturing of various fermented dairy products. However, the species is also frequently isolated from non‐dairy niches, such as fermented plant material. Recently, these non‐dairy strains have gained increasing interest, as they have been described to possess flavour‐forming activities that are rarely found in dairy isolates and have diverse metabolic properties. We performed an extensive whole‐genome diversity analysis on 39 L. lactis strains, isolated from dairy and plant sources. Comparative genome hybridization analysis with multi‐strain microarrays was used to assess presence or absence of genes and gene clusters in these strains, relative to all L. lactis sequences in public databases, whereby chromosomal and plasmid‐encoded genes were computationally analysed separately. Nearly 3900 chromosomal orthologous groups (chrOGs) were defined on basis of four sequenced chromosomes of L. lactis strains (IL1403, KF147, SK11, MG1363). Of these, 1268 chrOGs are present in at least 35 strains and represent the presently known core genome of L. lactis, and 72 chrOGs appear to be unique for L. lactis. Nearly 600 and 400 chrOGs were found to be specific for either the subspecies lactis or subspecies cremoris respectively. Strain variability was found in presence or absence of gene clusters related to growth on plant substrates, such as genes involved in the consumption of arabinose, xylan, α‐galactosides and galacturonate. Further niche‐specific differences were found in gene clusters for exopolysaccharides biosynthesis, stress response (iron transport, osmotolerance) and bacterial defence mechanisms (nisin biosynthesis). Strain variability of functions encoded on known plasmids included proteolysis, lactose fermentation, citrate uptake, metal ion resistance and exopolysaccharides biosynthesis. The present study supports the view of L. lactis as a species with a very flexible genome. PMID:21338475
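The core-genome and subspecies-specific counts reported in this abstract rest on simple presence/absence bookkeeping, which can be sketched as below. The data layout (a dictionary from orthologous-group identifier to the set of strains in which it was detected) and the function names are assumptions for illustration; the threshold corresponds to the abstract's "present in at least 35 strains" out of 39.

```python
def core_groups(presence, min_strains):
    """Orthologous groups detected in at least `min_strains` strains."""
    return {og for og, strains in presence.items() if len(strains) >= min_strains}


def specific_groups(presence, group, other):
    """Groups detected in at least one strain of `group` and in none of `other`."""
    return {
        og
        for og, strains in presence.items()
        if strains & group and not strains & other
    }


# Toy example with five hypothetical strains, A-E
toy_presence = {
    "og1": {"A", "B", "C", "D", "E"},
    "og2": {"A", "B", "C", "D"},
    "og3": {"A", "B"},
    "og4": {"D", "E"},
}
toy_core = core_groups(toy_presence, min_strains=4)
toy_specific = specific_groups(toy_presence, group={"A", "B", "C"}, other={"D", "E"})
```

Allowing a group to be missing from a few strains, rather than demanding presence in all of them, makes the core estimate tolerant of hybridization failures on the array.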
Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.

PubMed

Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C

2015-10-01

Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Five unique open reading frames of infectious laryngotracheitis virus are expressed during infection but are dispensable for virus replication in cell culture.

PubMed

Veits, Jutta; Mettenleiter, Thomas C; Fuchs, Walter

2003-06-01

The chicken alphaherpesvirus infectious laryngotracheitis virus (ILTV) exhibits several unique genetic features including an internal inversion of a conserved part of the unique long genome region. At one end, this inversion is preceded by a cluster of five open reading frames (ORFs) of 335-411 codons, designated ORF A to ORF E, that are not present in any other known herpesvirus genome. In this report we analysed expression of these genes and identified the corresponding viral RNA and protein products. Northern blot analyses showed 3'-coterminal transcripts of ORFs A and B, and monocistronic mRNAs of ORFs C and D. ORF E is part of a 3'-coterminal transcription unit that includes the conserved glycoprotein H and thymidine kinase genes. Monospecific antisera obtained after immunization of rabbits with bacterial fusion proteins allowed detection of the protein products of ORF A (40 kDa), ORF B (34 kDa), ORF C (38 and 30 kDa), ORF D (41 kDa) and ORF E (44 kDa) in ILTV-infected cells. For functional analyses, five virus recombinants possessing deletions within the individual ORFs and concomitant insertions of a reporter gene cassette encoding green fluorescent protein were generated. All virus mutants were replication competent in cell culture, but exhibited reduced virus titres or plaque sizes when compared to wild-type ILTV. These findings indicate that the ILTV-specific ORF A to ORF E genes might be important for virus replication in the natural host organism.
Dynamic expression profiling of type I and type III interferon-stimulated hepatocytes reveals a stable hierarchy of gene expression.

PubMed

Bolen, Christopher R; Ding, Siyuan; Robek, Michael D; Kleinstein, Steven H

2014-04-01

Despite activating similar signaling cascades, the type I and type III interferons (IFNs) differ in their ability to antagonize virus replication. However, it is not clear whether these cytokines induce unique antiviral states, particularly in the liver, where the clinically important hepatitis B and C viruses cause persistent infection. Here, clustering and promoter analyses of microarray-based gene expression profiling were combined with mechanistic studies of signaling pathways to dynamically characterize the transcriptional responses induced by these cytokines in Huh7 hepatoma cells and primary human hepatocytes. Type I and III IFNs differed greatly in their level of interferon-stimulated gene (ISG) induction with a clearly detectable hierarchy (IFN-β > IFN-α > IFN-λ3 > IFN-λ1 > IFN-λ2). Notably, although the hierarchy identified varying numbers of differentially expressed genes when quantified using common statistical thresholds, further analysis of gene expression over multiple timepoints indicated that the individual IFNs do not in fact regulate unique sets of genes. The kinetic profiles of IFN-induced gene expression were also qualitatively similar with the important exception of IFN-α. While stimulation with either IFN-β or IFN-λs resulted in a similar long-lasting ISG induction, IFN-α signaling peaked early after stimulation then declined due to a negative feedback mechanism. The quantitative expression hierarchy and unique kinetics of IFN-α reveal potential specific roles for individual IFNs in the immune response, and elucidate the mechanism behind previously observed differences in IFN antiviral activity. While current clinical trials are focused on IFN-λ1 as a potential antiviral therapy, the finding that IFN-λ3 invariably possesses the highest activity among type III IFNs suggests that this cytokine may have superior clinical activity. © 2014 by the American Association for the Study of Liver Diseases.
Ribosome profiling reveals changes in translational status of soybean transcripts during immature cotyledon development

PubMed Central

Shamimuzzaman, Md.

2018-01-01

To understand translational capacity on a genome-wide scale across three developmental stages of immature soybean seed cotyledons, ribosome profiling was performed in combination with RNA sequencing and cluster analysis. Transcripts representing 216 unique genes demonstrated a higher level of translational activity in at least one stage by exhibiting higher translational efficiencies (TEs) in which there were relatively more ribosome footprint sequence reads mapping to the transcript than were present in the control total RNA sample. The majority of these transcripts were more translationally active at the early stage of seed development and included 12 unique serine or cysteine proteases and 16 2S albumin and low molecular weight cysteine-rich proteins that may serve as substrates for turnover and mobilization early in seed development. It would appear that the serine proteases and 2S albumins play a vital role in the early stages. In contrast, our investigation of profiles of 19 genes encoding high abundance seed storage proteins, such as glycinins, beta-conglycinins, lectin, and Kunitz trypsin inhibitors, showed that they all had similar patterns in which the TE values started at low levels and increased approximately 2 to 6-fold during development. The highest levels of these seed protein transcripts were found at the mid-developmental stage, whereas the highest ribosome footprint levels of only up to 1.6 TE were found at the late developmental stage. These experimental findings suggest that the major seed storage protein coding genes are primarily regulated at the transcriptional level during normal soybean cotyledon development. Finally, our analyses also identified a total of 370 unique gene models that showed very low TE values including over 48 genes encoding ribosomal family proteins and 95 gene models that are related to energy and photosynthetic functions, many of which have homology to the chloroplast genome. 
Additionally, we showed that genes of the chloroplast were relatively translationally inactive during seed development. PMID:29570733
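The translational efficiency (TE) described above compares ribosome footprint reads with total RNA reads for each transcript. The following is a minimal sketch of one common way to compute such a ratio, assuming library-size normalization of both read sets; the function name, the normalization, and the toy counts are illustrative assumptions, not the authors' pipeline.

```python
def translational_efficiency(footprint_counts, rna_counts):
    """Return {transcript: TE}, where TE is the transcript's share of
    ribosome footprint reads divided by its share of total RNA reads.

    Assumption: both inputs are dicts of raw read counts per transcript.
    Transcripts with zero RNA reads are skipped to avoid division by zero.
    """
    fp_total = sum(footprint_counts.values())
    rna_total = sum(rna_counts.values())
    te = {}
    for transcript, rna in rna_counts.items():
        if rna == 0:
            continue
        fp_fraction = footprint_counts.get(transcript, 0) / fp_total
        rna_fraction = rna / rna_total
        te[transcript] = fp_fraction / rna_fraction
    return te


# Hypothetical toy counts, not data from the study.
toy_footprints = {"t1": 30, "t2": 10}
toy_rna = {"t1": 10, "t2": 30}
toy_te = translational_efficiency(toy_footprints, toy_rna)
```

Under this definition a TE above 1 means a transcript holds a larger share of footprint reads than of RNA reads, which is the sense of "higher translational activity" used in the abstract.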
Gene expression signatures differentiate ovarian/peritoneal serous carcinoma from breast carcinoma in effusions

PubMed Central

Davidson, Ben; Stavnes, Helene Tuft; Holth, Arild; Chen, Xu; Yang, Yanqin; Shih, Ie-Ming; Wang, Tian-Li

2011-01-01

Ovarian/primary peritoneal carcinoma and breast carcinoma are the gynaecological cancers that most frequently involve the serosal cavities. With the objective of improving on the limited diagnostic panel currently available for the differential diagnosis of these two malignancies, as well as to define tumour-specific biological targets, we compared their global gene expression patterns. Gene expression profiles of 10 serous ovarian/peritoneal and eight ductal breast carcinoma effusions were analysed using the HumanRef-8 BeadChip from Illumina. Differentially expressed candidate genes were validated using quantitative real-time PCR and immunohistochemistry. Unsupervised hierarchical clustering using all 54,675 genes in the array separated ovarian from breast carcinoma samples. We identified 288 unique probes that were significantly differentially expressed in the two cancers by greater than 3.5-fold, of which 81 and 207 were overexpressed in breast and ovarian/peritoneal carcinoma, respectively. SAM analysis identified 1078 differentially expressed probes with false discovery rate less than 0.05. Genes overexpressed in breast carcinoma included TFF1, TFF3, FOXA1, CA12, GATA3, SDC1, PITX1, TH, EHFD1, EFEMP1, TOB1 and KLF2. Genes overexpressed in ovarian/peritoneal carcinoma included SPON1, RBP1, MFGE8, TM4SF12, MMP7, KLK5/6/7, FOLR1/3, PAX8, APOL2 and NRCAM. The differential expression of 14 genes was validated by quantitative real-time PCR, and differences in 5 gene products were confirmed by immunohistochemistry. Expression profiling distinguishes ovarian/peritoneal carcinoma from breast carcinoma and identifies genes that are differentially expressed in these two tumour types. The molecular signatures unique to these cancers may facilitate their differential diagnosis and may provide a molecular basis for therapeutic target discovery. PMID:20132413

Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

PubMed

Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

2004-01-01

One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
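To illustrate the TFIDF weighting named above, the sketch below turns per-gene keyword lists into weighted feature vectors and compares them by cosine similarity. It uses the textbook form of the weight (relative term frequency times the log of inverse document frequency), treating each gene as one document; the exact variant, the function names, and the toy keyword lists are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(keywords_by_gene):
    """Return {gene: {keyword: weight}} using tf * log(N / df)."""
    n_genes = len(keywords_by_gene)
    # Document frequency: number of genes whose keyword list contains the term.
    doc_freq = Counter()
    for words in keywords_by_gene.values():
        doc_freq.update(set(words))
    vectors = {}
    for gene, words in keywords_by_gene.items():
        counts = Counter(words)
        total = len(words) or 1  # guard against an empty keyword list
        vectors[gene] = {
            term: (count / total) * math.log(n_genes / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical keyword lists, not taken from MEDLINE.
toy_keywords = {
    "geneA": ["kinase", "cell", "cycle", "kinase"],
    "geneB": ["kinase", "cell", "mitosis"],
    "geneC": ["receptor", "cell", "synapse"],
}
toy_vectors = tfidf_vectors(toy_keywords)
```

In the toy input, "cell" occurs for every gene and therefore gets weight zero, so geneA and geneC share no informative keyword, while geneA and geneB remain similar through "kinase". The resulting vectors could then be passed to any hierarchical clustering routine.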
A beginner's guide to gene editing.

PubMed

Harrison, Patrick T; Hart, Stephen

2018-04-01

What is the topic of this review? This review summarizes the development of gene editing from early proof-of-concept studies in the 1980s to contemporary programmable and RNA-guided nucleases, which enable rapid and precise alteration of DNA sequences of almost any living cell. What advances does it highlight? With an average of one clustered regularly interspaced short palindromic repeat (CRISPR) Cas9 paper published every 4 h in 2017, this review cannot highlight all new developments, but a number of key improvements, including increases in efficiency, a range of new options to reduce off-target effects and plans for CRISPR to enter clinical trials in 2018, are discussed. Genome editing enables precise changes to be made in the genome of living cells. The technique was originally developed in the 1980s but largely limited to use in mice. The discovery that a targeted double-stranded break at a unique site in the genome, close to the site to be changed, could substantially increase the efficiency of editing raised the possibility of using the technique in a broader range of animal models and, potentially, human cells. But the challenge was to identify reagents that could create targeted breaks at a unique genomic location with minimal off-target effects. In 2005, the demonstration that programmable zinc finger nucleases (ZFNs) could perform this task led to a number of proof-of-concept studies, but a limitation was the ease with which effective ZFNs could be produced. In 2009, the development of TAL effector nucleases (TALENs) increased the specificity of gene editing and the ease of design and production. However, it was not until 2013 and the development of the clustered regularly interspaced short palindromic repeat (CRISPR) Cas9/guide RNA that gene editing became a research tool that any laboratory could use. © 2017 The Authors. Experimental Physiology © 2017 The Physiological Society.
Use of a genealogical database demonstrates heritability of pulmonary fibrosis.

PubMed

Scholand, Mary Beth; Coon, Hilary; Wolff, Roger; Cannon-Albright, Lisa

2013-10-01

Pulmonary fibrosis (PF) is a progressive fatal disease of unknown etiology. Identification of risk genes and pathways will enhance our understanding of this disease. Analysis of Utah genealogical resources has previously shown strong evidence for a genetic contribution to other diseases, such as cancer. This approach has led to gene discovery in diseases such as breast cancer and colon cancer, and is used here to quantify the heritability of PF. We hypothesize that there is a heritable contribution to death from PF and use existing genealogic and death certificate data to examine patterns of relatedness amongst individuals who have died of PF. We analyzed familial clustering of individuals who died from PF using the Utah Population Database, a unique population-based genealogical resource that has been linked to death certificates dating from 1904. We identified 1,000 individuals with at least three generations of genealogy data and a cause of death documented as PF (cases). We estimated the relative risk (RR) of death from PF among the first-, second-, and third-degree relatives of cases. We also tested the hypothesis of excess relatedness among the cases by comparing the average pairwise relatedness of all cases to the average pairwise relatedness of 1,000 sets of matched controls. We observed significantly increased risk for death from PF among the first- (RR = 4.69), second- (RR = 1.92), and third-degree relatives (RR = 1.14) of cases. The average relatedness of the 1,000 cases was significantly higher than the expected average relatedness of matched control sets (p < 0.001). When close (first- and second-degree) relationships were ignored, significantly increased relatedness remained (p = 0.002). Our results demonstrate significant clustering among both close and distant relatives, providing strong support for genetic contributions to death from PF. High-risk pedigrees derived from this unique resource may help identify new risk genes and gene pathways.
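The excess-relatedness test described above compares one statistic for the cases with the same statistic computed on many matched control sets. A minimal sketch of that comparison follows, assuming a caller-supplied pairwise relatedness function and an empirical p-value with the common add-one correction; the function names, the correction, and the toy kinship values are illustrative assumptions and are not taken from the study.

```python
from itertools import combinations


def mean_pairwise_relatedness(ids, relatedness):
    """Average relatedness(a, b) over all unordered pairs of individuals."""
    pairs = list(combinations(ids, 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    return sum(relatedness(a, b) for a, b in pairs) / len(pairs)


def empirical_p_value(case_stat, control_stats):
    """Share of control sets at least as related as the cases.

    The +1 in numerator and denominator is an assumed convention that
    keeps the estimate away from exactly zero.
    """
    exceed = sum(1 for stat in control_stats if stat >= case_stat)
    return (exceed + 1) / (len(control_stats) + 1)


# Hypothetical kinship values for three individuals (unlisted pairs are 0).
toy_kinship = {frozenset(("a", "b")): 0.25, frozenset(("a", "c")): 0.125}


def toy_relatedness(x, y):
    return toy_kinship.get(frozenset((x, y)), 0.0)


toy_case_mean = mean_pairwise_relatedness(["a", "b", "c"], toy_relatedness)
toy_control_means = [0.0, 0.05, 0.2, 0.01]  # one value per matched control set
toy_p = empirical_p_value(toy_case_mean, toy_control_means)
```

With this convention, 1,000 control sets and no control set exceeding the cases would give an estimate of 1/1001, just under 0.001.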
Expression of endogenous and foreign ribulose 1,5-bisphosphate carboxylase-oxygenase (RubisCO) genes in a RubisCO deletion mutant of Rhodobacter sphaeroides.

PubMed Central

Falcone, D L; Tabita, F R

1991-01-01

A Rhodobacter sphaeroides ribulose 1,5-bisphosphate carboxylase-oxygenase (RubisCO) deletion strain was constructed that was complemented by plasmids containing either the form I or form II CO2 fixation gene cluster. This strain was also complemented by genes encoding foreign RubisCO enzymes expressed from a Rhodospirillum rubrum RubisCO promoter. In R. sphaeroides, the R. rubrum promoter was regulated, resulting in variable levels of disparate RubisCO molecules under different growth conditions. Photosynthetic growth of the R. sphaeroides deletion strain complemented with cyanobacterial RubisCO revealed physiological properties reflective of the unique cellular environment of the cyanobacterial enzyme. The R. sphaeroides RubisCO deletion strain and R. rubrum promoter system may be used to assess the properties of mutagenized proteins in vivo, as well as provide a potential means to select for altered RubisCO molecules after random mutagenesis of entire genes or gene regions encoding RubisCO enzymes. PMID:1900508
GenCLiP 2.0: a web server for functional clustering of genes and construction of molecular networks based on free terms.

PubMed

Wang, Jia-Hong; Zhao, Ling-Feng; Lin, Pei; Su, Xiao-Rong; Chen, Shi-Jun; Huang, Li-Qiang; Wang, Hua-Feng; Zhang, Hai; Hu, Zhen-Fu; Yao, Kai-Tai; Huang, Zhong-Xi

2014-09-01

Identifying biological functions and molecular networks in a gene list and how the genes may relate to various topics is of considerable value to biomedical researchers. Here, we present a web-based text-mining server, GenCLiP 2.0, which can analyze human genes with enriched keywords and molecular interactions. Compared with other similar tools, GenCLiP 2.0 offers two unique features: (i) analysis of gene functions with free terms (i.e. any terms in the literature) generated by literature mining or provided by the user and (ii) accurate identification and integration of comprehensive molecular interactions from Medline abstracts, to construct molecular networks and subnetworks related to the free terms. http://ci.smu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

PubMed Central

2010-01-01

Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
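The adjusted Rand index used above to score a clustering against known classes can be computed directly from the contingency table of the two labelings. The sketch below follows the standard Hubert and Arabie formula; the function name and the handling of degenerate inputs are choices made here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")
    # Contingency table cells and its row and column sums.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Number of item pairs placed together in both, in truth, in prediction.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Degenerate case (for example, both labelings are one single cluster).
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster labels are arbitrary names, so a relabeled copy still scores 1.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is 1 for identical partitions, close to 0 for agreement at chance level, and negative when agreement is worse than chance, as in the second toy call.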
Expression of HOXB genes is significantly different in acute myeloid leukemia with a partial tandem duplication of MLL vs. a MLL translocation: a cross-laboratory study.

PubMed

Liu, Hsi-Che; Shih, Lee-Yung; May Chen, Mei-Ju; Wang, Chien-Chih; Yeh, Ting-Chi; Lin, Tung-Huei; Chen, Chien-Yu; Lin, Chih-Jen; Liang, Der-Cherng

2011-05-01

In acute myeloid leukemia (AML), the mixed lineage leukemia (MLL) gene may be rearranged to generate a partial tandem duplication (PTD), or fused to partner genes through a chromosomal translocation (tMLL). In this study, we first explored the differentially expressed genes between MLL-PTD and tMLL using gene expression profiling of our cohort (15 MLL-PTD and 10 tMLL) and one published data set. The top 250 probes were chosen from each set, resulting in 29 probes (21 unique genes) common to both sets. The selected genes include four HOXB genes: HOXB2, B3, B5, and B6. The expression values of these HOXB genes significantly differ between MLL-PTD and tMLL cases. Clustering and classification analyses were thoroughly conducted to support our gene selection results. Second, as MLL-PTD, FLT3-ITD, and NPM1 mutations are identified in AML with normal karyotypes, we briefly studied their impact on the HOXB genes. Another contribution of this study is to demonstrate that using public data from other studies enriches samples for analysis and yields more conclusive results. 2011 Elsevier Inc. All rights reserved.
Ortholog-based screening and identification of genes related to intracellular survival.

PubMed

Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

2018-04-20

Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.
A novel polyketide biosynthesis gene cluster is involved in fruiting body morphogenesis in the filamentous fungi Sordaria macrospora and Neurospora crassa.

PubMed

Nowrousian, Minou

2009-04-01

During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.
Nitrogen transporter and assimilation genes exhibit developmental stage-selective expression in maize (Zea mays L.) associated with distinct cis-acting promoter motifs.

PubMed

Liseron-Monfils, Christophe; Bi, Yong-Mei; Downs, Gregory S; Wu, Wenqing; Signorelli, Tara; Lu, Guangwen; Chen, Xi; Bondo, Eddie; Zhu, Tong; Lukens, Lewis N; Colasanti, Joseph; Rothstein, Steven J; Raizada, Manish N

2013-10-01

Nitrogen is considered the most limiting nutrient for maize (Zea mays L.), but there is limited understanding of the regulation of nitrogen-related genes during maize development. An Affymetrix 82K maize array was used to analyze the expression of ≤ 46 unique nitrogen uptake and assimilation probes in 50 maize tissues from seedling emergence to 31 d after pollination. Four nitrogen-related expression clusters were identified in roots and shoots corresponding to, or overlapping, juvenile, adult, and reproductive phases of development. Quantitative real time PCR data was consistent with the existence of these distinct expression clusters. Promoters corresponding to each cluster were screened for over-represented cis-acting elements. The 8-bp distal motif of the Arabidopsis 43-bp nitrogen response element (NRE) was over-represented in nitrogen-related maize gene promoters. This conserved motif, referred to here as NRE43-d8, was previously shown to be critical for nitrate-activated transcription of nitrate reductase (NIA1) and nitrite reductase (NIR1) by the NIN-LIKE PROTEIN 6 (NLP6) in Arabidopsis. Here, NRE43-d8 was over-represented in the promoters of maize nitrate and ammonium transporter genes, specifically those that showed peak expression during early-stage vegetative development. This result predicts an expansion of the NRE-NLP6 regulon and suggests that it may have a developmental component in maize. We also report leaf expression of putative orthologs of nitrite transporters (NiTR1), a transporter not previously reported in maize. We conclude by discussing how each of the four transcriptional modules may be responsible for the different nitrogen uptake and assimilation requirements of leaves and roots at different stages of maize development.
Legionella pneumophila strain associated with the first evidence of person-to-person transmission of Legionnaires’ disease: a unique mosaic genetic backbone

PubMed Central

Borges, Vítor; Nunes, Alexandra; Sampaio, Daniel A.; Vieira, Luís; Machado, Jorge; Simões, Maria J.; Gonçalves, Paulo; Gomes, João P.

2016-01-01

The first strong evidence of person-to-person transmission of Legionnaires’ Disease (LD) was recently reported. Here, we characterize the genetic backbone of this case-related Legionella pneumophila strain (“PtVFX/2014”), which also caused a large outbreak of LD. PtVFX/2014 is phylogenetically divergent from the most worldwide studied outbreak-associated L. pneumophila subspecies pneumophila serogroup 1 strains. In fact, this strain is also from serogroup 1, but belongs to the L. pneumophila subspecies fraseri. Its genomic mosaic backbone reveals eight horizontally transferred regions encompassing genes, for instance, involved in lipopolysaccharide biosynthesis or encoding virulence-associated Dot/Icm type IVB secretion system (T4BSS) substrates. PtVFX/2014 also inherited a rare ~65 kb pathogenicity island carrying virulence factors and detoxifying enzymes believed to contribute to the emergence of best-fitted strains in water reservoirs and in human macrophages, as well as an inter-species transferred (from L. oakridgensis) ~37.5 kb genomic island (harboring a lvh/lvr T4ASS cluster) that had never been found intact within L. pneumophila species. PtVFX/2014 encodes another lvh/lvr cluster near CRISPR-associated genes, which may boost L. pneumophila transition from an environmental bacterium to a human pathogen. Overall, this unique genomic make-up may affect the ability of PtVFX/2014 to adapt to diverse environments and, ultimately, to be transmitted and cause human disease. PMID:27196677
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

PubMed Central

Poole, William; Leinonen, Kalle; Shmulevich, Ilya

2017-01-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

PubMed

Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

2017-02-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
Analysis of multiplex gene expression maps obtained by voxelation.

PubMed

An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios

2009-04-29

Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. 
The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.
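The similarity query described in this record, ranking genes by the Euclidean distance between their feature vectors, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the vectors below are made-up numbers standing in for wavelet features, and the gene names and the function `most_similar` are hypothetical.

```python
import math

def most_similar(query, features, k=2):
    """Return the k genes whose feature vectors lie closest to the query gene."""
    q = features[query]
    # Euclidean distance from the query gene to every other gene
    distances = [
        (math.dist(q, vec), gene)
        for gene, vec in features.items()
        if gene != query
    ]
    distances.sort()
    return [gene for _, gene in distances[:k]]

# Toy feature vectors (hypothetical values, not real expression data)
features = {
    "geneA": [1.0, 0.0, 2.0],
    "geneB": [1.1, 0.1, 2.1],
    "geneC": [5.0, 5.0, 5.0],
    "geneD": [0.9, 0.0, 1.8],
}

print(most_similar("geneA", features))
```

In this toy data, geneB and geneD sit close to geneA while geneC is far away, so a query on geneA returns the first two.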
The ergot alkaloid gene cluster in Claviceps purpurea: extension of the cluster sequence and intra species evolution.

PubMed

Haarmann, Thomas; Machado, Caroline; Lübbe, Yvonne; Correia, Telmo; Schardl, Christopher L; Panaccione, Daniel G; Tudzynski, Paul

2005-06-01

The genomic region of Claviceps purpurea strain P1 containing the ergot alkaloid gene cluster [Tudzynski, P., Hölter, K., Correia, T., Arntz, C., Grammel, N., Keller, U., 1999. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea. Mol. Gen. Genet. 261, 133-141] was explored by chromosome walking, and additional genes probably involved in the ergot alkaloid biosynthesis have been identified. The putative cluster sequence (extending over 68.5kb) contains 4 different nonribosomal peptide synthetase (NRPS) genes and several putative oxidases. Northern analysis showed that most of the genes were co-regulated (repressed by high phosphate), and identified probable flanking genes by lack of co-regulation. Comparison of the cluster sequences of strain P1, an ergotamine producer, with that of strain ECC93, an ergocristine producer, showed high conservation of most of the cluster genes, but significant variation in the NRPS modules, strongly suggesting that evolution of these chemical races of C. purpurea is determined by evolution of NRPS module specificity.
Macrophage Gene Expression Associated with Remodeling of the Prepartum Rat Cervix: Microarray and Pathway Analyses

PubMed Central

Dobyns, Abigail E.; Goyal, Ravi; Carpenter, Lauren Grisham; Freeman, Tom C.; Longo, Lawrence D.; Yellon, Steven M.

2015-01-01

As the critical gatekeeper for birth, prepartum remodeling of the cervix is associated with increased resident macrophages (Mφ), proinflammatory processes, and extracellular matrix degradation. This study tested the hypothesis that expression of genes unique to Mφs distinguishes the prepartum from the unremodeled nonpregnant cervix. Perfused cervix from prepartum day 21 postbreeding (D21) or nonpregnant (NP) rats, with or without Mφs, had RNA extracted and whole genome microarray analysis performed. By subtractive analyses, expression of 194 and 120 genes related to Mφs in the cervix from D21 rats were increased and decreased, respectively. In both D21 and NP groups, 158 and 57 Mφ genes were also more or less up- or down-regulated, respectively. Mφ gene expression patterns were most strongly correlated within groups and in 5 major clustering patterns. In the cervix from D21 rats, functional categories and canonical pathways of increased expression by Mφ genes were related to extracellular matrix, cell proliferation, differentiation, as well as cell signaling. Pathways were characteristic of inflammation and wound healing, e.g., CD163, CD206, and CCR2. Signatures of only inflammation pathways, e.g., CSF1R, EMR1, and MMP12 were common to both D21 and NP groups. Thus, a novel and complex balance of Mφ genes and clusters differentiated the degraded extracellular matrix and cellular genomic activities in the cervix before birth from the unremodeled state. Predicted Mφ activities, pathways, and networks raise the possibility that expression patterns of specific genes characterize and promote prepartum remodeling of the cervix for parturition at term and with preterm labor. PMID:25811906
Platypus globin genes and flanking loci suggest a new insertional model for beta-globin evolution in birds and mammals.

PubMed

Patel, Vidushi S; Cooper, Steven J B; Deakin, Janine E; Fulton, Bob; Graves, Tina; Warren, Wesley C; Wilson, Richard K; Graves, Jennifer A M

2008-07-25

Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the alpha- and beta-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil beta-globin gene (omega) in the marsupial alpha-cluster, however, suggested that duplication of the alpha-beta cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous alpha- and beta-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation. The platypus alpha-globin cluster (chromosome 21) contains embryonic and adult alpha-globin genes, a beta-like omega-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-zeta-zeta'-alphaD-alpha3-alpha2-alpha1-omega-GBY-3'. The platypus beta-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-epsilon-beta-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate alpha-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal beta-globin clusters are embedded in olfactory genes. Thus, the mammalian alpha- and beta-globin clusters are orthologous to the bird alpha- and beta-globin clusters respectively. We propose that alpha- and beta-globin clusters evolved from an ancient MPG-C16orf35-alpha-beta-GBY-LUC7L arrangement 410 million years ago.
A copy of the original beta (represented by omega in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of beta-globin genes with different expression profiles in different lineages.
Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective

PubMed Central

Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred

2017-01-01

Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
Ontology based molecular signatures for immune cell types via gene expression analysis

PubMed Central

2013-01-01

Background New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identify genes uniquely expressed in these B cells compared to other B cell types. Conclusions This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649
Comparative Genomics Identifies Epidermal Proteins Associated with the Evolution of the Turtle Shell.

PubMed

Holthaus, Karin Brigit; Strasser, Bettina; Sipos, Wolfgang; Schmidt, Heiko A; Mlitz, Veronika; Sukseree, Supawadee; Weissenbacher, Anton; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

2016-03-01

The evolution of reptiles, birds, and mammals was associated with the origin of unique integumentary structures. Studies on lizards, chicken, and humans have suggested that the evolution of major structural proteins of the outermost, cornified layers of the epidermis was driven by the diversification of a gene cluster called Epidermal Differentiation Complex (EDC). Turtles have evolved unique defense mechanisms that depend on mechanically resilient modifications of the epidermis. To investigate whether the evolution of the integument in these reptiles was associated with specific adaptations of the sequences and expression patterns of EDC-related genes, we utilized newly available genome sequences to determine the epidermal differentiation gene complement of turtles. The EDC of the western painted turtle (Chrysemys picta bellii) comprises more than 100 genes, including at least 48 genes that encode proteins referred to as beta-keratins or corneous beta-proteins. Several EDC proteins have evolved cysteine/proline contents beyond 50% of total amino acid residues. Comparative genomics suggests that distinct subfamilies of EDC genes have been expanded and partly translocated to loci outside of the EDC in turtles. Gene expression analysis in the European pond turtle (Emys orbicularis) showed that EDC genes are differentially expressed in the skin of the various body sites and that a subset of beta-keratin genes within the EDC as well as those located outside of the EDC are expressed predominantly in the shell. Our findings give strong support to the hypothesis that the evolutionary innovation of the turtle shell involved specific molecular adaptations of epidermal differentiation. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering

PubMed Central

Shi, Jiejun; Qin, Li-Xuan

2014-01-01

We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org. PMID:25452684
Clustering approaches to identifying gene expression patterns from DNA microarray data.

PubMed

Do, Jin Hwan; Choi, Dong-Kug

2008-04-30

The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
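As a concrete reference point for the crisp algorithms named in this record, the following is a minimal K-means sketch in plain Python. It is an illustration only and is not taken from the review: the toy expression profiles are invented, and the initial centers are passed in explicitly because, as the abstract notes, results can depend on user-selected parameters such as the number of clusters and the initial values.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain K-means: alternate an assignment step and a centroid update step."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (Euclidean distance)
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its current members
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy two-condition expression profiles (hypothetical values)
profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(profiles, centers=[profiles[0], profiles[3]])
print(labels)
```

The number of clusters is fixed by the length of `centers`, which makes explicit the two user choices the abstract warns about.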
Conditions for the Evolution of Gene Clusters in Bacterial Genomes

PubMed Central

Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.

2010-01-01

Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
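The rearrangement step mentioned in this record, inversion of a genome segment, and a simple measure of how tightly pathway genes are clustered can be sketched as follows. This is a toy illustration under loud assumptions, not the authors' model: the genome is treated as a linear list (bacterial chromosomes are typically circular), and the helper names `invert` and `pathway_span` as well as the gene labels are hypothetical.

```python
def invert(genome, start, end):
    """Reverse the gene order in genome[start:end], mimicking an inversion event."""
    return genome[:start] + genome[start:end][::-1] + genome[end:]

def pathway_span(genome, pathway):
    """Positions covered from the first to the last pathway gene, inclusive.

    A smaller span means the pathway genes are more tightly clustered.
    """
    positions = [i for i, gene in enumerate(genome) if gene in pathway]
    return max(positions) - min(positions) + 1

# Hypothetical gene order: a, b, c belong to one pathway; x, y, z do not
genome = ["a", "x", "y", "b", "c", "z"]
pathway = {"a", "b", "c"}

rearranged = invert(genome, 1, 5)  # reverse the segment x, y, b, c
print(pathway_span(genome, pathway), pathway_span(rearranged, pathway))
```

Here a single inversion brings the three pathway genes next to each other. A simulation in the spirit of the abstract would additionally need a fitness function and a population update rule, neither of which is specified in the record.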
Identification and Functional Analysis of the Nocardithiocin Gene Cluster in Nocardia pseudobrasiliensis

PubMed Central

Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru

2015-01-01

Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows a strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequence and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will promote the improvement of nocardithiocin productivity and its derivatives production. PMID:26588225
Improved efficiency in amplification of Escherichia coli O-antigen gene clusters using genome-wide sequence comparison

USDA-ARS's Scientific Manuscript database

Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...
Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

PubMed Central

Skinnider, Michael A.; Dejong, Chris A.; Rees, Philip N.; Johnston, Chad W.; Li, Haoxin; Webster, Andrew L. H.; Wyatt, Morgan A.; Magarvey, Nathan A.

2015-01-01

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/. PMID:26442528
Investigation of terpene diversification across multiple sequenced plant genomes

PubMed Central

Boutanaev, Alexander M.; Moses, Tessa; Zi, Jiachen; Nelson, David R.; Mugford, Sam T.; Peters, Reuben J.; Osbourn, Anne

2015-01-01

Plants produce an array of specialized metabolites, including chemicals that are important as medicines, flavors, fragrances, pigments and insecticides. The vast majority of this metabolic diversity is untapped. Here we take a systematic approach toward dissecting genetic components of plant specialized metabolism. Focusing on the terpenes, the largest class of plant natural products, we investigate the basis of terpene diversity through analysis of multiple sequenced plant genomes. The primary drivers of terpene diversification are terpenoid synthase (TS) “signature” enzymes (which generate scaffold diversity), and cytochromes P450 (CYPs), which modify and further diversify these scaffolds, so paving the way for further downstream modifications. Our systematic search of sequenced plant genomes for all TS and CYP genes reveals that distinct TS/CYP gene pairs are found together far more commonly than would be expected by chance, and that certain TS/CYP pairings predominate, providing signals for key events that are likely to have shaped terpene diversity. We recover TS/CYP gene pairs for previously characterized terpene metabolic gene clusters and demonstrate new functional pairing of TSs and CYPs within previously uncharacterized clusters. Unexpectedly, we find evidence for different mechanisms of pathway assembly in eudicots and monocots; in the former, microsyntenic blocks of TS/CYP gene pairs duplicate and provide templates for the evolution of new pathways, whereas in the latter, new pathways arise by mixing and matching of individual TS and CYP genes through dynamic genome rearrangements. This is, to our knowledge, the first documented observation of the unique pattern of TS and CYP assembly in eudicots and monocots. PMID:25502595
Role and Regulation of the Flp/Tad Pilus in the Virulence of Pectobacterium atrosepticum SCRI1043 and Pectobacterium wasabiae SCC3193

PubMed Central

Nykyri, Johanna; Mattinen, Laura; Niemi, Outi; Adhikari, Satish; Kõiv, Viia; Somervuo, Panu; Fang, Xin; Auvinen, Petri; Mäe, Andres; Palva, E. Tapio; Pirhonen, Minna

2013-01-01

In this study, we characterized a putative Flp/Tad pilus-encoding gene cluster, and we examined its regulation at the transcriptional level and its role in the virulence of potato pathogenic enterobacteria of the genus Pectobacterium. The Flp/Tad pilus-encoding gene clusters in Pectobacterium atrosepticum, Pectobacterium wasabiae and Pectobacterium aroidearum were compared to previously characterized flp/tad gene clusters, including that of the well-studied Flp/Tad pilus model organism Aggregatibacter actinomycetemcomitans, in which this pilus is a major virulence determinant. Comparative analyses revealed substantial protein sequence similarity and open reading frame synteny between the previously characterized flp/tad gene clusters and the cluster in Pectobacterium, suggesting that the predicted flp/tad gene cluster in Pectobacterium encodes a Flp/Tad pilus-like structure. We detected genes for a novel two-component system adjacent to the flp/tad gene cluster in Pectobacterium, and mutant analysis demonstrated that this system has a positive effect on the transcription of selected Flp/Tad pilus biogenesis genes, suggesting that this response regulator regulates the flp/tad gene cluster. Mutagenesis of either the predicted regulator gene or selected Flp/Tad pilus biogenesis genes had a significant impact on the maceration ability of the bacterial strains in potato tubers, indicating that the Flp/Tad pilus-encoding gene cluster represents a novel virulence determinant in Pectobacterium. Soft-rot enterobacteria in the genera Pectobacterium and Dickeya are of great agricultural importance, and an investigation of the virulence of these pathogens could facilitate improvements in agricultural practices, thus benefiting farmers, the potato industry and consumers. PMID:24040039
Role and regulation of the Flp/Tad pilus in the virulence of Pectobacterium atrosepticum SCRI1043 and Pectobacterium wasabiae SCC3193.

PubMed

Nykyri, Johanna; Mattinen, Laura; Niemi, Outi; Adhikari, Satish; Kõiv, Viia; Somervuo, Panu; Fang, Xin; Auvinen, Petri; Mäe, Andres; Palva, E Tapio; Pirhonen, Minna

2013-01-01

In this study, we characterized a putative Flp/Tad pilus-encoding gene cluster, and we examined its regulation at the transcriptional level and its role in the virulence of potato pathogenic enterobacteria of the genus Pectobacterium. The Flp/Tad pilus-encoding gene clusters in Pectobacterium atrosepticum, Pectobacterium wasabiae and Pectobacterium aroidearum were compared to previously characterized flp/tad gene clusters, including that of the well-studied Flp/Tad pilus model organism Aggregatibacter actinomycetemcomitans, in which this pilus is a major virulence determinant. Comparative analyses revealed substantial protein sequence similarity and open reading frame synteny between the previously characterized flp/tad gene clusters and the cluster in Pectobacterium, suggesting that the predicted flp/tad gene cluster in Pectobacterium encodes a Flp/Tad pilus-like structure. We detected genes for a novel two-component system adjacent to the flp/tad gene cluster in Pectobacterium, and mutant analysis demonstrated that this system has a positive effect on the transcription of selected Flp/Tad pilus biogenesis genes, suggesting that this response regulator regulates the flp/tad gene cluster. Mutagenesis of either the predicted regulator gene or selected Flp/Tad pilus biogenesis genes had a significant impact on the maceration ability of the bacterial strains in potato tubers, indicating that the Flp/Tad pilus-encoding gene cluster represents a novel virulence determinant in Pectobacterium. Soft-rot enterobacteria in the genera Pectobacterium and Dickeya are of great agricultural importance, and an investigation of the virulence of these pathogens could facilitate improvements in agricultural practices, thus benefiting farmers, the potato industry and consumers.
An effective fuzzy kernel clustering analysis approach for gene expression data.

PubMed

Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

2015-01-01

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of the other related clustering algorithms.
Regulatory Feedback Loop of Two phz Gene Clusters through 5′-Untranslated Regions in Pseudomonas sp. M18

PubMed Central

Li, Yaqian; Du, Xilin; Lu, Zhi John; Wu, Daqiang; Zhao, Yilei; Ren, Bin; Huang, Jiaofang; Huang, Xianqing; Xu, Yuhong; Xu, Yuquan

2011-01-01

Background Phenazines are important compounds produced by pseudomonads and other bacteria. Two phz gene clusters, phzA1-G1 and phzA2-G2, were found in the genome of Pseudomonas sp. M18, an effective biocontrol agent that is highly homologous to the opportunistic human pathogen P. aeruginosa PAO1. However, little is known about the correlation between the expression of the two phz gene clusters. Methodology/Principal Findings Two chromosomal insertion-inactivated mutants, one for each gene cluster, were constructed, and the correlation between the expression of the two phz gene clusters was investigated in strain M18. Phenazine-1-carboxylic acid (PCA) molecules produced from the phzA2-G2 gene cluster are able to auto-regulate the cluster's own expression and to activate the expression of the phzA1-G1 gene cluster in a circulated amplification pattern. However, the post-transcriptional expression of the phzA1-G1 transcript was blocked principally through the 5′-untranslated region (UTR). In contrast, the phzA2-G2 gene cluster was transcribed to a lesser extent and translated efficiently and was negatively regulated by the GacA signal transduction pathway, mainly at a post-transcriptional level. Conclusions/Significance A single molecule, PCA, produced in different quantities by the two phz gene clusters acted as the functional mediator, and the two phz gene clusters developed a specific regulatory mechanism which acts through the 5′-UTR to transfer a single, but complex bacterial signaling event in Pseudomonas sp. strain M18. PMID:21559370
Differential regulation of ParaHox genes by retinoic acid in the invertebrate chordate amphioxus (Branchiostoma floridae).

PubMed

Osborne, Peter W; Benoit, Gérard; Laudet, Vincent; Schubert, Michael; Ferrier, David E K

2009-03-01

The ParaHox cluster is the evolutionary sister to the Hox cluster. Like the Hox cluster, the ParaHox cluster displays spatial and temporal regulation of the component genes along the anterior/posterior axis in a manner that correlates with the gene positions within the cluster (a feature called collinearity). The ParaHox cluster is however a simpler system to study because it is composed of only three genes. We provide a detailed analysis of the amphioxus ParaHox cluster and, for the first time in a single species, examine the regulation of the cluster in response to a single developmental signalling molecule, retinoic acid (RA). Embryos treated with either RA or RA antagonist display altered ParaHox gene expression: AmphiGsx expression shifts in the neural tube, and the endodermal boundary between AmphiXlox and AmphiCdx shifts its anterior/posterior position. We identified several putative retinoic acid response elements and in vitro assays suggest some may participate in RA regulation of the ParaHox genes. By comparison to vertebrate ParaHox gene regulation we explore the evolutionary implications. This work highlights how insights into the regulation and evolution of more complex vertebrate arrangements can be obtained through studies of a simpler, unduplicated amphioxus gene cluster.
A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression

PubMed Central

Nguyen, Nha; Vo, An; Choi, Inchan

2015-01-01

Abstract Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches to maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation. PMID:25383910
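The idea of scoring a signal by the entropy of its stationary wavelet coefficients can be sketched as follows. This is a minimal illustration, not the Dewer implementation: the abstract does not give the wavelet family, the number of levels, or the exact entropy definition, so the unnormalized Haar filter, the circular boundary handling, and the "entropy of relative energy per level" used here are all assumptions, and the function names are hypothetical.

```python
import math

def haar_swt_details(signal, levels):
    """Undecimated (stationary) Haar transform with circular boundaries.

    Returns one list of detail coefficients per level; every list has the
    same length as the input because no downsampling is performed.
    Averaging/differencing uses a factor of 1/2 (not the orthonormal
    1/sqrt(2)), which is an illustrative simplification.
    """
    approx = list(signal)
    n = len(approx)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps move further apart at each level
        nxt = [(approx[i] + approx[(i + step) % n]) / 2 for i in range(n)]
        det = [(approx[i] - approx[(i + step) % n]) / 2 for i in range(n)]
        details.append(det)
        approx = nxt
    return details

def wavelet_entropy(signal, levels=3):
    """Shannon entropy (bits) of the relative detail energy across levels."""
    energies = [sum(d * d for d in det) for det in haar_swt_details(signal, levels)]
    total = sum(energies)
    if total == 0:
        return 0.0  # a flat signal carries no detail energy
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)

flat = [5.0] * 8
step_signal = [0, 0, 0, 0, 4, 4, 4, 4]
print(wavelet_entropy(flat), wavelet_entropy(step_signal))
```

Under these assumptions a flat signal scores zero, and a signal whose variation is spread over several scales scores higher, up to log2(levels).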
LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.

PubMed

Wang, Dapeng; Zhang, Yubin; Fan, Zhonghua; Liu, Guiming; Yu, Jun

2012-01-01

Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, and these clusters are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what the minimal number of genes in a cluster is and what driving force leads to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on the evolutionary dynamics of gene clustering and ordering within the animal kingdom in two different lineages: vertebrates and arthropods. The database is built on a web-based Linux-Apache-MySQL-PHP framework with an effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three clear advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric, since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto the human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, or divergent pairs) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance of transcription start sites between two adjacent genes, which is extendable to genes that flank the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in a context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
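The notion of a minimal cluster (an adjacent gene pair classified as co-directional, convergent, or divergent, subject to a transcription-start-site distance threshold) can be sketched as below. This is an illustrative sketch only, not LCGbase code: the `Gene` record, the function names, and the example coordinates are hypothetical, and all genes are assumed to lie on one chromosome.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

def tss(gene):
    """Transcription start site: left end on '+', right end on '-'."""
    return gene.start if gene.strand == "+" else gene.end

def classify_pair(left, right):
    """Orientation of two adjacent genes, with `left` upstream on the chromosome."""
    if left.strand == right.strand:
        return "co-directional"
    # '+' then '-' point towards each other; '-' then '+' point apart
    return "convergent" if left.strand == "+" else "divergent"

def adjacent_pairs(genes, max_tss_distance):
    """List adjacent pairs whose TSS distance is within the chosen threshold."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        distance = abs(tss(right) - tss(left))
        if distance <= max_tss_distance:
            pairs.append((left.name, right.name, classify_pair(left, right), distance))
    return pairs

genes = [
    Gene("A", 100, 200, "+"),
    Gene("B", 300, 400, "-"),
    Gene("C", 500, 600, "+"),
    Gene("D", 700, 800, "+"),
]
print(adjacent_pairs(genes, max_tss_distance=250))
```

Making the distance threshold a parameter mirrors the "flexible parameter definitions" mentioned in the abstract; the value 250 is arbitrary.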
Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma

PubMed Central

Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

2007-01-01

Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC) was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, ILGC finds three gene clusters, two large and one small, similar to their results, which used Gaussian mixture clustering. The correlation between each cluster of genes and the clinical properties of malignancy of human colorectal cancer was analysed for tumor versus normal status, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825
Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma.

PubMed

Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

2007-12-30

Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC) was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, ILGC finds three gene clusters, two large and one small, similar to their results, which used Gaussian mixture clustering. The correlation between each cluster of genes and the clinical properties of malignancy of human colorectal cancer was analysed for tumor versus normal status, the existence of distant metastasis and the existence of lymph node metastasis.
Evolutionary patchwork of an insecticidal toxin shared between plant-associated pseudomonads and the insect pathogens Photorhabdus and Xenorhabdus.

PubMed

Ruffner, Beat; Péchy-Tarr, Maria; Höfte, Monica; Bloemberg, Guido; Grunder, Jürg; Keel, Christoph; Maurhofer, Monika

2015-08-16

Root-colonizing fluorescent pseudomonads are known for their excellent abilities to protect plants against soil-borne fungal pathogens. Some of these bacteria produce an insecticidal toxin (Fit), suggesting that they may exploit insect hosts as a secondary niche. However, the ecological relevance of insect toxicity and the mechanisms driving the evolution of toxin production remain puzzling. Screening a large collection of plant-associated pseudomonads for insecticidal activity and presence of the Fit toxin revealed that Fit is highly indicative of insecticidal activity and predicts that Pseudomonas protegens and P. chlororaphis are exclusive Fit producers. A comparative evolutionary analysis of Fit toxin-producing Pseudomonas including the insect-pathogenic bacteria Photorhabdus and Xenorhabdus, which produce the Fit-related Mcf toxin, showed that fit genes are part of a dynamic genomic region with substantial presence/absence polymorphism and local variation in GC base composition. The patchy distribution and phylogenetic incongruence of fit genes indicate that the Fit cluster evolved via horizontal transfer, followed by functional integration of vertically transmitted genes, generating a unique Pseudomonas-specific insect toxin cluster. Our findings suggest that multiple independent evolutionary events led to formation of at least three versions of the Mcf/Fit toxin, highlighting the dynamic nature of insect toxin evolution.
Simultaneous Production of Anabaenopeptins and Namalides by the Cyanobacterium Nostoc sp. CENA543.

PubMed

Shishido, Tânia K; Jokela, Jouni; Fewer, David P; Wahlsten, Matti; Fiore, Marli F; Sivonen, Kaarina

2017-11-17

Anabaenopeptins are a diverse group of cyclic peptides, which contain an unusual ureido linkage. Namalides are shorter structural homologues of anabaenopeptins, which also contain a ureido linkage. The biosynthetic origins of namalides are unknown despite a strong resemblance to anabaenopeptins. Here, we show that the cyanobacterium Nostoc sp. CENA543 produces new (nostamide B-E (2, 4, 5, and 6)) and known variants of anabaenopeptins (schizopeptin 791 (1) and anabaenopeptin 807 (3)). Surprisingly, Nostoc sp. CENA543 also produced namalide B (8) and the new namalides D (7), E (9), and F (10) in similar amounts to anabaenopeptins. Analysis of the complete Nostoc sp. CENA543 genome sequence indicates that both anabaenopeptins and namalides are produced by the same biosynthetic pathway through module skipping during biosynthesis. This unique process involves the skipping of two modules present in different nonribosomal peptide synthetases during namalide biosynthesis. This skipping is an efficient mechanism since both anabaenopeptins and namalides are synthesized in similar amounts by Nostoc sp. CENA543. Consequently, gene skipping may be used to increase and possibly broaden the chemical diversity of related peptides produced by a single biosynthetic gene cluster. Genome mining demonstrated that the anabaenopeptin gene clusters are widespread in cyanobacteria and can also be found in tectomicrobia bacteria.
454 Pyrosequencing of Olive (Olea europaea L.) Transcriptome in Response to Salinity

PubMed Central

Bazakos, Christos; Manioudaki, Maria E.; Sarropoulou, Elena; Spano, Thodhoraq; Kalaitzis, Panagiotis

2015-01-01

Olive (Olea europaea L.) is one of the most important crops in the Mediterranean region. The expansion of cultivation in areas irrigated with low-quality and saline water has negative effects on growth and productivity; however, the investigation of the molecular basis of salt tolerance in olive trees has been initiated only recently. To this end, we investigated the molecular response of cultivar Kalamon to salinity stress using next-generation sequencing technology to explore the transcriptome profile of olive leaves and roots and identify differentially expressed genes that are related to salt tolerance response. Out of 291,958 obtained trimmed reads, 28,270 unique transcripts were identified, of which 35% are annotated, a percentage that is comparable to similar reports on non-model plants. Among the 1,624 clusters in roots that comprise more than one read, 24 were differentially expressed, comprising 9 down- and 15 up-regulated genes. Similarly, in leaves, among the 2,642 clusters, 70 were identified as differentially expressed, with 14 down- and 56 up-regulated genes. Using next-generation sequencing technology we were able to identify salt-response-related transcripts. Furthermore, we provide an annotated transcriptome of olive as well as expression data, which are both significant tools for further molecular studies in olive. PMID:26576008
454 Pyrosequencing of Olive (Olea europaea L.) Transcriptome in Response to Salinity.

PubMed

Bazakos, Christos; Manioudaki, Maria E; Sarropoulou, Elena; Spano, Thodhoraq; Kalaitzis, Panagiotis

2015-01-01

Olive (Olea europaea L.) is one of the most important crops in the Mediterranean region. The expansion of cultivation in areas irrigated with low-quality and saline water has negative effects on growth and productivity; however, the investigation of the molecular basis of salt tolerance in olive trees has been initiated only recently. To this end, we investigated the molecular response of cultivar Kalamon to salinity stress using next-generation sequencing technology to explore the transcriptome profile of olive leaves and roots and identify differentially expressed genes that are related to salt tolerance response. Out of 291,958 obtained trimmed reads, 28,270 unique transcripts were identified, of which 35% are annotated, a percentage that is comparable to similar reports on non-model plants. Among the 1,624 clusters in roots that comprise more than one read, 24 were differentially expressed, comprising 9 down- and 15 up-regulated genes. Similarly, in leaves, among the 2,642 clusters, 70 were identified as differentially expressed, with 14 down- and 56 up-regulated genes. Using next-generation sequencing technology we were able to identify salt-response-related transcripts. Furthermore, we provide an annotated transcriptome of olive as well as expression data, which are both significant tools for further molecular studies in olive.

Lincomycin Biosynthesis Involves a Tyrosine Hydroxylating Heme Protein of an Unusual Enzyme Family

PubMed Central

Novotna, Jitka; Olsovska, Jana; Novak, Petr; Mojzes, Peter; Chaloupkova, Radka; Kamenik, Zdenek; Spizek, Jaroslav; Kutejova, Eva; Mareckova, Marketa; Tichy, Pavel; Damborsky, Jiri; Janata, Jiri

2013-01-01

The gene lmbB2 of the lincomycin biosynthetic gene cluster of Streptomyces lincolnensis ATCC 25466 was shown to code for an unusual tyrosine hydroxylating enzyme involved in the biosynthetic pathway of this clinically important antibiotic. LmbB2 was expressed in Escherichia coli, purified to near homogeneity and shown to convert tyrosine to 3,4-dihydroxyphenylalanine (DOPA). In contrast to the well-known tyrosine hydroxylases (EC 1.14.16.2) and tyrosinases (EC 1.14.18.1), LmbB2 was identified as a heme protein. Mass spectrometry and Soret band-excited Raman spectroscopy of LmbB2 showed that LmbB2 contains heme b as a prosthetic group. The CO-reduced differential absorption spectra of LmbB2 showed that the coordination of Fe was different from that of cytochrome P450 enzymes. LmbB2 exhibits sequence similarity to Orf13 of the anthramycin biosynthetic gene cluster, which has recently been classified as a heme peroxidase. Tyrosine hydroxylating activity of LmbB2 yielding DOPA in the presence of (6R)-5,6,7,8-tetrahydro-L-biopterin (BH4) was also observed. The reaction mechanism of this unique heme peroxidase family is discussed. Also, tyrosine hydroxylation was confirmed as the first step of the amino acid branch of lincomycin biosynthesis. PMID:24324587
A cluster merging method for time series microarray with production values.

PubMed

Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

2014-09-01

A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
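Merging per-replicate clusterings by how often two genes are grouped together can be sketched with a co-association count. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify how groups are formed from the pairwise frequencies, so the agreement threshold and the connected-components merge (via union-find) are illustrative choices, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coassociation(clusterings):
    """Count, for every gene pair, in how many clusterings the pair shares a cluster."""
    counts = Counter()
    for labels in clusterings:  # each clustering maps gene -> cluster label
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return counts

def merge_clusters(clusterings, min_agreement):
    """Group genes linked by pairs co-clustered in at least `min_agreement` of replicates."""
    genes = sorted(set().union(*clusterings))
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, shortening the path on the way
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), n in coassociation(clusterings).items():
        if n / len(clusterings) >= min_agreement:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())

# Three hypothetical replicates, each clustered independently
replicates = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
    {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
]
print(merge_clusters(replicates, min_agreement=0.6))
```

The pairwise counts also give a natural ranking of the merged groups by level of agreement, which is the property the abstract highlights.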
Bacterial CRISPR Regions: General Features and their Potential for Epidemiological Molecular Typing Studies.

PubMed

Karimi, Zahra; Ahmadi, Ali; Najafi, Ali; Ranjbar, Reza

2018-01-01

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, as novel and applicable regions in prokaryotic genomes, have gained great attention in the post-genomics era. These unique regions are diverse in number and sequence composition in different pathogenic bacteria and thereby can be suitable candidates for molecular epidemiology and genotyping studies. Furthermore, the arrayed structure of CRISPR loci (several unique repeats spaced with variable sequences) and the associated cas genes act as an active prokaryotic immune system against viral replication and conjugative elements. This property can be used as a tool for RNA editing in bioengineering studies. The aim of this review was to survey some details about the history, nature, and potential applications of CRISPR arrays in both genetic engineering and bacterial genotyping studies.
Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

PubMed

Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J

2006-01-01

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
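The TFIDF weighting scheme compared in this study can be illustrated with the textbook formula, weight = tf × ln(N / df). This is a sketch only: the abstract does not state which TFIDF variant or keyword extraction pipeline was used, so the normalisation and the toy keyword lists below are assumptions, and the gene names are hypothetical.

```python
import math
from collections import Counter

def tfidf(keywords_by_gene):
    """Weight each keyword per gene by term frequency times inverse document frequency.

    `keywords_by_gene` maps a gene name to the list of keywords extracted
    from its associated literature (one 'document' per gene).
    """
    n_genes = len(keywords_by_gene)
    # Document frequency: number of genes whose keyword list contains the term
    doc_freq = Counter()
    for words in keywords_by_gene.values():
        doc_freq.update(set(words))

    weights = {}
    for gene, words in keywords_by_gene.items():
        counts = Counter(words)
        weights[gene] = {
            term: (count / len(words)) * math.log(n_genes / doc_freq[term])
            for term, count in counts.items()
        }
    return weights

docs = {
    "geneA": ["kinase", "kinase", "membrane"],
    "geneB": ["membrane", "receptor"],
    "geneC": ["membrane", "kinase"],
}
weights = tfidf(docs)
for gene, terms in weights.items():
    print(gene, max(terms, key=terms.get))
```

A keyword shared by every gene ("membrane" here) receives zero weight, which is why TFIDF-weighted lists tend to separate genes into more functionally discrete clusters than raw counts would.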
Biosynthetic Investigations of Lactonamycin and Lactonamycin Z: Cloning of the Biosynthetic Gene Clusters and Discovery of an Unusual Starter Unit

PubMed Central

Zhang, Xiujun; Alemany, Lawrence B.; Fiedler, Hans-Peter; Goodfellow, Michael; Parry, Ronald J.

2008-01-01

The antibiotics lactonamycin and lactonamycin Z provide attractive leads for antibacterial drug development. Both antibiotics contain a novel aglycone core called lactonamycinone. To gain insight into lactonamycinone biosynthesis, cloning and precursor incorporation experiments were undertaken. The lactonamycin gene cluster was initially cloned from Streptomyces rishiriensis. Sequencing of ca. 61 kb of S. rishiriensis DNA revealed the presence of 57 open reading frames. These included genes coding for the biosynthesis of l-rhodinose, the sugar found in lactonamycin, and genes similar to those in the tetracenomycin biosynthetic gene cluster. Since lactonamycin production by S. rishiriensis could not be sustained, additional proof for the identity of the S. rishiriensis cluster was obtained by cloning the lactonamycin Z gene cluster from Streptomyces sanglieri. Partial sequencing of the S. sanglieri cluster revealed 15 genes that exhibited a very high degree of similarity to genes within the lactonamycin cluster, as well as an identical organization. Double-crossover disruption of one gene in the S. sanglieri cluster abolished lactonamycin Z production, and production was restored by complementation. These results confirm the identity of the genetic locus cloned from S. sanglieri and indicate that the highly similar locus in S. rishiriensis encodes lactonamycin biosynthetic genes. Precursor incorporation experiments with S. sanglieri revealed that lactonamycinone is biosynthesized in an unusual manner whereby glycine or a glycine derivative serves as a starter unit that is extended by nine acetate units. Analysis of the gene clusters and of the precursor incorporation data suggested a hypothetical scheme for lactonamycinone biosynthesis. PMID:18070976
Altered gene expression of the innate immune, neuroendocrine, and nuclear factor-kappa B (NF-κB) systems is associated with posttraumatic stress disorder in military personnel.

PubMed

Guardado, Pedro; Olivera, Anlys; Rusch, Heather L; Roy, Michael; Martin, Christiana; Lejbman, Natasha; Lee, Hwyunhwa; Gill, Jessica M

2016-03-01

Whole transcriptome analysis provides an unbiased examination of biological activity, and likely, unique insight into the mechanisms underlying posttraumatic stress disorder (PTSD) and comorbid depression and traumatic brain injury. This study compared gene-expression profiles in military personnel with PTSD (n=28) and matched controls without PTSD (n=27) using HG-U133 Plus 2.0 microarrays (Affymetrix), which contain 54,675 probe sets representing more than 38,500 genes. Analysis of expression profiles revealed 203 differentially expressed genes in PTSD, of which 72% were upregulated. Using Partek Genomics Suite 6.6, differentially expressed transcription clusters were filtered based on a selection criterion of ≥1.5 relative fold change at a false discovery rate of ≤5%. Ingenuity Pathway Analysis (Qiagen) of the differentially expressed genes indicated a dysregulation of genes associated with the innate immune, neuroendocrine, and NF-κB systems. These findings provide novel insights that may lead to new pharmaceutical agents for PTSD treatments and help mitigate mental and physical comorbidity risk.
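The selection criterion described above (fold change of at least 1.5 at a false discovery rate of at most 5%) can be sketched as follows. This is not the Partek Genomics Suite procedure: the abstract does not name the FDR method, so the Benjamini-Hochberg adjustment is an assumption, as is treating fold change as a positive ratio that counts in either direction. The function names and example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_transcripts(records, min_fold=1.5, max_fdr=0.05):
    """Keep names whose fold change (up or down) and adjusted p-value pass both cut-offs.

    `records` is a list of (name, fold_change_ratio, p_value) tuples, where
    a ratio below 1 means down-regulation.
    """
    q_values = benjamini_hochberg([p for _, _, p in records])
    return [
        name
        for (name, fold, _), q in zip(records, q_values)
        if max(fold, 1 / fold) >= min_fold and q <= max_fdr
    ]

records = [
    ("a", 2.0, 0.001),  # up-regulated, significant
    ("b", 1.2, 0.002),  # significant but fold change too small
    ("c", 0.5, 0.004),  # down-regulated two-fold, significant
    ("d", 3.0, 0.2),    # large change but not significant
    ("e", 1.0, 0.6),
]
print(select_transcripts(records))
```

Applying both cut-offs together keeps only transcripts that are statistically supported and show a change large enough to be of practical interest.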
BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis.

PubMed

Wang, Duolin; Wang, Juexin; Jiang, Yuexu; Liang, Yanchun; Xu, Dong

2017-02-03

Comparing the gene-expression profiles between biological conditions is useful for understanding gene regulation underlying complex phenotypes. Along this line, analysis of differential co-expression (DC), where genes under one condition have different co-expression patterns compared with another, has gained attention in recent years. We developed an R package, Bayes Factor approach for Differential Co-expression Analysis (BFDCA), for DC analysis. BFDCA is unique in integrating various aspects of DC patterns (including Shift, Cross, and Re-wiring) into one uniform Bayes factor. We tested BFDCA using simulation data and experimental data. Simulation results indicate that BFDCA outperforms existing methods in accuracy and robustness of detecting DC pairs and DC modules. Results of using experimental data suggest that BFDCA can cluster disease-related genes into functional DC subunits and estimate the regulatory impact of disease-related genes well. BFDCA also achieves high accuracy in predicting case-control phenotypes by using significant DC gene pairs as markers. BFDCA is publicly available at http://dx.doi.org/10.17632/jdz4vtvnm3.1.
Anhydrobiosis vs. aging: comparative genomics of protein repair L-isoaspartyl methyltransferases in the sleeping chironomid.

NASA Astrophysics Data System (ADS)

Gusev, Oleg; Kikawada, Takahiro; Shagimardanova, Elena; Suetsugu, Yoshitaka; Ayupov, Rustam

The origin of anhydrobiosis in the larvae of the sleeping chironomid Polypedilum vanderplanki represents a unique example of a set of evolutionary events in a single species that resulted in the acquisition of a new ability allowing survival in an extremely changeable environment. Complex comparative analysis of the genome of P. vanderplanki resulted in the discovery of a set of features, including the existence of a set of unique clusters of genes contributing to desiccation resistance. Surprisingly, in several cases, the genes mainly contributing to the formation of the molecular shield in the larvae are sleeping chironomid-specific and have no homology with genes from other insects, including P. nubifer, a chironomid from the same genus. Protein L-isoaspartyl methyltransferase (PIMT) acts on proteins that have been non-enzymatically damaged due to age, and partially restores aspartic residues, extending the life of the polypeptides. PIMT is a highly conserved enzyme present in nearly all eukaryotes and microorganisms, mostly in a single copy (or in a few isoforms in certain plants and some bacteria). While conducting a comparative analysis of the genomes of two chironomid midge species that differ in their ability to withstand complete water loss, we noticed that the structure and number of PIMT-coding genes in the desiccation-resistant (anhydrobiotic) midge (Polypedilum vanderplanki, Pv) differ from those of the common desiccation-sensitive midge (Polypedilum nubifer, Pn) and the rest of the insects. Both species have a clear orthologous PIMT shared by all insects. At the same time, in contrast to Pn, which has only one PIMT gene (PnPimt-1), the Pv genome contains 12 additional genes paralogous to Pimt1 (PvPimt-2-12), presumably coding for functional PIMT proteins, which are arranged in a single cluster. Remarkably, the location of PvPimt-1 in the Pv genome is different from that of the rest of the Pimt-like genes.
The PvPimt-1 gene is ubiquitously expressed during the life cycle, but expression of PvPimt2-12 is limited to the egg and larval stages. Finally, the expression of the Pimt1 gene in both chironomids was not changed in response to desiccation, while the clustered PvPimt2-12 genes showed strong up-regulation in response to water loss and other abiotic stresses. The abundance of PvPimt2-12 mRNAs was maximal in anhydrobiotic larvae, which resembles the case of plant seeds, where accumulation of PIMT provides additional protection for proteins during long dry storage. The predicted PvPimt2-12 proteins contain a conserved L-isoaspartyl methyltransferase functional domain. At the same time, the length and structure of the N- and C-termini of the predicted proteins show significant variation, suggesting different substrate preferences or other specific properties of the different Pv-PIMTs. Furthermore, the multi-member family in Pv is the first observation of drastic expansion and evolution of Pimt genes in general, and particularly in a single insect species. This work was supported by Russian Foundation for Basic Research (No. 12-08-33157 mol_a_ved and No. 14-04-01657_A).
Detailed dissection of the chromosomal region containing the Ph1 locus in wheat Triticum aestivum: with deletion mutants and expression profiling.

PubMed

Al-Kaff, Nadia; Knight, Emilie; Bertin, Isabelle; Foote, Tracie; Hart, Nicola; Griffiths, Simon; Moore, Graham

2008-04-01

Understanding Ph1, a dominant homoeologous chromosome pairing suppressor locus on the long arm of chromosome 5B in wheat Triticum aestivum L., is the core of the investigation in this article. The Ph1 locus restricts chromosome pairing and recombination at meiosis to true homologues. The importance of wheat as a crop and the need to exploit its wild relatives as donors for economically important traits in wheat breeding programmes is the main drive to uncover the mechanism of the Ph1 locus and regulate its activity. Following the molecular genetic characterization of the Ph1 locus, five additional deletion mutants covering the region have been identified. In addition, more bacterial artificial chromosomes (BACs) were sequenced and analysed to elucidate the complexity of this locus. A semi-quantitative RT-PCR was used to compare the expression profiles of different genes in the 5B region containing the Ph1 locus with their homoeologues on 5A and 5D. PCR products were cloned and sequenced to identify the gene from which they were derived. Deletion mutants and expression profiling of genes in the region containing the Ph1 locus on 5B has further restricted Ph1 to a cluster of cdk-like genes. Bioinformatic analysis of the cdk-like genes revealed their close homology to the checkpoint kinase Cdk2 from humans. Cdk2 is involved in the initiation of replication and is required in early meiosis. Expression profiling has revealed that the cdk-like gene cluster is unique within the region analysed on 5B in that these genes are transcribed. Deletion of the cdk-like locus on 5B results in activation of transcription of functional cdk-like copies on 5A and 5D. Thus the cdk locus on 5B is dominant to those on 5A and 5D in determining the overall activity, which will be dependent on a complex interplay between transcription from non-functional and functional cdk-like genes. 
The Ph1 locus has been defined to a cdk-like gene cluster related to Cdk2 in humans, a master checkpoint gene involved in the initiation of replication and required for early meiosis.
Establishment of the Inducible Tet-On System for the Activation of the Silent Trichosetin Gene Cluster in Fusarium fujikuroi

PubMed Central

Janevska, Slavica; Arndt, Birgit; Baumann, Leonie; Apken, Lisa Helene; Mauriz Marques, Lucas Maciel; Humpf, Hans-Ulrich; Tudzynski, Bettina

2017-01-01

The PKS-NRPS-derived tetramic acid equisetin and its N-desmethyl derivative trichosetin exhibit remarkable biological activities against a variety of organisms, including plants and bacteria, e.g., Staphylococcus aureus. The equisetin biosynthetic gene cluster was first described in Fusarium heterosporum, a species distantly related to the notorious rice pathogen Fusarium fujikuroi. Here we present the activation and characterization of a homologous, but silent, gene cluster in F. fujikuroi. Bioinformatic analysis revealed that this cluster does not contain the equisetin N-methyltransferase gene eqxD and consequently, trichosetin was isolated as final product. The adaption of the inducible, tetracycline-dependent Tet-on promoter system from Aspergillus niger achieved a controlled overproduction of this toxic metabolite and a functional characterization of each cluster gene in F. fujikuroi. Overexpression of one of the two cluster-specific transcription factor (TF) genes, TF22, led to an activation of the three biosynthetic cluster genes, including the PKS-NRPS key gene. In contrast, overexpression of TF23, encoding a second Zn(II)2Cys6 TF, did not activate adjacent cluster genes. Instead, TF23 was induced by the final product trichosetin and was required for expression of the transporter-encoding gene MFS-T. TF23 and MFS-T likely act in consort and contribute to detoxification of trichosetin and therefore, self-protection of the producing fungus. PMID:28379186
Establishment of the Inducible Tet-On System for the Activation of the Silent Trichosetin Gene Cluster in Fusarium fujikuroi.

PubMed

Janevska, Slavica; Arndt, Birgit; Baumann, Leonie; Apken, Lisa Helene; Mauriz Marques, Lucas Maciel; Humpf, Hans-Ulrich; Tudzynski, Bettina

2017-04-05

The PKS-NRPS-derived tetramic acid equisetin and its N-desmethyl derivative trichosetin exhibit remarkable biological activities against a variety of organisms, including plants and bacteria, e.g., Staphylococcus aureus. The equisetin biosynthetic gene cluster was first described in Fusarium heterosporum, a species distantly related to the notorious rice pathogen Fusarium fujikuroi. Here we present the activation and characterization of a homologous, but silent, gene cluster in F. fujikuroi. Bioinformatic analysis revealed that this cluster does not contain the equisetin N-methyltransferase gene eqxD and consequently, trichosetin was isolated as final product. The adaption of the inducible, tetracycline-dependent Tet-on promoter system from Aspergillus niger achieved a controlled overproduction of this toxic metabolite and a functional characterization of each cluster gene in F. fujikuroi. Overexpression of one of the two cluster-specific transcription factor (TF) genes, TF22, led to an activation of the three biosynthetic cluster genes, including the PKS-NRPS key gene. In contrast, overexpression of TF23, encoding a second Zn(II)₂Cys₆ TF, did not activate adjacent cluster genes. Instead, TF23 was induced by the final product trichosetin and was required for expression of the transporter-encoding gene MFS-T. TF23 and MFS-T likely act in consort and contribute to detoxification of trichosetin and therefore, self-protection of the producing fungus.
Computational gene expression profiling under salt stress reveals patterns of co-expression

PubMed Central

Sanchita; Sharma, Ashok

2016-01-01

Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters that were common to multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of the other clustering algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
Identification of transcripts involved in meiosis and follicle formation during ovine ovary development.

PubMed

Baillet, Adrienne; Mandon-Pépin, Béatrice; Cabau, Cédric; Poumerol, Elodie; Pailhoux, Eric; Cotinot, Corinne

2008-09-23

The key steps in germ cell survival during ovarian development are the entry into meiosis of oogonia and the formation of primordial follicles, which then determine the reproductive lifespan of the ovary. In sheep, these steps occur during fetal life, between 55 and 80 days of gestation, respectively. The aim of this study was to identify differentially expressed ovarian genes during prophase I meiosis and early folliculogenesis in sheep. In order to elucidate the molecular events associated with early ovarian differentiation, we generated two ovary stage-specific subtracted cDNA libraries using SSH. Large-scale sequencing of these SSH libraries identified 6,080 ESTs representing 2,535 contigs. Clustering and assembly of these ESTs resulted in a total of 2,101 unique sequences depicted in 1,305 singleton (62.11%) and 796 contigs (37.9%) ESTs (clusters). BLASTX evaluation indicated that 99% of the ESTs were homologous to various known genes/proteins in a broad range of organisms, especially ovine, bovine and human species. The remaining 1%, which did not exhibit any homology to known gene sequences, was considered as novel. Detailed study of the expression patterns of some of these genes using RT-PCR revealed new promising candidates for ovary differentiation genes in sheep. We showed that the SSH approach was relevant to determining new mammalian genes which might be involved in oogenesis and early follicle development, and enabled the discovery of new potential oocyte and granulosa cell markers for future studies. These genes may have significant implications regarding our understanding of ovarian function in molecular terms, and for the development of innovative strategies to both promote and control fertility.
Fine-Scale Analysis Reveals Cryptic Landscape Genetic Structure in Desert Tortoises

PubMed Central

Latch, Emily K.; Boarman, William I.; Walde, Andrew; Fleischer, Robert C.

2011-01-01

Characterizing the effects of landscape features on genetic variation is essential for understanding how landscapes shape patterns of gene flow and spatial genetic structure of populations. Most landscape genetics studies have focused on patterns of gene flow at a regional scale. However, the genetic structure of populations at a local scale may be influenced by a unique suite of landscape variables that have little bearing on connectivity patterns observed at broader spatial scales. We investigated fine-scale spatial patterns of genetic variation and gene flow in relation to features of the landscape in desert tortoise (Gopherus agassizii), using 859 tortoises genotyped at 16 microsatellite loci with associated data on geographic location, sex, elevation, slope, and soil type, and spatial relationship to putative barriers (power lines, roads). We used spatially explicit and non-explicit Bayesian clustering algorithms to partition the sample into discrete clusters, and characterize the relationships between genetic distance and ecological variables to identify factors with the greatest influence on gene flow at a local scale. Desert tortoises exhibit weak genetic structure at a local scale, and we identified two subpopulations across the study area. Although genetic differentiation between the subpopulations was low, our landscape genetic analysis identified both natural (slope) and anthropogenic (roads) landscape variables that have significantly influenced gene flow within this local population. We show that desert tortoise movements at a local scale are influenced by features of the landscape, and that these features are different than those that influence gene flow at larger scales. Our findings are important for desert tortoise conservation and management, particularly in light of recent translocation efforts in the region. 
More generally, our results indicate that recent landscape changes can affect gene flow at a local scale and that their effects can be detected almost immediately. PMID:22132143
Fine-scale analysis reveals cryptic landscape genetic structure in desert tortoises.

PubMed

Latch, Emily K; Boarman, William I; Walde, Andrew; Fleischer, Robert C

2011-01-01

Characterizing the effects of landscape features on genetic variation is essential for understanding how landscapes shape patterns of gene flow and spatial genetic structure of populations. Most landscape genetics studies have focused on patterns of gene flow at a regional scale. However, the genetic structure of populations at a local scale may be influenced by a unique suite of landscape variables that have little bearing on connectivity patterns observed at broader spatial scales. We investigated fine-scale spatial patterns of genetic variation and gene flow in relation to features of the landscape in desert tortoise (Gopherus agassizii), using 859 tortoises genotyped at 16 microsatellite loci with associated data on geographic location, sex, elevation, slope, and soil type, and spatial relationship to putative barriers (power lines, roads). We used spatially explicit and non-explicit Bayesian clustering algorithms to partition the sample into discrete clusters, and characterize the relationships between genetic distance and ecological variables to identify factors with the greatest influence on gene flow at a local scale. Desert tortoises exhibit weak genetic structure at a local scale, and we identified two subpopulations across the study area. Although genetic differentiation between the subpopulations was low, our landscape genetic analysis identified both natural (slope) and anthropogenic (roads) landscape variables that have significantly influenced gene flow within this local population. We show that desert tortoise movements at a local scale are influenced by features of the landscape, and that these features are different than those that influence gene flow at larger scales. Our findings are important for desert tortoise conservation and management, particularly in light of recent translocation efforts in the region. 
More generally, our results indicate that recent landscape changes can affect gene flow at a local scale and that their effects can be detected almost immediately.
Genetic Screening Strategy for Rapid Access to Polyether Ionophore Producers and Products in Actinomycetes

PubMed Central

Wang, Hao; Liu, Ning; Xi, Lijun; Rong, Xiaoying; Ruan, Jisheng; Huang, Ying

2011-01-01

Polyether ionophores are a unique class of polyketides with broad-spectrum activity and outstanding potency for the control of drug-resistant bacteria and parasites, and they are produced exclusively by actinomycetes. A special epoxidase gene encoding a critical tailoring enzyme involved in the biosynthesis of these compounds has been found in all five of the complete gene clusters of polyether ionophores published so far. To detect potential producer strains of these antibiotics, a pair of degenerate primers was designed according to the conserved regions of the five known polyether epoxidases. A total of 44 putative polyether epoxidase gene-positive strains were obtained by the PCR-based screening of 1,068 actinomycetes isolated from eight different habitats and 236 reference strains encompassing eight major families of Actinomycetales. The isolates spanned a wide taxonomic diversity based on 16S rRNA gene analysis, and actinomycetes isolated from acidic soils seemed to be a promising source of polyether ionophores. Four genera were detected to contain putative polyether epoxidases, including Micromonospora, which has not previously been reported to produce polyether ionophores. The designed primers also detected putative epoxidase genes from diverse known producer strains that produce polyether ionophores unrelated to the five published gene clusters. Moreover, phylogenetic and chemical analyses showed a strong correlation between the sequence of polyether epoxidases and the structure of encoded polyethers. Thirteen positive isolates were proven to be polyether ionophore producers as expected, and two new analogues were found. These results demonstrate the feasibility of using this epoxidase gene screening strategy to aid the rapid identification of known products and the discovery of unknown polyethers in actinomycetes. PMID:21421776
Genetic screening strategy for rapid access to polyether ionophore producers and products in actinomycetes.

PubMed

Wang, Hao; Liu, Ning; Xi, Lijun; Rong, Xiaoying; Ruan, Jisheng; Huang, Ying

2011-05-01

Polyether ionophores are a unique class of polyketides with broad-spectrum activity and outstanding potency for the control of drug-resistant bacteria and parasites, and they are produced exclusively by actinomycetes. A special epoxidase gene encoding a critical tailoring enzyme involved in the biosynthesis of these compounds has been found in all five of the complete gene clusters of polyether ionophores published so far. To detect potential producer strains of these antibiotics, a pair of degenerate primers was designed according to the conserved regions of the five known polyether epoxidases. A total of 44 putative polyether epoxidase gene-positive strains were obtained by the PCR-based screening of 1,068 actinomycetes isolated from eight different habitats and 236 reference strains encompassing eight major families of Actinomycetales. The isolates spanned a wide taxonomic diversity based on 16S rRNA gene analysis, and actinomycetes isolated from acidic soils seemed to be a promising source of polyether ionophores. Four genera were detected to contain putative polyether epoxidases, including Micromonospora, which has not previously been reported to produce polyether ionophores. The designed primers also detected putative epoxidase genes from diverse known producer strains that produce polyether ionophores unrelated to the five published gene clusters. Moreover, phylogenetic and chemical analyses showed a strong correlation between the sequence of polyether epoxidases and the structure of encoded polyethers. Thirteen positive isolates were proven to be polyether ionophore producers as expected, and two new analogues were found. These results demonstrate the feasibility of using this epoxidase gene screening strategy to aid the rapid identification of known products and the discovery of unknown polyethers in actinomycetes.
Photosynthetic Trichomes Contain a Specific Rubisco with a Modified pH-Dependent Activity.

PubMed

Laterre, Raphaëlle; Pottier, Mathieu; Remacle, Claire; Boutry, Marc

2017-04-01

Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is the most abundant enzyme in plants and is responsible for CO₂ fixation during photosynthesis. This enzyme is assembled from eight large subunits (RbcL) encoded by a single chloroplast gene and eight small subunits (RbcS) encoded by a nuclear gene family. Rubisco is primarily found in the chloroplasts of mesophyll (C3 plants), bundle-sheath (C4 plants), and guard cells. In certain species, photosynthesis also takes place in the secretory cells of glandular trichomes, which are epidermal outgrowths (hairs) involved in the secretion of specialized metabolites. However, photosynthesis and, in particular, Rubisco have not been characterized in trichomes. Here, we show that tobacco (Nicotiana tabacum) trichomes contain a specific Rubisco small subunit, NtRbcS-T, which belongs to an uncharacterized phylogenetic cluster (T). This cluster contains RbcS from at least 33 species, including monocots, many of which are known to possess glandular trichomes. Cluster T is distinct from the cluster M, which includes the abundant, functionally characterized RbcS isoforms expressed in mesophyll or bundle-sheath cells. Expression of NtRbcS-T in Chlamydomonas reinhardtii and purification of the full Rubisco complex showed that this isoform conferred higher Vmax and Km values as well as higher acidic pH-dependent activity than NtRbcS-M, an isoform expressed in the mesophyll. This observation was confirmed with trichome extracts. These data show that an ancient divergence allowed for the emergence of a so-far-uncharacterized RbcS cluster. We propose that secretory trichomes have a particular Rubisco uniquely adapted to secretory cells where CO₂ is released by the active specialized metabolism. © 2017 American Society of Plant Biologists. All Rights Reserved.
Phylogenetic Analysis of Prevalent Tuberculosis and Non-Tuberculosis Mycobacteria in Isfahan, Iran, Based on a 360 bp Sequence of the rpoB Gene

PubMed Central

Nasr Esfahani, Bahram; Moghim, Sharareh; Ghasemian Safaei, Hajieh; Moghoofei, Mohsen; Sedighi, Mansour; Hadifar, Shima

2016-01-01

Background Taxonomic and phylogenetic studies of Mycobacterium species have been based around the 16S rRNA gene for many years. However, due to the high strain similarity between species in the Mycobacterium genus (94.3% - 100%), defining a valid phylogenetic tree is difficult; consequently, its use in estimating the boundaries between species is limited. The sequence of the rpoB gene makes it an appropriate gene for phylogenetic analysis, especially in bacteria with limited variation. Objectives In the present study, a 360 bp sequence of rpoB was used for precise classification of Mycobacterium strains isolated in Isfahan, Iran. Materials and Methods From February to October 2013, 57 clinical and environmental isolates were collected, subcultured, and identified by phenotypic methods. After DNA extraction, a 360 bp fragment was PCR-amplified and sequenced. The phylogenetic tree was constructed based on consensus sequence data, using MEGA5 software. Results Slow and fast-growing groups of the Mycobacterium strains were clearly differentiated based on the constructed tree of 56 common Mycobacterium isolates. Each species with a unique title in the tree was identified; in total, 13 nodes with a bootstrap value of over 50% were supported. Among the slow-growing group was Mycobacterium kansasii, with M. tuberculosis in a cluster with a bootstrap value of 98% and M. gordonae in another cluster with a bootstrap value of 90%. In the fast-growing group, one cluster with a bootstrap value of 89% was defined, including all fast-growing members present in this study. Conclusions The results suggest that only the application of the rpoB gene sequence is sufficient for taxonomic categorization and definition of a new Mycobacterium species, due to its high resolution power and proper variation in its sequence (85% - 100%); the resulting tree has high validity. PMID:27284397
Genomic and Transcriptomic Analyses to Identify Pathways Involved in Nanoparticle Generation in the Ubiquitous Marine Bacterium Alteromonas macleodii Under Elevated Copper Conditions

NASA Astrophysics Data System (ADS)

Cusick, K. D.; Dale, J.; Little, B.; Cockrell, A.; Biffinger, J.

2016-02-01

Alteromonas macleodii is a ubiquitous marine bacterium that clusters by molecular analyses into two ecotypes: surface and deep-water. Our group isolated a marine bacterium from copper coupons that generates nanoparticles (NPs) at elevated copper concentrations. Sequencing of the 16S rRNA gene identified it as an A. macleodii strain. In phylogenetic analyses based on the gyrB gene, it clustered with other surface isolates; however, it formed a unique cluster separate from that of other surface isolates based on rpoB gene sequences. Copper is commonly employed as an antifouling agent on the hulls of ships, and so copper tolerance and NP generation are under investigation in this strain. The overall goals of this study were: (1) to determine if copper tolerance is the result of changes at the genetic or transcriptional level and (2) to identify the genes involved in NP formation. Sub-cultures were established from the initial isolate in which copper concentrations were increased in 0.25 mM increments through multiple generations. These sub-cultures were assayed for NP formation in seawater medium supplemented with 3-4 mM copper. Scanning electron microscopy revealed large aggregates of NPs on the exterior surface of all sub-cultures. Additionally, a portion of the cells in all sub-cultures displayed an elongated morphology in comparison to the wild-type. No NPs were observed in wild-type controls grown without the addition of increased copper. Metagenomic sequencing of natural populations of A. macleodii revealed extreme divergence in several large genomic regions whose content includes genes coding for exopolysaccharide production and metal resistance. High-throughput sequencing is being used to determine whether copper tolerance and NP generation are the result of genetic or transcriptional changes. These results will be extended to natural communities to gain insights into the role of bacterial NPs during conditions of elevated metal concentrations in coastal systems.

Mining subspace clusters from DNA microarray data using large itemset techniques.

PubMed

Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi

2009-05-01

Mining subspace clusters from DNA microarrays could help researchers identify those genes which commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since in a DNA microarray the number of genes is far larger than the number of conditions, the previously proposed algorithms, which compute the maximum dimension sets (MDSs) for any two genes, take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for any two genes, we construct only MDSs for any two conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in those subspace clusters with gene sets as large as possible, it is desirable to pay attention to those gene sets which have reasonably large support values in the condition-pair MDSs. From our simulation results, we show that the proposed algorithm needs shorter processing time than the previously proposed algorithms, which need to construct gene-pair MDSs.
Platypus globin genes and flanking loci suggest a new insertional model for beta-globin evolution in birds and mammals

PubMed Central

Patel, Vidushi S; Cooper, Steven JB; Deakin, Janine E; Fulton, Bob; Graves, Tina; Warren, Wesley C; Wilson, Richard K; Graves, Jennifer AM

2008-01-01

Background Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the α- and β-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil β-globin gene (ω) in the marsupial α-cluster, however, suggested that duplication of the α-β cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous α- and β-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation. Results The platypus α-globin cluster (chromosome 21) contains embryonic and adult α-globin genes, a β-like ω-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-ζ-ζ'-αD-α3-α2-α1-ω-GBY-3'. The platypus β-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-ε-β-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate α-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal β-globin clusters are embedded in olfactory genes. Thus, the mammalian α- and β-globin clusters are orthologous to the bird α- and β-globin clusters respectively. Conclusion We propose that α- and β-globin clusters evolved from an ancient MPG-C16orf35-α-β-GBY-LUC7L arrangement 410 million years ago. A copy of the original β (represented by ω in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of β-globin genes with different expression profiles in different lineages. PMID:18657265
DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, Mansi; Larroux, Claire; Lu, Daniel R

LIM homeobox (Lhx) transcription factors are unique to the animal lineage and have patterning roles during embryonic development in flies, nematodes and vertebrates, with a conserved role in specifying neuronal identity. Though genes of this family have been reported in a sponge and a cnidarian, the expression patterns and functions of the Lhx family during development in non-bilaterian phyla are not known. We identified Lhx genes in two cnidarians and a placozoan and report the expression of Lhx genes during embryonic development in Nematostella and the demosponge Amphimedon. Members of the six major LIM homeobox subfamilies are represented in the genomes of the starlet sea anemone, Nematostella vectensis, and the placozoan Trichoplax adhaerens. The hydrozoan cnidarian, Hydra magnipapillata, has retained four of the six Lhx subfamilies, but apparently lost two others. Only three subfamilies are represented in the haplosclerid demosponge Amphimedon queenslandica. A tandem cluster of three Lhx genes of different subfamilies and a gene containing two LIM domains in the genome of T. adhaerens (an animal without any neurons) indicates that Lhx subfamilies were generated by tandem duplication. This tandem cluster in Trichoplax is likely a remnant of the original chromosomal context in which Lhx subfamilies first appeared. Three of the six Trichoplax Lhx genes are expressed in animals in laboratory culture, as are all Lhx genes in Hydra. Expression patterns of Nematostella Lhx genes correlate with neural territories in larval and juvenile polyp stages. In the aneural demosponge, A. queenslandica, the three Lhx genes are expressed widely during development, including in cells that are associated with the larval photosensory ring. The Lhx family expanded and diversified early in animal evolution, with all six subfamilies already diverged prior to the cnidarian-placozoan-bilaterian last common ancestor. 
In Nematostella, Lhx gene expression is correlated with neural territories in larval and juvenile polyp stages. This pattern is consistent with a possible role in patterning the Nematostella nervous system. We propose a scenario in which Lhx genes play a homologous role in neural patterning across eumetazoans.
Genome-wide DNA methylation analysis reveals estrogen-mediated epigenetic repression of metallothionein-1 gene cluster in breast cancer.

PubMed

Jadhav, Rohit R; Ye, Zhenqing; Huang, Rui-Lan; Liu, Joseph; Hsu, Pei-Yin; Huang, Yi-Wen; Rangel, Leticia B; Lai, Hung-Cheng; Roa, Juan Carlos; Kirma, Nameer B; Huang, Tim Hui-Ming; Jin, Victor X

2015-01-01

Recent genome-wide analysis has shown that DNA methylation spans long stretches of chromosome regions consisting of clusters of contiguous CpG islands or gene families. Hypermethylation of various gene clusters has been reported in many types of cancer. In this study, we conducted methyl-binding domain capture (MBDCap) sequencing (MBD-seq) analysis on a breast cancer cohort consisting of 77 patients and 10 normal controls, as well as a panel of 38 breast cancer cell lines. Bioinformatics analysis determined seven gene clusters with a significant difference in overall survival (OS) and further revealed a distinct feature that the conservation of a large gene cluster (approximately 70 kb) metallothionein-1 (MT1) among 45 species is much lower than the average of all RefSeq genes. Furthermore, we found that DNA methylation is an important epigenetic regulator contributing to gene repression of MT1 gene cluster in both ERα positive (ERα+) and ERα negative (ERα-) breast tumors. In silico analysis revealed much lower gene expression of this cluster in The Cancer Genome Atlas (TCGA) cohort for ERα + tumors. To further investigate the role of estrogen, we conducted 17β-estradiol (E2) and demethylating agent 5-aza-2'-deoxycytidine (DAC) treatment in various breast cancer cell types. Cell proliferation and invasion assays suggested MT1F and MT1M may play an anti-oncogenic role in breast cancer. Our data suggests that DNA methylation in large contiguous gene clusters can be potential prognostic markers of breast cancer. Further investigation of these clusters revealed that estrogen mediates epigenetic repression of MT1 cluster in ERα + breast cancer cell lines. In all, our studies identify thousands of breast tumor hypermethylated regions for the first time, in particular, discovering seven large contiguous hypermethylated gene clusters.
MeSH key terms for validation and annotation of gene expression clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rechtsteiner, A.; Rocha, L. M.

2004-01-01

Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task of finding the biological significance of the results and validating them has so far mostly fallen to the biological expert, who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms that have similar meaning) that would be encountered if terms were extracted directly from the articles, due to differing article contexts or author preferences and background. 
Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes associated with MeSH terms. MeSH terms can be associated with genes through co-occurrence of these in MEDLINE citations, i.e. the genes occur in titles or abstracts and the MeSH terms are assigned by experts. To identify MeSH terms associated with a group of genes we used the tool MESHGENE developed at the Information Dynamics Lab at HP Labs (http://www-idl.hpl.hp.com/meshgene/). When presented with a list of human genes, MESHGENE uses some sophisticated techniques to search for these gene symbols in the titles and abstracts of all MEDLINE citations. MeSH terms and the number of co-occurrences can be retrieved. Gene symbols that are aliases of each other are pooled from several databases. This addresses the problem of synonymy, the fact that several symbols can refer to the same gene. MESHGENE employs some sophisticated algorithms that disregard symbols that are likely to be acronyms for concepts other than a gene. This addresses the problem of polysemy, i.e. possible multiple meanings of a gene symbol. We applied our approach to gene expression data from herpes virus infected human fibroblast cells. The data contain 12 time-points, between 1/2 hr and 48 hrs after infection. Singular Value Decomposition (SVD) was used to identify the dominant modes of expression. 75% of the variance in the expression data was captured by the first two modes, the first exhibiting a monotonically increasing expression pattern and the second a more transient pattern. Projection of the gene expression vectors onto these first two modes identified 3 statistically significant clusters of co-expressed genes. 500 genes from cluster 1 and 300 genes from clusters 2 and 3 each were uploaded to MESHGENE and the MeSH terms and co-occurrence values were retrieved. MeSH terms were also obtained for 5 groups of randomly selected genes with similar numbers of genes. 
The log was taken of the co-occurrence values and for each MeSH term these log co-occurrence values were summed for each group over the genes in that group. A matrix with 8 columns for the 8 groups of genes and with 14,000 rows with the MeSH terms was obtained. To analyze this association matrix we used a Latent Semantic Analysis (LSA) approach. We applied SVD to this gene-group vs. MeSH term association matrix. The first 2 modes that capture most of the variation (and therefore most times also information) in the association matrix were highly associated with MeSH terms that occurred uniquely or disproportionally in the 3 gene clusters. MeSH terms highly associated with the 5 groups of randomly selected genes were associated with the lower modes. These modes seem to just capture 'noise' in the association matrix. This result by itself is of great interest for gene expression analysis. We were able to show that the 3 clusters of genes not only separated in 'expression space' but also in the MeSH term space with which they are associated through the literature.
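The matrix construction described in this abstract (log of each co-occurrence value, summed per MeSH term over the genes in each group) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `build_association_matrix`, the input layout, the toy gene symbols, and the choice of the natural logarithm are all hypothetical, since the abstract does not specify the log base or the data format. The SVD step the authors apply afterwards is not shown.

```python
import math
from collections import defaultdict

def build_association_matrix(groups, cooccurrence):
    """Sum log co-occurrence values per MeSH term over the genes of each group.

    groups: dict mapping group name -> list of gene symbols (hypothetical layout).
    cooccurrence: dict mapping gene symbol -> {MeSH term: co-occurrence count}.
    Returns (terms, group_names, matrix), one row per term, one column per group.
    """
    group_names = list(groups)
    sums = defaultdict(lambda: [0.0] * len(group_names))
    for col, name in enumerate(group_names):
        for gene in groups[name]:
            for term, count in cooccurrence.get(gene, {}).items():
                if count > 0:
                    # Natural log is an assumption; the abstract only says "the log".
                    sums[term][col] += math.log(count)
    terms = sorted(sums)
    matrix = [sums[t] for t in terms]
    return terms, group_names, matrix

# Toy data, invented purely for illustration.
groups = {"cluster1": ["GENE_A", "GENE_B"], "random1": ["GENE_C"]}
cooccurrence = {
    "GENE_A": {"Apoptosis": 10, "Cell Cycle": 1},
    "GENE_B": {"Apoptosis": 100},
    "GENE_C": {"Cell Cycle": 5},
}
terms, names, matrix = build_association_matrix(groups, cooccurrence)
for term, row in zip(terms, matrix):
    print(term, [round(v, 3) for v in row])
```

In the study the resulting matrix had 8 columns (3 clusters plus 5 random groups) and about 14,000 rows; a decomposition such as SVD would then be run on it with a numerical library.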
An OmpA family protein, a target of the GinI/GinR quorum-sensing system in Gluconacetobacter intermedius, controls acetic acid fermentation.

PubMed

Iida, Aya; Ohnishi, Yasuo; Horinouchi, Sueharu

2008-07-01

Via N-acylhomoserine lactones, the GinI/GinR quorum-sensing system in Gluconacetobacter intermedius NCI1051, a gram-negative acetic acid bacterium, represses acetic acid and gluconic acid fermentation. Two-dimensional polyacrylamide gel electrophoretic analysis of protein profiles of strain NCI1051 and ginI and ginR mutants identified a protein that was produced in response to the GinI/GinR regulatory system. Cloning and nucleotide sequencing of the gene encoding this protein revealed that it encoded an OmpA family protein, named GmpA. gmpA was a member of the gene cluster containing three adjacent homologous genes, gmpA to gmpC, the organization of which appeared to be unique to vinegar producers, including "Gluconacetobacter polyoxogenes." In addition, GmpA was unique among the OmpA family proteins in that its N-terminal membrane domain forming eight antiparallel transmembrane beta-strands contained an extra sequence in one of the surface-exposed loops. Transcriptional analysis showed that only gmpA of the three adjacent gmp genes was activated by the GinI/GinR quorum-sensing system. However, gmpA was not controlled directly by GinR but was controlled by an 89-amino-acid protein, GinA, a target of this quorum-sensing system. A gmpA mutant grew more rapidly in the presence of 2% (vol/vol) ethanol and accumulated acetic acid and gluconic acid in greater final yields than strain NCI1051. Thus, GmpA plays a role in repressing oxidative fermentation, including acetic acid fermentation, which is unique to acetic acid bacteria and allows ATP synthesis via ethanol oxidation. Consistent with the involvement of gmpA in oxidative fermentation, its transcription was also enhanced by ethanol and acetic acid.
WordCluster: detecting clusters of DNA words and genomic elements

PubMed Central

2011-01-01

Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
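The idea of grouping copies of a word by the distance between consecutive copies can be sketched as follows. This is not the WordCluster algorithm: the abstract does not give its distance model or significance test, so the sketch uses a fixed, hypothetical `max_gap` parameter (exactly the kind of arbitrary threshold the authors aim to replace) and omits any statistical scoring. All function names and the toy sequence are assumptions.

```python
def word_positions(seq, word):
    """Start positions of all (possibly overlapping) copies of word in seq."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions

def chain_clusters(positions, max_gap, min_copies=2):
    """Chain sorted start positions whose consecutive distance is <= max_gap.

    Returns (first_start, last_start, copy_count) for each chain that has
    at least min_copies copies. max_gap is a hypothetical fixed threshold.
    """
    clusters, current = [], []
    for pos in positions:
        # A gap larger than max_gap closes the current chain.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Toy sequence (assumption): a dense CAG run, an isolated copy, a close pair.
seq = "CAGCAGCAG" + "T" * 30 + "CAG" + "T" * 30 + "CAGTCAG"
hits = word_positions(seq, "CAG")
clusters = chain_clusters(hits, max_gap=5)
print(hits)      # [0, 3, 6, 39, 72, 76]
print(clusters)  # [(0, 6, 3), (72, 76, 2)]
```

A significance step, as described in the abstract, would then score each chain against a null model of word spacing instead of relying on the fixed gap.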
Large clusters of co-expressed genes in the Drosophila genome.

PubMed

Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I

2002-12-12

Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

DOE PAGES

Schläpfer, Pascal; Zhang, Peifen; Wang, Chuan; ...

2017-04-01

Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we will need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

PubMed Central

Zhang, Peifen; Kim, Taehyong; Banf, Michael; Chavali, Arvind K.; Nilo-Poyanco, Ricardo; Bernard, Thomas

2017-01-01

Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters. PMID:28228535
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schläpfer, Pascal; Zhang, Peifen; Wang, Chuan

Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we will need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants.

PubMed

Schläpfer, Pascal; Zhang, Peifen; Wang, Chuan; Kim, Taehyong; Banf, Michael; Chae, Lee; Dreher, Kate; Chavali, Arvind K; Nilo-Poyanco, Ricardo; Bernard, Thomas; Kahn, Daniel; Rhee, Seung Y

2017-04-01

Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters. © 2017 American Society of Plant Biologists. All Rights Reserved.
Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

PubMed

Liu, L L; Liu, M J; Ma, M

2015-09-28

The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
Hox gene duplications correlate with posterior heteronomy in scorpions

PubMed Central

Sharma, Prashant P.; Schwager, Evelyn E.; Extavour, Cassandra G.; Wheeler, Ward C.

2014-01-01

The evolutionary success of the largest animal phylum, Arthropoda, has been attributed to tagmatization, the coordinated evolution of adjacent metameres to form morphologically and functionally distinct segmental regions called tagmata. Specification of regional identity is regulated by the Hox genes, of which 10 are inferred to be present in the ancestor of arthropods. With six different posterior segmental identities divided into two tagmata, the bauplan of scorpions is the most heteronomous within Chelicerata. Expression domains of the anterior eight Hox genes are conserved in previously surveyed chelicerates, but it is unknown how Hox genes regionalize the three tagmata of scorpions. Here, we show that the scorpion Centruroides sculpturatus has two paralogues of all Hox genes except Hox3, suggesting cluster and/or whole genome duplication in this arachnid order. Embryonic anterior expression domain boundaries of each of the last four pairs of Hox genes (two paralogues each of Antp, Ubx, abd-A and Abd-B) are unique and distinguish segmental groups, such as pectines, book lungs and the characteristic tail, while maintaining spatial collinearity. These distinct expression domains suggest neofunctionalization of Hox gene paralogues subsequent to duplication. Our data reconcile previous understanding of Hox gene function across arthropods with the extreme heteronomy of scorpions. PMID:25122224
Genome sequence of an enhancin gene-rich nucleopolyhedrovirus (NPV) from Agrotis segetum: collinearity with Spodoptera exigua multiple NPV.

PubMed

Jakubowska, Agata K; Peters, Sander A; Ziemnicka, Jadwiga; Vlak, Just M; van Oers, Monique M

2006-03-01

The genome sequence of a Polish isolate of Agrotis segetum nucleopolyhedrovirus (AgseNPV-A) was determined and analysed. The circular genome is composed of 147,544 bp and has a G+C content of 45.7 mol%. It contains 153 putative, non-overlapping open reading frames (ORFs) encoding predicted proteins of more than 50 aa, together making up 89.8 % of the genome. The remaining 10.2 % of the DNA constitutes non-coding regions and homologous-repeat regions. One hundred and forty-three AgseNPV-A ORFs are homologues of previously reported baculovirus gene sequences. There are ten unique ORFs and they account for 3 % of the genome in total. All 62 lepidopteran baculovirus genes, including the 29 core baculovirus genes, were found in the AgseNPV-A genome. The gene content and gene order of AgseNPV-A are most similar to those of Spodoptera exigua (Se) multiple NPV and their shared homologous genes are 100 % collinear. Three putative enhancin genes were identified in the AgseNPV-A genome. In phylogenetic analysis, the AgseNPV-A enhancins form a cluster separated from enhancins of the Mamestra species NPVs.
Functional genomics of commercial baker's yeasts that have different abilities for sugar utilization and high-sucrose tolerance under different sugar conditions.

PubMed

Tanaka-Tsuno, Fumiko; Mizukami-Murata, Satomi; Murata, Yoshinori; Nakamura, Toshihide; Ando, Akira; Takagi, Hiroshi; Shima, Jun

2007-10-01

In the modern baking industry, high-sucrose-tolerant (HS) and maltose-utilizing (LS) yeast were developed using breeding techniques and are now used commercially. Sugar utilization and high-sucrose tolerance differ significantly between HS and LS yeasts. We analysed the gene expression profiles of HS and LS yeasts under different sucrose conditions in order to determine their basic physiology. Two-way hierarchical clustering was performed to obtain the overall patterns of gene expression. The clustering clearly showed that the gene expression patterns of LS yeast differed from those of HS yeast. Quality threshold clustering was used to identify the gene clusters containing upregulated genes (cluster 1) and downregulated genes (cluster 2) under high-sucrose conditions. Clusters 1 and 2 contained numerous genes involved in carbon and nitrogen metabolism, respectively. The expression level of the genes involved in the metabolism of glycerol and trehalose, which are known to be osmoprotectants, in LS yeast was higher than that in HS yeast under sucrose concentrations of 5-40%. No clear correlation was found between the expression level of the genes involved in the biosynthesis of the osmoprotectants and the intracellular contents of the osmoprotectants. The present gene expression data were compared with data previously reported in a comprehensive analysis of a gene deletion strain collection. Welch's t-test for this comparison showed that the relative growth rates of the deletion strains whose deletion occurred in genes belonging to cluster 1 were significantly higher than the average growth rates of all deletion strains. Copyright 2007 John Wiley & Sons, Ltd.
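The Welch's t-test mentioned in this abstract compares two group means without assuming equal variances. A minimal standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the growth-rate values are invented for illustration and the exact grouping used by the authors is an assumption.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a and b are sequences of observations with at least two values each.
    """
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical relative growth rates (not data from the study).
cluster1_growth = [1.10, 1.25, 1.18, 1.30, 1.22]
all_growth = [1.00, 0.95, 1.05, 0.90, 1.10, 1.02, 0.98]

t, df = welch_t(cluster1_growth, all_growth)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A p-value would be obtained by comparing `t` with a Student t distribution with `df` degrees of freedom, for example through a statistics library.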
Complete chloroplast genome sequence of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis.

PubMed

Wang, Shuo; Gao, Li-Zhi

2016-09-01

The complete chloroplast genome of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis, is reported for the first time in this study. The genome harbors a large single copy (LSC) region of 81 016 bp and a small single copy (SSC) region of 12 456 bp separated by a pair of inverted repeat (IRa and IRb) regions of 22 315 bp. GC content is 38.92%. The proportion of coding sequence is 57.97%, comprising 111 (19 duplicated in IR regions) unique genes, 71 of which are protein-coding genes, four are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated that S. viridis clustered with its cultivated relative S. italica in the tribe Paniceae of the family Poaceae. This newly determined chloroplast genome will provide valuable genetic resources to assist future studies on C4 photosynthesis in grasses.
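The figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 22 315 bp figure is the length of each inverted repeat and that the genome consists only of the LSC, the SSC and the two IR copies; the total it prints is derived from those assumptions and is not stated in the abstract.

```python
# Region lengths in bp, as given in the abstract.
lsc = 81_016
ssc = 12_456
ir = 22_315  # assumed to be the length of each of IRa and IRb

# Quadripartite structure: LSC + SSC + two inverted repeats.
total_bp = lsc + ssc + 2 * ir

# Gene categories as given in the abstract.
genes = {"protein_coding": 71, "rRNA": 4, "tRNA": 36}
unique_genes = sum(genes.values())

print("derived genome length (bp):", total_bp)
print("unique genes:", unique_genes)
```

The gene categories sum to the 111 unique genes reported, which supports the internal consistency of the counts.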
Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

PubMed

Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

2017-08-01

In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

2008-05-12

The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Transcriptomes define distinct subgroups of salivary gland adenoid cystic carcinoma with different driver mutations and outcomes

PubMed Central

Frerich, Candace A.; Brayer, Kathryn J.; Painter, Brandon M.; Kang, Huining; Mitani, Yoshitsugu; El-Naggar, Adel K.; Ness, Scott A.

2018-01-01

The relative rarity of salivary gland adenoid cystic carcinoma (ACC) and its slow growing yet aggressive nature has complicated the development of molecular markers for patient stratification. To analyze molecular differences linked to the protracted disease course of ACC and metastases that form 5 or more years after diagnosis, detailed RNA-sequencing (RNA-seq) analysis was performed on 68 ACC tumor samples, starting with archived, formalin-fixed paraffin-embedded (FFPE) samples up to 25 years old, so that clinical outcomes were available. A statistical peak-finding approach was used to classify the tumors that expressed MYB or MYBL1, which had overlapping gene expression signatures, from a group that expressed neither oncogene and displayed a unique phenotype. Expression of MYB or MYBL1 was closely correlated to the expression of the SOX4 and EN1 genes, suggesting that they are direct targets of Myb proteins in ACC tumors. Unsupervised hierarchical clustering identified a subgroup of approximately 20% of patients with exceptionally poor overall survival (median less than 30 months) and a unique gene expression signature resembling embryonic stem cells. The results provide a strategy for stratifying ACC patients and identifying the high-risk, poor-outcome group that are candidates for personalized therapies. PMID:29484115

Organization of the Escherichia coli K-12 gene cluster responsible for production of the extracellular polysaccharide colanic acid.

PubMed Central

Stevenson, G; Andrianopoulos, K; Hobbs, M; Reeves, P R

1996-01-01

Colanic acid (CA) is an extracellular polysaccharide produced by most Escherichia coli strains as well as by other species of the family Enterobacteriaceae. We have determined the sequence of a 23-kb segment of the E. coli K-12 chromosome which includes the cluster of genes necessary for production of CA. The CA cluster comprises 19 genes. Two other sequenced genes (orf1.3 and galF), which are situated between the CA cluster and the O-antigen cluster, were shown to be unnecessary for CA production. The CA cluster includes genes for synthesis of GDP-L-fucose, one of the precursors of CA, and the gene for one of the enzymes in this pathway (GDP-D-mannose 4,6-dehydratase) was identified by biochemical assay. Six of the inferred proteins show sequence similarity to glycosyl transferases, and two others have sequence similarity to acetyl transferases. Another gene (wzx) is predicted to encode a protein with multiple transmembrane segments and may function in export of the CA repeat unit from the cytoplasm into the periplasm in a process analogous to O-unit export. The first three genes of the cluster are predicted to encode an outer membrane lipoprotein, a phosphatase, and an inner membrane protein with an ATP-binding domain. Since homologs of these genes are found in other extracellular polysaccharide gene clusters, they may have a common function, such as export of polysaccharide from the cell. PMID:8759852
The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks

PubMed Central

Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex

2010-01-01

Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks.

PubMed

Ficklin, Stephen P; Luo, Feng; Feltus, F Alex

2010-09-01

Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
The sirodesmin biosynthetic gene cluster of the plant pathogenic fungus Leptosphaeria maculans.

PubMed

Gardiner, Donald M; Cozijnsen, Anton J; Wilson, Leanne M; Pedras, M Soledade C; Howlett, Barbara J

2004-09-01

Sirodesmin PL is a phytotoxin produced by the fungus Leptosphaeria maculans, which causes blackleg disease of canola (Brassica napus). This phytotoxin belongs to the epipolythiodioxopiperazine (ETP) class of toxins produced by fungi including mammalian and plant pathogens. We report the cloning of a cluster of genes with predicted roles in the biosynthesis of sirodesmin PL and show via gene disruption that one of these genes (encoding a two-module non-ribosomal peptide synthetase) is essential for sirodesmin PL biosynthesis. Of the nine genes in the cluster tested, all are co-regulated with the production of sirodesmin PL in culture. A similar cluster is present in the genome of the opportunistic human pathogen Aspergillus fumigatus and is most likely responsible for the production of gliotoxin, which is also an ETP. Homologues of the genes in the cluster were also identified in expressed sequence tags of the ETP producing fungus Chaetomium globosum. Two other fungi with publicly available genome sequences, Magnaporthe grisea and Fusarium graminearum, had similar gene clusters. A comparative analysis of all four clusters is presented. This is the first report of the genes responsible for the biosynthesis of an ETP. Copyright 2004 Blackwell Publishing Ltd
Many nonuniversal archaeal ribosomal proteins are found in conserved gene clusters

PubMed Central

WANG, JIACHEN; DASGUPTA, INDRANI; FOX, GEORGE E.

2009-01-01

The genomic associations of the archaeal ribosomal proteins (r-proteins) were examined in detail. The archaeal versions of the universal r-protein genes are typically in clusters similar or identical to those found in bacteria. Of the 35 nonuniversal archaeal r-protein genes examined, the gene encoding L18e was found to be associated with the conserved L13 cluster, whereas the genes for S4e, L32e and L19e were found in the archaeal version of the spc operon. Eleven nonuniversal protein genes were not associated with any common genomic context. Of the remaining 19 protein genes, 17 were convincingly assigned to one of 10 previously unrecognized gene clusters. Examination of the gene content of these clusters revealed multiple associations with genes involved in the initiation of protein synthesis, transcription or other cellular processes. The lack of such associations in the universal clusters suggests that initially the ribosome evolved largely independently of other processes. More recently it likely has evolved in concert with other cellular systems. It was also verified that a second copy of the gene encoding L7ae found in some bacteria is actually a homolog of the gene encoding L30e and should be annotated as such. PMID:19478915
A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters.

PubMed

Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing

2017-10-27

Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
Anticancer Properties of Distinct Antimalarial Drug Classes

PubMed Central

Hooft van Huijsduijnen, Rob; Guy, R. Kiplin; Chibale, Kelly; Haynes, Richard K.; Peitz, Ingmar; Kelter, Gerhard; Phillips, Margaret A.; Vennerstrom, Jonathan L.; Yuthavong, Yongyuth; Wells, Timothy N. C.

2013-01-01

We have tested five distinct classes of established and experimental antimalarial drugs for their anticancer potential, using a panel of 91 human cancer lines. Three classes of drugs (artemisinins, synthetic peroxides and DHFR (dihydrofolate reductase) inhibitors) effected potent inhibition of proliferation with IC50s in the nM to low µM range, whereas a DHODH (dihydroorotate dehydrogenase) inhibitor and a putative kinase inhibitor displayed no activity. Furthermore, significant synergies were identified with erlotinib, imatinib, cisplatin, dasatinib and vincristine. Cluster analysis of the antimalarials based on their differential inhibition of the various cancer lines clearly segregated the synthetic peroxides OZ277 and OZ439 from the artemisinin cluster that included artesunate, dihydroartemisinin and artemisone, and from the DHFR inhibitors pyrimethamine and P218 (a parasite DHFR inhibitor), emphasizing their shared mode of action. In order to further understand the basis of the selectivity of these compounds against different cancers, microarray-based gene expression data for 85 of the used cell lines were generated. For each compound, distinct sets of genes were identified whose expression significantly correlated with compound sensitivity. Several of the antimalarials tested in this study have well-established and excellent safety profiles with a plasma exposure, when conservatively used in malaria, that is well above the IC50s that we identified in this study. Given their unique mode of action and potential for unique synergies with established anticancer drugs, our results provide a strong basis to further explore the potential application of these compounds in cancer in pre-clinical and/or clinical settings. PMID:24391728
Cortical Folding of the Primate Brain: An Interdisciplinary Examination of the Genetic Architecture, Modularity, and Evolvability of a Significant Neurological Trait in Pedigreed Baboons (Genus Papio)

PubMed Central

Atkinson, Elizabeth G.; Rogers, Jeffrey; Mahaney, Michael C.; Cox, Laura A.; Cheverud, James M.

2015-01-01

Folding of the primate brain cortex allows for improved neural processing power by increasing cortical surface area for the allocation of neurons. The arrangement of folds (sulci) and ridges (gyri) across the cerebral cortex is thought to reflect the underlying neural network. Gyrification, an adaptive trait with a unique evolutionary history, is affected by genetic factors different from those affecting brain volume. Using a large pedigreed population of ∼1000 Papio baboons, we address critical questions about the genetic architecture of primate brain folding, the interplay between genetics, brain anatomy, development, patterns of cortical–cortical connectivity, and gyrification’s potential for future evolution. Through Mantel testing and cluster analyses, we find that the baboon cortex is quite evolvable, with high integration between the genotype and phenotype. We further find significantly similar partitioning of variation between cortical development, anatomy, and connectivity, supporting the predictions of tension-based models for sulcal development. We identify a significant, moderate degree of genetic control over variation in sulcal length, with gyrus-shape features being more susceptible to environmental effects. Finally, through QTL mapping, we identify novel chromosomal regions affecting variation in brain folding. The most significant QTL contain compelling candidate genes, including gene clusters associated with Williams and Down syndromes. The QTL distribution suggests a complex genetic architecture for gyrification with both polygeny and pleiotropy. Our results provide a solid preliminary characterization of the genetic basis of primate brain folding, a unique and biomedically relevant phenotype with significant implications in primate brain evolution. PMID:25873632
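The abstract above mentions Mantel testing, which compares two distance (or similarity) matrices over the same items by permutation. The following is a minimal sketch of a generic one-sided Mantel test, not the authors' analysis pipeline; the function names, the toy matrix, and the choice of 999 permutations are assumptions made for illustration.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(a, b, permutations=999, seed=0):
    """One-sided Mantel test: correlate a with b, permuting the labels of b."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    hits = 0
    idx = list(range(n))
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of b together so it stays a valid distance matrix.
        permuted = [[b[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical toy input: pairwise distances between points at 0, 1, 3, 7 on a line.
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]
r, p = mantel(dist, dist)
print(r, p)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations, which is the reason an ordinary correlation test is not appropriate here.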
Bacillus safensis FO-36b and Bacillus pumilus SAFR-032: a whole genome comparison of two spacecraft assembly facility isolates.

PubMed

Tirumalai, Madhan R; Stepanov, Victor G; Wünsche, Andrea; Montazari, Saied; Gonzalez, Racquel O; Venkateswaran, Kasturi; Fox, George E

2018-06-08

Bacillus strains producing highly resistant spores have been isolated from cleanrooms and spacecraft assembly facilities. Organisms that can survive such conditions merit planetary protection concern and, if that resistance can be transferred to other organisms, a health concern too. To further efforts to understand these resistances, the complete genome of Bacillus safensis strain FO-36b, which produces spores resistant to peroxide and radiation, was determined. The genome was compared to the complete genome of B. pumilus SAFR-032, and the draft genomes of B. safensis JPL-MERTA-8-2 and the type strain B. pumilus ATCC7061T. Additional comparisons were made to 61 draft genomes that have been mostly identified as strains of B. pumilus or B. safensis. The FO-36b gene order is essentially the same as that in SAFR-032 and other B. pumilus strains. The annotated genome has 3850 open reading frames and 40 noncoding RNAs and riboswitches. Of these, 307 are not shared by SAFR-032, and 65 are also not shared by MERTA and ATCC7061T. The FO-36b genome has ten unique open reading frames and two phage-like regions, homologous to the Bacillus bacteriophage SPP1 and Brevibacillus phage Jimmer1. Differing remnants of the Jimmer1 phage are found in essentially all B. safensis / B. pumilus strains. Seven unique genes are part of these phage elements. Whole-genome phylogenetic analysis of the B. pumilus, B. safensis and other Firmicutes genomes separates them into three distinct clusters. Two clusters are subgroups of B. pumilus while one houses all the B. safensis strains. The genome-genome distance analysis and a phylogenetic analysis of gyrA sequences corroborated these results. It is not immediately obvious that the presence or absence of any specific gene or combination of genes is responsible for the variations in resistance seen.
It is quite possible that distinctions in gene regulation can alter the expression levels of key proteins thereby changing the organism's resistance properties without gain or loss of a particular gene. What is clear is that phage elements contribute significantly to genome variability. Multiple genome comparison indicates that many strains named as B. pumilus likely belong to the B. safensis group.
Clustering change patterns using Fourier transformation with time-course gene expression data.

PubMed

Kim, Jaehee

2011-01-01

To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
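To make the notion of derivative Fourier coefficients concrete: if a profile is written as f(t) = a0 + Σ [a_k cos(2πkt/T) + b_k sin(2πkt/T)], then differentiating term by term gives cosine and sine coefficients (2πk/T)·b_k and −(2πk/T)·a_k for f′(t), and the constant a0 drops out. The sketch below computes these for an evenly sampled profile. It is a generic illustration, not the authors' estimator (which involves smoothing parameters and model-based clustering); the function names and the toy profiles are assumptions.

```python
import math


def fourier_coefficients(values, n_harmonics):
    """Cosine/sine coefficients (a_k, b_k), k = 1..n_harmonics, of evenly
    sampled values covering exactly one period. Assumes n_harmonics < len(values) / 2."""
    n = len(values)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * j / n)
                            for j, v in enumerate(values))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * j / n)
                            for j, v in enumerate(values))
        coeffs.append((a_k, b_k))
    return coeffs


def derivative_coefficients(coeffs, period):
    """Coefficients of f'(t): the cos term gets (2*pi*k/T) * b_k,
    the sin term gets -(2*pi*k/T) * a_k."""
    out = []
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        w = 2 * math.pi * k / period
        out.append((w * b_k, -w * a_k))
    return out


# Hypothetical toy profiles: the same sine-shaped change pattern at two baseline levels.
n = 12
profile = [math.sin(2 * math.pi * j / n) for j in range(n)]
shifted = [v + 5.0 for v in profile]

d_profile = derivative_coefficients(fourier_coefficients(profile, 3), period=1.0)
d_shifted = derivative_coefficients(fourier_coefficients(shifted, 3), period=1.0)
print(d_profile[0], d_shifted[0])
```

Because the baseline does not affect the derivative coefficients, two genes with the same shape of change but different expression levels end up with matching features, which is the property that makes these coefficients suitable inputs for clustering change patterns.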
Conversion of the high-yield salinomycin producer Streptomyces albus BK3-25 into a surrogate host for polyketide production.

PubMed

Zhang, Xiaojie; Lu, Chenyang; Bai, Linquan

2017-09-01

An ideal surrogate host for heterologous production of various natural products is expected to have efficient nutrient utilization, fast growth, abundant precursors and energy supply, and pronounced gene expression. Streptomyces albus BK3-25 is a high-yield industrial strain producing the type-I polyketide salinomycin, with a unique ability to utilize bean oil. Its potential as a surrogate host for heterologous production of polyketides was engineered and evaluated herein. Firstly, introduction of a three-gene cassette for the biosynthesis of ethylmalonyl-CoA resulted in accumulation of ethylmalonyl-CoA precursor and salinomycin, and subsequent deletion of the salinomycin biosynthetic gene cluster resulted in a host with rich supplies of common polyketide precursors, including malonyl-CoA, methylmalonyl-CoA, and ethylmalonyl-CoA. Secondly, the energy and reducing force were measured, and improved accumulation of ATP and NADPH was observed in the mutant. Furthermore, the strength of a series of selected endogenous promoters based on microarray data was assessed at different growth phases, and a strong constitutive promoter was identified, providing a useful tool for further engineered gene expression. Finally, the potential of the BK3-25 derived host ZXJ-6 was evaluated with the introduction of the actinorhodin biosynthetic gene cluster from Streptomyces coelicolor, and the heterologous production of actinorhodin was obtained. This work clearly indicated the potential of the high-yield salinomycin producer as a surrogate host for heterologous production of polyketides, although more genetic manipulation should be conducted to streamline its performance.
Arrangement of the Clostridium baratii F7 Toxin Gene Cluster with Identification of a σ Factor That Recognizes the Botulinum Toxin Gene Cluster Promoters

DOE PAGES

Dover, Nir; Barash, Jason R.; Burke, Julianne N.; ...

2014-05-22

Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene that is part of a toxin gene cluster that includes several accessory genes. In this paper, we sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein in association with RNA polymerase core enzyme specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.
A method to identify differential expression profiles of time-course gene data with Fourier transformation

PubMed Central

2013-01-01

Background Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. Results This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help face the present and forthcoming challenges of data analysis in functional genomics. PMID:24134721
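The screening step in the abstract above controls the false discovery rate, but the abstract does not say which procedure is used. As an illustration only, the sketch below applies the widely used Benjamini-Hochberg step-up procedure to a list of per-gene p-values; the function name `benjamini_hochberg`, the threshold `alpha=0.05`, and the p-values are assumptions, and the authors' actual screening statistic may differ.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    # Find the largest rank r with p_(r) <= alpha * r / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    keep = [False] * m
    # Reject every hypothesis ranked at or below that largest rank.
    for idx in order[:max_rank]:
        keep[idx] = True
    return keep


# Hypothetical p-values, one per gene, e.g. from a test on its Fourier coefficients.
p_values = [0.001, 0.045, 0.02, 0.5, 0.008]
selected = benjamini_hochberg(p_values, alpha=0.05)
print(selected)
```

Genes flagged `True` would pass on to the clustering step; the rest are screened out as not differentially expressed.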
Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli.

PubMed

Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio

2014-12-01

The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.
Differences in community composition of bacteria in four glaciers in western China

NASA Astrophysics Data System (ADS)

An, L. Z.; Chen, Y.; Xiang, S.-R.; Shang, T.-C.; Tian, L.-D.

2010-06-01

Microbial community patterns vary in glaciers worldwide, presenting unique responses to global climatic and environmental changes. Four bacterial clone libraries were established by 16S rRNA gene amplification from four ice layers along the 42-m-long ice core MuztB drilled from the Muztag Ata Glacier. A total of 151 bacterial sequences obtained from the ice core MuztB were phylogenetically compared with the 71 previously reported sequences from three ice cores extracted from ice caps Malan, Dunde, and Puruogangri. Six phylogenetic clusters Flavisolibacter, Flexibacter (Bacteroidetes), Acinetobacter, Enterobacter (Gammaproteobacteria), Planococcus/Anoxybacillus (Firmicutes), and Propionibacter/Luteococcus (Actinobacteria) frequently occurred along the Muztag Ata Glacier profile, and their proportion varied by seasons. Sequence analysis showed that most of the sequences from the ice core clustered with those from cold environments, and the sequence clusters from the same glacier more closely grouped together than those from the geographically isolated glaciers. Moreover, bacterial communities from the same location or similarly aged ice formed a cluster, and were clearly separate from those from other geographically isolated glaciers. In summary, the findings provide preliminary evidence of zonal distribution of microbial community, and suggest biogeography of microorganisms in glacier ice.
Differences in community composition of bacteria in four deep ice sheets in western China

NASA Astrophysics Data System (ADS)

An, L.; Chen, Y.; Xiang, S.-R.; Shang, T.-C.; Tian, L.-De

2010-02-01

Microbial community patterns vary in glaciers worldwide, presenting unique responses to global climatic and environmental changes. Four bacterial clone libraries were established by 16S rRNA gene amplification from four ice layers along the 42-m-long ice core MuztB drilled from the Muztag Ata Glacier. A total of 152 bacterial sequences obtained from the ice core MuztB were phylogenetically compared with the 71 previously reported sequences from three ice cores extracted from ice caps Malan, Dunde, and Puruogangri. The six functional clusters Flavisolibacter, Flexibacter (Bacteroidetes), Acinetobacter, Enterobacter (Gammaproteobacteria), Planococcus/Anoxybacillus (Firmicutes), and Propionibacter/Luteococcus (Actinobacteria) frequently occurred along the Muztag Ata Glacier profile. Sequence analysis showed that most of the sequences from the ice core clustered with those from cold environments, and the sequences from the same glacier formed a distinct cluster. Moreover, bacterial communities from the same location or similarly aged ice formed a cluster, and were clearly separate from those from other geographically isolated glaciers. In summary, the findings provide preliminary evidence of zonal distribution of the microbial community, and support our hypothesis of the spatial and temporal biogeography of microorganisms in glacial ice.
Discovery of Gene Cluster for Mycosporine-Like Amino Acid Biosynthesis from Actinomycetales Microorganisms and Production of a Novel Mycosporine-Like Amino Acid by Heterologous Expression

PubMed Central

Miyamoto, Kiyoko T.; Komatsu, Mamoru

2014-01-01

Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes l-alanine for the l-serine of shinorine. PMID:24907338
Discovery of gene cluster for mycosporine-like amino acid biosynthesis from Actinomycetales microorganisms and production of a novel mycosporine-like amino acid by heterologous expression.

PubMed

Miyamoto, Kiyoko T; Komatsu, Mamoru; Ikeda, Haruo

2014-08-01

Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes l-alanine for the l-serine of shinorine. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Identification of Loci and Functional Characterization of Trichothecene Biosynthesis Genes in Filamentous Fungi of the Genus Trichoderma

PubMed Central

Cardoza, R. E.; Malmierca, M. G.; Hermosa, M. R.; Alexander, N. J.; McCormick, S. P.; Proctor, R. H.; Tijerino, A. M.; Rumbero, A.; Monte, E.; Gutiérrez, S.

2011-01-01

Trichothecenes are mycotoxins produced by Trichoderma, Fusarium, and at least four other genera in the fungal order Hypocreales. Fusarium has a trichothecene biosynthetic gene (TRI) cluster that encodes transport and regulatory proteins as well as most enzymes required for the formation of the mycotoxins. However, little is known about trichothecene biosynthesis in the other genera. Here, we identify and characterize TRI gene orthologues (tri) in Trichoderma arundinaceum and Trichoderma brevicompactum. Our results indicate that both Trichoderma species have a tri cluster that consists of orthologues of seven genes present in the Fusarium TRI cluster. Organization of genes in the cluster is the same in the two Trichoderma species but differs from the organization in Fusarium. Sequence and functional analysis revealed that the gene (tri5) responsible for the first committed step in trichothecene biosynthesis is located outside the cluster in both Trichoderma species rather than inside the cluster as it is in Fusarium. Heterologous expression analysis revealed that two T. arundinaceum cluster genes (tri4 and tri11) differ in function from their Fusarium orthologues. The Tatri4-encoded enzyme catalyzes only three of the four oxygenation reactions catalyzed by the orthologous enzyme in Fusarium. The Tatri11-encoded enzyme catalyzes a completely different reaction (trichothecene C-4 hydroxylation) than the Fusarium orthologue (trichothecene C-15 hydroxylation). The results of this study indicate that although some characteristics of the tri/TRI cluster have been conserved during evolution of Trichoderma and Fusarium, the cluster has undergone marked changes, including gene loss and/or gain, gene rearrangement, and divergence of gene function. PMID:21642405
Circumpolar Genetic Structure and Recent Gene Flow of Polar Bears: A Reanalysis.

PubMed

Malenfant, René M; Davis, Corey S; Cullingham, Catherine I; Coltman, David W

2016-01-01

Recently, an extensive study of 2,748 polar bears (Ursus maritimus) from across their circumpolar range was published in PLOS ONE, which used microsatellites and mitochondrial haplotypes to apparently show altered population structure and a dramatic change in directional gene flow towards the Canadian Archipelago, an area believed to be a future refugium for polar bears as their southernmost habitats decline under climate change. Although this study represents a major international collaborative effort and promised to be a baseline for future genetics work, methodological shortcomings and errors of interpretation undermine some of the study's main conclusions. Here, we present a reanalysis of these data in which we address some of these issues, including: (1) highly unbalanced sample sizes and large amounts of systematically missing data; (2) incorrect calculation of FST and of significance levels; (3) misleading estimates of recent gene flow resulting from non-convergence of the program BayesAss. In contrast to the original findings, in our reanalysis we find six genetic clusters of polar bears worldwide: the Hudson Bay Complex, the Western and Eastern Canadian Arctic Archipelago, the Western and Eastern Polar Basin, and, importantly, we reconfirm the presence of a unique and possibly endangered cluster of bears in Norwegian Bay near Canada's expected last sea-ice refugium. Although polar bears' abundance, distribution, and population structure will certainly be negatively affected by ongoing (and increasingly rapid) loss of Arctic sea ice, these genetic data provide no evidence of strong directional gene flow in response to recent climate change.

Clustered Genes Involved in Cyclopiazonic Acid Production are Next to the Aflatoxin Biosynthesis Gene Cluster in Aspergillus flavus

USDA-ARS's Scientific Manuscript database

Cyclopiazonic acid (CPA), an indole-tetramic acid toxin, is produced by many species of Aspergillus and Penicillium. In addition to CPA Aspergillus flavus produces polyketide-derived carcinogenic aflatoxins (AFs). AF biosynthesis genes form a gene cluster in a subtelomeric region. Isolates of A. fla...
Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

PubMed

Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and the structure of the nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement that has not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.
Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

PubMed Central

Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and the structure of the nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement that has not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
Nitrogenase assembly

PubMed Central

Hu, Yilin; Ribbe, Markus W.

2013-01-01

Nitrogenase contains two unique metalloclusters: the P-cluster and the M-cluster. The assembly processes of P- and M-clusters are arguably the most complicated processes in bioinorganic chemistry. There is considerable interest in decoding the biosynthetic mechanisms of the P- and M-clusters, because these clusters are not only biologically important, but also chemically unprecedented. Understanding the assembly mechanisms of these unique metalloclusters is crucial for understanding the structure-function relationship of nitrogenase. Here, we review the recent advances in this research area, with an emphasis on our work that provides important insights into the biosynthetic pathways of these high-nuclearity metal centers. PMID:23232096
The complete genome sequence of Clostridium indolis DSM 755T

PubMed Central

Leschine, Susan; Huntemann, Marcel; Han, James; Chen, Amy; Kyrpides, Nikos; Markowitz, Victor; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Schaumberg, Andrew; Pati, Amrita; Stamatis, Dimitrios; Reddy, Tatiparthi; Lobos, Elizabeth; Goodwin, Lynne; Nordberg, Henrik P.; Cantor, Michael N.; Hua, Susan X.; Woyke, Tanja; Blanchard, Jeffrey L.

2014-01-01

Clostridium indolis DSM 755T is a bacterium commonly found in soils and the feces of birds and mammals. Despite its prevalence, little is known about the ecology or physiology of this species. However, close relatives, C. saccharolyticum and C. hathewayi, have demonstrated interesting metabolic potentials related to plant degradation and human health. The genome of C. indolis DSM 755T reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics. Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests hypotheses to be tested in future culture based work to better understand the physiology of this poorly described species. PMID:25197485
The complete genome sequence of Clostridium indolis DSM 755(T.).

PubMed

Biddle, Amy S; Leschine, Susan; Huntemann, Marcel; Han, James; Chen, Amy; Kyrpides, Nikos; Markowitz, Victor; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Schaumberg, Andrew; Pati, Amrita; Stamatis, Dimitrios; Reddy, Tatiparthi; Lobos, Elizabeth; Goodwin, Lynne; Nordberg, Henrik P; Cantor, Michael N; Hua, Susan X; Woyke, Tanja; Blanchard, Jeffrey L

2014-06-15

Clostridium indolis DSM 755(T) is a bacterium commonly found in soils and the feces of birds and mammals. Despite its prevalence, little is known about the ecology or physiology of this species. However, close relatives, C. saccharolyticum and C. hathewayi, have demonstrated interesting metabolic potentials related to plant degradation and human health. The genome of C. indolis DSM 755(T) reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics. Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests hypotheses to be tested in future culture based work to better understand the physiology of this poorly described species.
Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence

PubMed Central

Bao, Yun-Juan; Liang, Zhong; Mayfield, Jeffrey A.; McShan, William M.; Lee, Shaun W.; Ploplis, Victoria A.; Castellino, Francis J.

2016-01-01

Symmetric genomic rearrangements around replication axes are commonly observed in prokaryotic genomes, including Group A Streptococcus (GAS). However, asymmetric rearrangements are rare. Our previous studies showed that the hypervirulent invasive GAS strain, M23ND, containing an inactivated transcriptional regulator system, covRS, exhibits unique extensive asymmetric rearrangements, which reconstructed a genomic structure distinct from other GAS genomes. In the current investigation, we identified the rearrangement events and examined the genetic consequences and evolutionary implications underlying the rearrangements. By comparison with a close phylogenetic relative, M18-MGAS8232, we propose a molecular model wherein a series of asymmetric rearrangements have occurred in M23ND, involving translocations, inversions and integrations mediated by multiple factors, viz., rRNA-comX (factor for late competence), transposons and phage-encoded gene segments. Assessments of the cumulative gene orientations and GC skews reveal that the asymmetric genomic rearrangements did not affect the general genomic integrity of the organism. However, functional distributions reveal re-clustering of a broad set of CovRS-regulated actively transcribed genes, including virulence factors and metabolic genes, to the same leading strand, with high confidence (p-value ~10^−10). The re-clustering of the genes suggests a potential selection advantage for the spatial proximity to the transcription complexes, which may contain the global transcriptional regulator, CovRS, and other RNA polymerases. Their proximities allow for efficient transcription of the genes required for growth, virulence and persistence. A new paradigm of survival strategies of GAS strains is provided through multiple genomic rearrangements, while, at the same time, maintaining genomic integrity. PMID:27329479
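The cumulative GC skew assessment mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not state which tool, window size, or skew definition the authors used, so the function name `cumulative_gc_skew`, the `window` parameter, and the common definition (G − C) / (G + C) per window are all assumptions made here for illustration. In many bacterial chromosomes the cumulative curve tends to reach its minimum near the replication origin and its maximum near the terminus, which is why a preserved curve shape is often read as a sign of intact replication architecture.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of per-window GC skew.

    Assumed definition (not taken from the abstract):
    skew = (G - C) / (G + C) over non-overlapping windows.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C contribute zero skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        curve.append(total)
    return curve


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "G" * 10 + "C" * 10
curve = cumulative_gc_skew(toy, window=5)
print(curve)
# The index of the maximum marks where the skew changes sign.
print(curve.index(max(curve)))
```

On a real genome the input would be the full chromosome sequence and the window would be far larger; the toy input only shows how the curve rises through the G-rich segment and falls through the C-rich one.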
Hox gene cluster of the ascidian, Halocynthia roretzi, reveals multiple ancient steps of cluster disintegration during ascidian evolution.

PubMed

Sekigami, Yuka; Kobayashi, Takuya; Omi, Ai; Nishitsuji, Koki; Ikuta, Tetsuro; Fujiyama, Asao; Satoh, Noriyuki; Saiga, Hidetoshi

2017-01-01

Hox gene clusters with at least 13 paralog group (PG) members are common in vertebrate genomes and in that of amphioxus. Ascidians, which belong to the subphylum Tunicata (Urochordata), are phylogenetically positioned between vertebrates and amphioxus, and traditionally divided into two groups: the Pleurogona and the Enterogona. An enterogonan ascidian, Ciona intestinalis (Ci), possesses nine Hox genes localized on two chromosomes; thus, the Hox gene cluster is disintegrated. We investigated the Hox gene cluster of a pleurogonan ascidian, Halocynthia roretzi (Hr) to investigate whether Hox gene cluster disintegration is common among ascidians, and if so, how such disintegration occurred during ascidian or tunicate evolution. Our phylogenetic analysis reveals that the Hr Hox gene complement comprises nine members, including one with a relatively divergent Hox homeodomain sequence. Eight of nine Hr Hox genes were orthologous to Ci-Hox1, 2, 3, 4, 5, 10, 12 and 13. Following the phylogenetic classification into 13 PGs, we designated Hr Hox genes as Hox1, 2, 3, 4, 5, 10, 11/12/13.a, 11/12/13.b and HoxX. To address the chromosomal arrangement of the nine Hox genes, we performed two-color chromosomal fluorescent in situ hybridization, which revealed that the nine Hox genes are localized on a single chromosome in Hr, distinct from their arrangement in Ci. We further examined the order of the nine Hox genes on the chromosome by chromosome/scaffold walking. This analysis suggested a gene order of Hox1, 11/12/13.b, 11/12/13.a, 10, 5, X, followed by either Hox4, 3, 2 or Hox2, 3, 4 on the chromosome. Based on the present results and those previously reported in Ci, we discuss the establishment of the Hox gene complement and disintegration of Hox gene clusters during the course of ascidian or tunicate evolution. The Hox gene cluster and the genome must have experienced extensive reorganization during the course of evolution from the ancestral tunicate to Hr and Ci. Nevertheless, some features are shared in Hox gene components and gene arrangement on the chromosomes, suggesting that Hox gene cluster disintegration in ascidians involved early events common to tunicates as well as later ascidian lineage-specific events.
Multi-Dimensional Prioritization of Dental Caries Candidate Genes and Its Enriched Dense Network Modules

PubMed Central

Wang, Quan; Jia, Peilin; Cuenco, Karen T.; Feingold, Eleanor; Marazita, Mary L.; Wang, Lily; Zhao, Zhongming

2013-01-01

A number of genetic studies have suggested numerous susceptibility genes for dental caries over the past decade with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network relevant genes, the matrix metalloproteinases (MMPs) family, and the transforming growth factor-beta (TGF-β) family, all of which have been previously implicated in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provided insights into the molecular mechanisms underlying dental caries. The framework we proposed in this work can be applied to other complex diseases. PMID:24146904
Comparison of expression of secondary metabolite biosynthesis cluster genes in Aspergillus flavus, A. parasiticus, and A. oryzae.

PubMed

Ehrlich, Kenneth C; Mack, Brian M

2014-06-23

Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites which, besides aflatoxin, could be an underappreciated contributor to its toxicity.
Comparison of Expression of Secondary Metabolite Biosynthesis Cluster Genes in Aspergillus flavus, A. parasiticus, and A. oryzae

PubMed Central

Ehrlich, Kenneth C.; Mack, Brian M.

2014-01-01

Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites which, besides aflatoxin, could be an underappreciated contributor to its toxicity. PMID:24960201
Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

PubMed Central

2010-01-01

Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the updated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and phylogeny for the entire currently known VvTPS gene family. PMID:20964856
Gene Cluster Encoding Cholate Catabolism in Rhodococcus spp.

PubMed Central

Wilbrink, Maarten H.; Casabon, Israël; Stewart, Gordon R.; Liu, Jie; van der Geize, Robert; Eltis, Lindsay D.

2012-01-01

Bile acids are highly abundant steroids with important functions in vertebrate digestion. Their catabolism by bacteria is an important component of the carbon cycle, contributes to gut ecology, and has potential commercial applications. We found that Rhodococcus jostii RHA1 grows well on cholate, as well as on its conjugates, taurocholate and glycocholate. The transcriptome of RHA1 growing on cholate revealed 39 genes upregulated on cholate, occurring in a single gene cluster. Reverse transcriptase quantitative PCR confirmed that selected genes in the cluster were upregulated 10-fold on cholate versus on cholesterol. One of these genes, kshA3, encoding a putative 3-ketosteroid-9α-hydroxylase, was deleted and found essential for growth on cholate. Two coenzyme A (CoA) synthetases encoded in the cluster, CasG and CasI, were heterologously expressed. CasG was shown to transform cholate to cholyl-CoA, thus initiating side chain degradation. CasI was shown to form CoA derivatives of steroids with isopropanoyl side chains, likely occurring as degradation intermediates. Orthologous gene clusters were identified in all available Rhodococcus genomes, as well as that of Thermomonospora curvata. Moreover, Rhodococcus equi 103S, Rhodococcus ruber Chol-4 and Rhodococcus erythropolis SQ1 each grew on cholate. In contrast, several mycolic acid bacteria lacking the gene cluster were unable to grow on cholate. Our results demonstrate that the above-mentioned gene cluster encodes cholate catabolism and is distinct from a more widely occurring gene cluster encoding cholesterol catabolism. PMID:23024343
Comparative genomic analysis of six new-found integrative conjugative elements (ICEs) in Vibrio alginolyticus.

PubMed

Luo, Peng; He, Xiangyan; Wang, Yanhong; Liu, Qiuting; Hu, Chaoqun

2016-05-04

Vibrio alginolyticus is ubiquitous in marine and estuarine environments. In 2012-2013, SXT/R391-like integrative conjugative elements (ICEs) in environmental V. alginolyticus strains were discovered and found to occur in 8.9 % of 192 V. alginolyticus strains, which suggests that V. alginolyticus may be a natural pool possessing resourceful ICEs. However, complete ICE sequences originating from this bacterium have not been reported, which represents a significant barrier to characterizing the ICEs of this bacterium and exploring their relationships with other ICEs. In the present study, we acquired six ICE sequences from five V. alginolyticus strains and performed a comparative analysis of these ICE genomes. A sequence analysis showed that there were only 14 variable bases dispersed between ICEValE0601 and ICEValHN492. ICEValE0601 and ICEValHN492 were treated as the same ICE. ICEValA056-1, ICEValE0601 and ICEValHN492 integrate into the 5' end of the host's prfC gene, and their Int and Xis share at least 97 % identity with their counterparts from SXT. ICEValE0601 or ICEValHN492 contain 50 of 52 conserved core genes in the SXT/R391 ICEs (not s025 or s026). ICEValA056-2, ICEValHN396 and ICEValHN437 have a different tRNA-ser integration site and a distinct int/xis module; however, the remaining backbone genes are highly similar to their counterparts in SXT/R391 ICEs. DNA sequences inserted into hotspot and variable regions of the ICEs are of various sizes. The variable genes of six ICEs encode a large array of functions to bestow various adaptive abilities upon their hosts, and only ICEValA056-1 contains drug-resistant genes. Many variable genes have orthologous and functionally related genes to those found in SXT/R391 ICEs, such as genes coding for a toxin-antitoxin system, a restriction-modification system, helicases and endonucleases. Six ICEs also contain a large number of unique genes or gene clusters that were not found in other ICEs. Six ICEs harbor more abundant transposase genes compared with other parts of their host genomes. A phylogenetic analysis indicated that transposase genes in these ICEs are highly diverse. ICEValA056-1, ICEValE0601 and ICEValHN492 are typical members of the SXT/R391 family. ICEValA056-2, ICEValHN396 and ICEValHN437 form a new atypical group belonging to the SXT/R391 family. In addition to the many genes found to be present in other ICEs, six ICEs contain a large number of unique genes or gene clusters that were not found in other ICEs. ICEs may serve as a carrier for transposable genetic elements (TEs) and largely facilitate the dissemination of TEs.
Molecular epidemiology and clinical characteristics of drug-resistant Mycobacterium tuberculosis in a tuberculosis referral hospital in China.

PubMed

Wang, Qi; Lau, Susanna K P; Liu, Fei; Zhao, Yanlin; Li, Hong Min; Li, Bing Xi; Hu, Yong Liang; Woo, Patrick C Y; Liu, Cui Hua

2014-01-01

Despite the large number of drug-resistant tuberculosis (TB) cases in China, few studies have comprehensively analyzed the drug resistance-associated gene mutations and genotypes in relation to the clinical characteristics of M. tuberculosis (Mtb) isolates. We thus analyzed the phenotypic and genotypic drug resistance profiles of 115 Mtb clinical isolates recovered from a tuberculosis referral hospital in Beijing, China. We also performed genotyping by 28-locus MIRU-VNTR analysis. Socio-demographic and clinical data were retrieved from medical records and analyzed. In total, 78 types of mutations (including 42 previously reported and 36 newly identified ones) were identified in 115 Mtb clinical isolates. There was a significant correlation between phenotypic and genotypic drug resistance rates for first-line anti-TB drugs (P<0.001). Genotyping revealed 101 MIRU-VNTR types, with 20 isolates (17.4%) being clustered and 95 isolates (82.6%) having unique genotypes. A higher proportion of re-treatment cases was observed among patients with clustered isolates than among those with unique MIRU-VNTR genotypes (75.0% vs. 41.1%). Moreover, clinical epidemiological links were identified among patients infected by Mtb strains belonging to the same clusters, suggesting potential transmission among patients. Our study provided information on novel potential drug resistance-associated mutations in Mtb. In addition, the genotyping data from our study suggested that enforcing the implementation of genotyping in diagnostic routines would provide important information for better monitoring and control of TB transmission.
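The clustering figures in the abstract above (101 MIRU-VNTR types, 20 clustered isolates, 95 unique isolates among 115) can be checked with a short Python sketch. It treats an isolate as "clustered" when its genotype is shared with at least one other isolate, which is the usual convention but is an assumption here, as are the function name `clustering_summary` and the placeholder genotype labels. The reported totals imply 101 − 95 = 6 shared types covering 20 isolates; the individual cluster sizes are not given in the abstract, so the sizes used below are hypothetical and chosen only to sum to 20.

```python
from collections import Counter


def clustering_summary(genotypes):
    """Summarize clustered vs. unique isolates from genotype labels.

    An isolate counts as clustered if its genotype occurs more than once
    (assumed convention; the abstract does not define it).
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    clustered = sum(c for c in counts.values() if c > 1)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "isolates": n,
        "types": len(counts),
        "clustered": clustered,
        "unique": unique,
        "clustered_pct": round(100 * clustered / n, 1),
        "unique_pct": round(100 * unique / n, 1),
    }


# Hypothetical data: 95 unique genotypes plus 6 shared genotypes whose
# sizes (invented for illustration) sum to the 20 clustered isolates.
cluster_sizes = [4, 4, 3, 3, 3, 3]
genotypes = [f"unique_{i}" for i in range(95)]
for k, size in enumerate(cluster_sizes):
    genotypes += [f"cluster_{k}"] * size

summary = clustering_summary(genotypes)
print(summary)
```

With real data the labels would be the 28-locus repeat-count profiles, for example joined into one string per isolate, and the same counting applies.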
Bacterial CRISPR Regions: General Features and their Potential for Epidemiological Molecular Typing Studies

PubMed Central

Karimi, Zahra; Ahmadi, Ali; Najafi, Ali; Ranjbar, Reza

2018-01-01

Introduction: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, as novel and applicable regions in prokaryotic genomes, have gained great attention in the post-genomics era. Methods: These unique regions are diverse in number and sequence composition in different pathogenic bacteria and thereby can be a suitable candidate for molecular epidemiology and genotyping studies. Results: Furthermore, the arrayed structure of CRISPR loci (several unique repeats spaced with variable sequences) and associated cas genes act as an active prokaryotic immune system against viral replication and conjugative elements. This property can be used as a tool for RNA editing in bioengineering studies. Conclusion: The aim of this review was to survey some details about the history, nature, and potential applications of CRISPR arrays in both genetic engineering and bacterial genotyping studies. PMID:29755603
Evolution of homeobox genes.

PubMed

Holland, Peter W H

2013-01-01

Many homeobox genes encode transcription factors with regulatory roles in animal and plant development. Homeobox genes are found in almost all eukaryotes, and have diversified into 11 gene classes and over 100 gene families in animal evolution, and 10 to 14 gene classes in plants. The largest group in animals is the ANTP class which includes the well-known Hox genes, plus other genes implicated in development including ParaHox (Cdx, Xlox, Gsx), Evx, Dlx, En, NK4, NK3, Msx, and Nanog. Genomic data suggest that the ANTP class diversified by extensive tandem duplication to generate a large array of genes, including an NK gene cluster and a hypothetical ProtoHox gene cluster that duplicated to generate Hox and ParaHox genes. Expression and functional data suggest that NK, Hox, and ParaHox gene clusters acquired distinct roles in patterning the mesoderm, nervous system, and gut. The PRD class is also diverse and includes Pax2/5/8, Pax3/7, Pax4/6, Gsc, Hesx, Otx, Otp, and Pitx genes. PRD genes are not generally arranged in ancient genomic clusters, although the Dux, Obox, and Rhox gene clusters arose in mammalian evolution as did several non-clustered PRD genes. Tandem duplication and genome duplication expanded the number of homeobox genes, possibly contributing to the evolution of developmental complexity, but homeobox gene loss must not be ignored. Evolutionary changes to homeobox gene expression have also been documented, including Hox gene expression patterns shifting in concert with segmental diversification in vertebrates and crustaceans, and deletion of a Pitx1 gene enhancer in pelvic-reduced sticklebacks. WIREs Dev Biol 2013, 2:31-45. doi: 10.1002/wdev.78 For further resources related to this article, please visit the WIREs website. The author declares that he has no conflicts of interest. Copyright © 2012 Wiley Periodicals, Inc.
Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis

PubMed Central

Koh, Esther G. L.; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V.; Brenner, Sydney; Venkatesh, Byrappa

2003-01-01

The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes. PMID:12547909
Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis.

PubMed

Koh, Esther G L; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V; Brenner, Sydney; Venkatesh, Byrappa

2003-02-04

The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes.
Characterization of a Major Cluster of nif, fix, and Associated Genes in a Sugarcane Endophyte, Acetobacter diazotrophicus

PubMed Central

Lee, Sunhee; Reth, Alexander; Meletzus, Dietmar; Sevilla, Myrna; Kennedy, Christina

2000-01-01

A major 30.5-kb cluster of nif and associated genes of Acetobacter diazotrophicus (syn. Gluconacetobacter diazotrophicus), a nitrogen-fixing endophyte of sugarcane, was sequenced and analyzed. This cluster represents the largest assembly of contiguous nif-fix and associated genes so far characterized in any diazotrophic bacterial species. Northern blots and promoter sequence analysis indicated that the genes are organized into eight transcriptional units. The overall arrangement of genes is most like that of the nif-fix cluster in Azospirillum brasilense, while the individual gene products are more similar to those in species of Rhizobiaceae or in Rhodobacter capsulatus. PMID:11092875

Distribution of Suicin Gene Clusters in Streptococcus suis Serotype 2 Belonging to Sequence Types 25 and 28.

PubMed

Athey, Taryn B T; Vaillancourt, Katy; Frenette, Michel; Fittipaldi, Nahuel; Gottschalk, Marcelo; Grenier, Daniel

2016-01-01

Recently, we reported the purification and characterization of three distinct lantibiotics (named suicin 90-1330, suicin 3908, and suicin 65) produced by Streptococcus suis. In this study, we investigated the distribution of the three suicin lantibiotic gene clusters among serotype 2 S. suis strains belonging to sequence type (ST) 25 and ST28, the two dominant STs identified in North America. The genomes of 102 strains were interrogated for the presence of suicin gene clusters encoding suicins 90-1330, 3908, and 65. The gene cluster encoding suicin 65 was the most prevalent and mainly found among ST25 strains. In contrast, none of the genes related to suicin 90-1330 production were identified in 51 ST25 strains nor in 35/51 ST28 strains. However, the complete suicin 90-1330 gene cluster was found in ten ST28 strains, although some genes in the cluster were truncated in three of these isolates. The vast majority (101/102) of S. suis strains did not possess any of the genes encoding suicin 3908. In conclusion, this study indicates heterogeneous distribution of suicin genes in S. suis.
Entomologic and molecular investigation into Plasmodium vivax transmission in Singapore, 2009.

PubMed

Ng, Lee-Ching; Lee, Kim-Sung; Tan, Cheong-Huat; Ooi, Peng-Lim; Lam-Phua, Sai-Gek; Lin, Raymond; Pang, Sook-Cheng; Lai, Yee-Ling; Solhan, Suhana; Chan, Pei-Pei; Wong, Kit-Yin; Ho, Swee-Tuan; Vythilingam, Indra

2010-10-29

Singapore has been certified malaria free since November 1982 by the World Health Organization and, despite occasional local transmission, the country has maintained this status. In 2009, three clusters of malaria cases were reported in Singapore. Epidemiological, entomological and molecular studies were carried out to investigate the three clusters, namely Mandai-Sungei Kadut, Jurong Island and Sembawang. A total of 29 malaria patients, with no recent travel history, were reported in the three clusters. Molecular analysis based on the msp3α and msp1 genes showed two independent local transmissions: one in Mandai-Sungei Kadut and another in Sembawang. Almost all cases within each cluster were epidemiologically linked. In the Jurong Island cluster, the epidemiological link remains uncertain, as almost all cases had a unique genetic profile. Only two cases shared a common profile and were found to be linked to the Mandai-Sungei Kadut cluster. Entomological investigation found Anopheles sinensis to be the predominant Anopheline in the two areas where local transmission of P. vivax was confirmed. Anopheles sinensis was found to be attracted to human bait and to bite as early as 19:45 hrs. However, all Anopheles mosquitoes caught were negative for sporozoites and oocysts by dissection. Investigation of P. vivax cases from the three cluster areas confirmed the occurrence of local transmission in two areas. Although An. sinensis was the predominant Anopheline found in areas with confirmed transmission, the vector(s) responsible for the outbreaks remain cryptic.
MUSE spectroscopy and deep observations of a unique compact JWST target, lensing cluster CLIO

NASA Astrophysics Data System (ADS)

Griffiths, Alex; Conselice, Christopher J.; Alpaslan, Mehmet; Frye, Brenda L.; Diego, Jose M.; Zitrin, Adi; Yan, Haojing; Ma, Zhiyuan; Barone-Nugent, Robert; Bhatawdekar, Rachana; Driver, Simon P.; Robotham, Aaron S. G.; Windhorst, Rogier A.; Wyithe, J. Stuart B.

2018-04-01

We present the results of a VLT MUSE/FORS2 and Spitzer survey of a unique compact lensing cluster CLIO at z = 0.42, discovered through the GAMA survey using spectroscopic redshifts. Compact and massive clusters such as this are understudied, but provide a unique perspective on dark matter distributions and for finding background lensed high-z galaxies. The CLIO cluster was identified for follow-up observations due to its almost unique combination of high mass and dark matter halo concentration, as well as having observed lensing arcs from ground-based images. Using dual band optical and infra-red imaging from FORS2 and Spitzer, in combination with MUSE optical spectroscopy we identify 89 cluster members and find background sources out to z = 6.49. We describe the physical state of this cluster, finding a strong correlation between environment and galaxy spectral type. Under the assumption of an NFW profile, we measure the total mass of CLIO to be M200 = (4.49 ± 0.25) × 10^14 M⊙. We build and present an initial strong-lensing model for this cluster, and measure a relatively low intracluster light (ICL) fraction of 7.21 ± 1.53 per cent through galaxy profile fitting. Due to its strong potential for lensing background galaxies and its low ICL, the CLIO cluster will be a target for our 110 h James Webb Space Telescope 'Webb Medium-Deep Field' (WMDF) GTO program.
A cross-species bi-clustering approach to identifying conserved co-regulated genes.

PubMed

Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

2016-06-15

A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective in dealing with the noise in the expression data introduced by the complicated procedures for quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages in which the clustered genes are co-regulated. An efficient optimization algorithm has been developed with convergence analysis.
This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.
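The joint rank-one idea described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm or their mvbc package: it is a simplified alternating scheme with relative soft-thresholding, and the function names, threshold fractions and toy data are all assumptions made for illustration. One shared gene vector u is estimated across species, while each species gets its own stage vector v_s.

```python
import math
import random


def soft_threshold(vec, frac):
    """Shrink entries toward zero by frac * max|entry| (a relative threshold)."""
    lam = frac * max(abs(x) for x in vec)
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def joint_sparse_rank_one(matrices, frac_u=0.3, frac_v=0.3, n_iter=20):
    """Approximate every X_s by u * v_s^T with one shared sparse gene vector u.

    matrices: list of genes-by-stages matrices (lists of rows), one per species,
    all with the same gene order. Hypothetical helper, not the published method.
    """
    n_genes = len(matrices[0])
    u = [1.0 / math.sqrt(n_genes)] * n_genes  # start from a uniform gene vector
    vs = []
    for _ in range(n_iter):
        # Species-specific stage loadings: v_s = S(X_s^T u)
        vs = [
            soft_threshold(
                [sum(X[g][t] * u[g] for g in range(n_genes)) for t in range(len(X[0]))],
                frac_v,
            )
            for X in matrices
        ]
        # Shared gene indicator: u = S(sum_s X_s v_s), rescaled to unit length
        z = [
            sum(sum(X[g][t] * v[t] for t in range(len(v))) for X, v in zip(matrices, vs))
            for g in range(n_genes)
        ]
        z = soft_threshold(z, frac_u)
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            break
        u = [x / norm for x in z]
    return u, vs


def make_toy_data(seed=0):
    """Two toy 'species': genes 0-4 are co-regulated in different stage windows."""
    rng = random.Random(seed)

    def species(n_stages, active):
        return [
            [(3.0 if g < 5 and t in active else 0.0) + rng.gauss(0.0, 0.1)
             for t in range(n_stages)]
            for g in range(30)
        ]

    return [species(6, {0, 1, 2}), species(8, {2, 3, 4, 5})]


u, vs = joint_sparse_rank_one(make_toy_data())
print("genes in cluster:", [g for g, x in enumerate(u) if x != 0])
print("active stages per species:", [[t for t, x in enumerate(v) if x != 0] for v in vs])
```

The nonzero entries of u mark the genes assigned to the cluster in every species, and the nonzero entries of each v_s mark the stages in which that species co-regulates them. Further clusters could be sought by removing the fitted rank-one term and repeating, although the abstract does not say how the published method handles multiple clusters.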
The evolutionary life cycle of the polysaccharide biosynthetic gene cluster based on the Sphingomonadaceae.

PubMed

Wu, Mengmeng; Huang, Haidong; Li, Guoqiang; Ren, Yi; Shi, Zhong; Li, Xiaoyan; Dai, Xiaohui; Gao, Ge; Ren, Mengnan; Ma, Ting

2017-04-21

Although clustering of genes from the same metabolic pathway is a widespread phenomenon, the evolution of the polysaccharide biosynthetic gene cluster remains poorly understood. To determine the evolution of this pathway, we identified a scattered production pathway of the polysaccharide sanxan by Sphingomonas sanxanigenens NX02, and compared the distribution of genes between sphingan-producing and other Sphingomonadaceae strains. This allowed us to determine how the scattered sanxan pathway developed, and how the polysaccharide gene cluster evolved. Our findings suggested that the evolution of microbial polysaccharide biosynthesis gene clusters is a lengthy cyclic process comprising cluster 1 → scatter → cluster 2. The sanxan biosynthetic pathway proved the existence of a dispersive process. We also report the complete genome sequence of NX02, in which we identified many unstable genetic elements and powerful secretion systems. Furthermore, nine enzymes for the formation of activated precursors, four glycosyltransferases, four acyltransferases, and four polymerization and export proteins were identified. These genes were scattered in the NX02 genome, and the positive regulator SpnA of sphingans synthesis could not regulate sanxan production. Finally, we concluded that the evolution of the sanxan pathway was independent. NX02 evolved naturally as a polysaccharide producing strain over a long-time evolution involving gene acquisitions and adaptive mutations.
Chamber Specific Gene Expression Landscape of the Zebrafish Heart

PubMed Central

Singh, Angom Ramcharan; Sivadas, Ambily; Sabharwal, Ankit; Vellarikal, Shamsudheen Karuthedath; Jayarajan, Rijith; Verma, Ankit; Kapoor, Shruti; Joshi, Adita; Scaria, Vinod; Sivasubbu, Sridhar

2016-01-01

The organization of structure and function of cardiac chambers in vertebrates is defined by chamber-specific distinct gene expression. This peculiarity and uniqueness of the genetic signatures demonstrates functional resolution attributed to the different chambers of the heart. Altered expression of the cardiac chamber genes can lead to individual chamber related dysfunctions and disease patho-physiologies. Information on transcriptional repertoire of cardiac compartments is important to understand the spectrum of chamber specific anomalies. We have carried out a genome wide transcriptome profiling study of the three cardiac chambers in the zebrafish heart using RNA sequencing. We have captured the gene expression patterns of 13,396 protein coding genes in the three cardiac chambers: atrium, ventricle and bulbus arteriosus. Of these, 7,260 known protein coding genes are highly expressed (≥10 FPKM) in the zebrafish heart. Thus, this study provides nearly all-inclusive information on the zebrafish cardiac transcriptome. In this study, a total of 96 differentially expressed genes across the three cardiac chambers in zebrafish were identified. The atrium, ventricle and bulbus arteriosus displayed 20, 32 and 44 uniquely expressed genes respectively. We validated the expression of predicted chamber-restricted genes using independent semi-quantitative and qualitative experimental techniques. In addition, we identified 23 putative novel protein coding genes that are restricted to the ventricle and not expressed in the atrium or bulbus arteriosus. To our knowledge, these 23 novel genes have either not been investigated in detail or are sparsely studied. The transcriptome identified in this study includes 68 differentially expressed zebrafish cardiac chamber genes that have a human ortholog.
We also carried out spatiotemporal gene expression profiling of the 96 differentially expressed genes throughout the three cardiac chambers in 11 developmental stages and 6 tissue types of zebrafish. We hypothesize that clustering the differentially expressed genes with both known and unknown functions will deliver detailed insights on fundamental gene networks that are important for the development and specification of the cardiac chambers. It is also postulated that this transcriptome atlas will help utilize zebrafish in a better way as a model for studying cardiac development and to explore functional role of gene networks in cardiac disease pathogenesis. PMID:26815362
Hox cluster polarity in early transcriptional availability: a high order regulatory level of clustered Hox genes in the mouse.

PubMed

Roelen, Bernard A J; de Graaff, Wim; Forlani, Sylvie; Deschamps, Jacqueline

2002-11-01

The molecular mechanism underlying the 3' to 5' polarity of induction of mouse Hox genes is still elusive. While relief from a cluster-encompassing repression was shown to lead to all Hoxd genes being expressed like the 3'most of them, Hoxd1 (Kondo and Duboule, 1999), the molecular basis of initial activation of this 3'most gene is not understood yet. We show that, already before primitive streak formation, prior to initial expression of the first Hox gene, a dramatic transcriptional stimulation of the 3'most genes, Hoxb1 and Hoxb2, is observed upon a short pulse of exogenous retinoic acid (RA), whereas this is not the case for more 5', cluster-internal, RA-responsive Hoxb genes. In contrast, the RA-responding Hoxb1lacZ transgene that faithfully mimics the endogenous gene (Marshall et al., 1994) did not exhibit the sensitivity of Hoxb1 to precocious activation. We conclude that polarity in initial activation of Hoxb genes reflects a greater availability of 3'Hox genes for transcription, suggesting a pre-existing (susceptibility to) opening of the chromatin structure at the 3' extremity of the cluster. We discuss the data in the context of prevailing models involving differential chromatin opening in the directionality of clustered Hox gene transcription, and regarding the importance of the cluster context for correct timing of initial Hox gene expression. Interestingly, Cdx1 manifested the same early transcriptional availability as Hoxb1. Copyright 2002 Elsevier Science Ireland Ltd.
Coral comparative genomics reveal expanded Hox cluster in the cnidarian-bilaterian ancestor.

PubMed

DuBuc, Timothy Q; Ryan, Joseph F; Shinzato, Chuya; Satoh, Nori; Martindale, Mark Q

2012-12-01

The key developmental role of the Hox cluster of genes was established prior to the last common ancestor of protostomes and deuterostomes and the subsequent evolution of this cluster has played a major role in the morphological diversity exhibited in extant bilaterians. Despite 20 years of research into cnidarian Hox genes, the nature of the cnidarian-bilaterian ancestral Hox cluster remains unclear. In an attempt to further elucidate this critical phylogenetic node, we have characterized the Hox cluster of the recently sequenced Acropora digitifera genome. The A. digitifera genome contains two anterior Hox genes (PG1 and PG2) linked to an Eve homeobox gene and an Anthox1A gene, which is thought to be either a posterior or posterior/central Hox gene. These data show that the Hox cluster of the cnidarian-bilaterian ancestor was more extensive than previously thought. The results are congruent with the existence of an ancient set of constraints on the Hox cluster and reinforce the importance of incorporating a wide range of animal species to reconstruct critical ancestral nodes.
Broad spectrum antibiotic compounds and use thereof

DOEpatents

Koglin, Alexander; Strieker, Matthias

2016-07-05

The discovery of a non-ribosomal peptide synthetase (NRPS) gene cluster in the genome of Clostridium thermocellum that produces a secondary metabolite that is assembled outside of the host membrane is described. Also described is the identification of homologous NRPS gene clusters from several additional microorganisms. The secondary metabolites produced by the NRPS gene clusters exhibit broad spectrum antibiotic activity. Thus, antibiotic compounds produced by the NRPS gene clusters, and analogs thereof, their use for inhibiting bacterial growth, and methods of making the antibiotic compounds are described.
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

PubMed

Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

2011-05-26

Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
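The scan-statistic idea in this abstract can be sketched in a much reduced form. The example below is an assumption-laden simplification, not the authors' method: it treats genes as an ordered list rather than a pathway network, scans contiguous windows only, and scores each window with a Kulldorff-style Bernoulli log-likelihood ratio. The function names and the toy counts are hypothetical.

```python
import math


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Log-likelihood ratio for an excess of case events inside a window.

    c_in / n_in: case events and all events (cases + controls) inside the window;
    c_tot / n_tot: the same totals over every gene. Returns 0 when there is no excess.
    """
    c_out, n_out = c_tot - c_in, n_tot - n_in
    if n_in == 0 or n_out == 0:
        return 0.0
    p_in, p_out = c_in / n_in, c_out / n_out
    if p_in <= p_out:
        return 0.0

    def loglik(c, n, p):
        # Binomial log-likelihood kernel; terms with a zero count are skipped
        out = 0.0
        if c > 0:
            out += c * math.log(p)
        if n - c > 0:
            out += (n - c) * math.log(1.0 - p)
        return out

    return (loglik(c_in, n_in, p_in) + loglik(c_out, n_out, p_out)
            - loglik(c_tot, n_tot, c_tot / n_tot))


def scan_windows(cases, totals, max_width):
    """Return (best LLR, (start, end)) over contiguous windows, end exclusive."""
    c_tot, n_tot = sum(cases), sum(totals)
    best = (0.0, None)
    for start in range(len(cases)):
        for end in range(start + 1, min(start + max_width, len(cases)) + 1):
            llr = bernoulli_llr(sum(cases[start:end]), sum(totals[start:end]),
                                c_tot, n_tot)
            if llr > best[0]:
                best = (llr, (start, end))
    return best


# Toy data: 10 genes with 10 CNV events each; genes 3-5 are enriched in cases
cases = [1, 1, 1, 9, 9, 9, 1, 1, 1, 1]
totals = [10] * 10
print(scan_windows(cases, totals, max_width=4))
```

Reporting only the single best window, rather than testing every gene set separately, is what keeps the number of tests small. In practice the significance of the best score would be assessed by permuting case and control labels and recomputing the maximum, a step omitted here for brevity.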
Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features.

PubMed

Haakensen, Vilde D; Lingjaerde, Ole Christian; Lüders, Torben; Riis, Margit; Prat, Aleix; Troester, Melissa A; Holmen, Marit M; Frantzen, Jan Ole; Romundstad, Linda; Navjord, Dina; Bukholm, Ida K; Johannesen, Tom B; Perou, Charles M; Ursin, Giske; Kristensen, Vessela N; Børresen-Dale, Anne-Lise; Helland, Aslaug

2011-11-01

Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published gene lists and independent datasets. Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts, and it identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer.
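Unsupervised hierarchical clustering of samples into two groups, as used in this study, can be sketched as follows. The abstract does not state which linkage or distance was used, so average linkage with a correlation distance is an assumption here, and the expression profiles are invented toy values rather than study data.

```python
import math


def correlation_distance(a, b):
    """1 - Pearson correlation between two profiles (assumes non-constant profiles)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    n = len(profiles)
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        return sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy expression profiles for six samples over five genes (invented values)
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [0.9, 2.2, 2.9, 4.1, 4.8],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.0, 0.8],
    [4.9, 4.1, 2.8, 2.2, 1.1],
]
print(average_linkage(samples, n_clusters=2))
```

Rerunning such a sketch with a different linkage rule or a different gene filter is one way to check, as the authors report doing, that the two-cluster split does not depend on those choices.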
Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications

PubMed Central

Siegel, Nicol; Hoegg, Simone; Salzburger, Walter; Braasch, Ingo; Meyer, Axel

2007-01-01

Background The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes despite their additional genome duplication from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. Results We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. Conclusion There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined.
The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. PMID:17822543
Clustering Algorithms: Their Application to Gene Expression Data

PubMed Central

Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

2016-01-01

Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to the gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and high degree of accuracy in its analysis procedure. PMID:27932867
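As one concrete instance of the partitioning algorithms commonly applied to gene expression data, the sketch below implements plain k-means. The abstract does not list the algorithms the review covers, so the choice of k-means is an assumption for illustration, as are the farthest-first initialisation and the toy two-condition expression values.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each gene goes to its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy data: six genes measured under two conditions (invented values)
genes = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers = kmeans(genes, k=2)
print(labels)
```

The result of k-means depends on the initial centers and on the chosen k, which is one reason stability across runs and algorithms matters when clusters are given a biological interpretation.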
Genomic analyses of bacterial porin-cytochrome gene clusters

DOE PAGES

Shi, Liang; Fredrickson, James K.; Zachara, John M.

2014-11-26

In this study, the porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in the microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with the substrates other than Fe(III) and Mn(IV) oxides.
Activation and comparative analysis of cryptic xiamycin gene cluster from marine-derived Streptomyces sp. FXJ 7.388.

PubMed

Uhong Lü, Yuhong; Liu, Xiaoli; Wang, Miao; Li, Yuanyuan; Liu, Ning; Bao, Yuxin; Liu, Minghao; Li, Xiaoqian; Wang, Yinyin; Qian, Shenyan; Yue, Changwu; Huang, Ying

2016-09-01

In order to obtain the natural products synthesized by the three putative xiamycin biosynthesis gene clusters which were predicted via antiSMASH during the genome mining of marine Streptomyces sp. FXJ 7.388, Streptomyces sp. FXJ 8.012, and Streptomyces olivaceus FXJ 7.023. Sixteen genes involved in xiamycin assembly, modification, and regulation with higher identity than the newest reported xiamycin biosynthetic gene cluster from marine Streptomyces sp. SCSIO 02999, Streptomyces sp. HKI0576, and Streptomyces sp. FXJ 7.388 were discovered via gene cluster comparative analysis. A ribosome engineering strategy was adopted to activate such cryptic gene clusters with different final concentrations of antibiotics that act on the ribosome, and two indolosesquiterpenes were isolated from idlethaldose streptomycin-resistant Streptomyces sp. FXJ 7.388 strains. However, no such product was detected in Streptomyces sp. FXJ 8.012 and Streptomyces olivaceus FXJ 7.023 under the same treatment. This result suggested that these genes might hold the least gene content for xiamycin biosynthesis.
Genes encoding cuticular proteins are components of the Nimrod gene cluster in Drosophila.

PubMed

Cinege, Gyöngyi; Zsámboki, János; Vidal-Quadras, Maite; Uv, Anne; Csordás, Gábor; Honti, Viktor; Gábor, Erika; Hegedűs, Zoltán; Varga, Gergely I B; Kovács, Attila L; Juhász, Gábor; Williams, Michael J; Andó, István; Kurucz, Éva

2017-08-01

The Nimrod gene cluster, located on the second chromosome of Drosophila melanogaster, is the largest syntenic unit of the Drosophila genome. Nimrod genes show blood cell specific expression and code for phagocytosis receptors that play a major role in fruit fly innate immune functions. We previously identified three homologous genes (vajk-1, vajk-2 and vajk-3) located within the Nimrod cluster, which are unrelated to the Nimrod genes, but are homologous to a fourth gene (vajk-4) located outside the cluster. Here we show that, unlike the Nimrod candidates, the Vajk proteins are expressed in cuticular structures of the late embryo and the late pupa, indicating that they contribute to cuticular barrier functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
A natural plasmid uniquely encodes two biosynthetic pathways creating a potent anti-MRSA antibiotic.

PubMed

Fukuda, Daisuke; Haines, Anthony S; Song, Zhongshu; Murphy, Annabel C; Hothersall, Joanne; Stephens, Elton R; Gurney, Rachel; Cox, Russell J; Crosby, John; Willis, Christine L; Simpson, Thomas J; Thomas, Christopher M

2011-03-31

Understanding how complex antibiotics are synthesised by their producer bacteria is essential for creation of new families of bioactive compounds. Thiomarinols, produced by marine bacteria belonging to the genus Pseudoalteromonas, are hybrids of two independently active species: the pseudomonic acid mixture, mupirocin, which is used clinically against MRSA, and the pyrrothine core of holomycin. High throughput DNA sequencing of the complete genome of the producer bacterium revealed a novel 97 kb plasmid, pTML1, consisting almost entirely of two distinct gene clusters. Targeted gene knockouts confirmed the role of these clusters in biosynthesis of the two separate components, pseudomonic acid and the pyrrothine, and identified a putative amide synthetase that joins them together. Feeding mupirocin to a mutant unable to make the endogenous pseudomonic acid created a novel hybrid with the pyrrothine via "mutasynthesis" that allows inhibition of mupirocin-resistant isoleucyl-tRNA synthetase, the mupirocin target. A mutant defective in pyrrothine biosynthesis was also able to incorporate alternative amine substrates. Plasmid pTML1 provides a paradigm for combining independent antibiotic biosynthetic pathways or using mutasynthesis to develop a new family of hybrid derivatives that may extend the effective use of mupirocin against MRSA.
Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM).

PubMed

Skinnider, Michael A; Dejong, Chris A; Rees, Philip N; Johnston, Chad W; Li, Haoxin; Webster, Andrew L H; Wyatt, Morgan A; Magarvey, Nathan A

2015-11-16

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
A highly efficient targeted recombination system for engineering linear chromosomes of industrial bacteria Streptomyces.

PubMed

Pan, Hung-Yin; Chen, Carton W; Huang, Chih-Hung

2018-04-17

Soil bacteria Streptomyces are the most important producers of secondary metabolites, including most known antibiotics. These bacteria and their close relatives are unique in possessing linear chromosomes, which typically harbor 20 to 30 biosynthetic gene clusters of tens to hundreds of kb in length. Many Streptomyces chromosomes are accompanied by linear plasmids with sizes ranging from several to several hundred kb. The large linear plasmids also often contain biosynthetic gene clusters. We have developed a targeted recombination procedure for arm exchanges between a linear plasmid and a linear chromosome. A chromosomal segment inserted in an artificially constructed plasmid allows homologous recombination between the two replicons at the homology. Depending on the design, the recombination may result in two recombinant replicons or a single recombinant chromosome with the loss of the recombinant plasmid that lacks a replication origin. The efficiency of such targeted recombination ranges from 9 to 83% depending on the locations of the homology (and thus the size of the chromosomal arm exchanged), essentially eliminating the necessity of selection. The targeted recombination is useful for the efficient engineering of the Streptomyces genome for large-scale deletion, addition, and shuffling.
Endophytic actinobacteria: Diversity, secondary metabolism and mechanisms to unsilence biosynthetic gene clusters.

PubMed

Dinesh, Raghavan; Srinivasan, Veeraraghavan; T E, Sheeja; Anandaraj, Muthuswamy; Srambikkal, Hamza

2017-09-01

Endophytic actinobacteria, which reside in the inner tissues of host plants, are gaining serious attention due to their capacity to produce a plethora of secondary metabolites (e.g. antibiotics) possessing a wide variety of biological activity with diverse functions. This review encompasses the recent reports on endophytic actinobacterial species diversity, in planta habitats and mechanisms underlying their mode of entry into plants. Besides, their metabolic potential, novel bioactive compounds they produce and mechanisms to unravel their hidden metabolic repertoire by activation of cryptic or silent biosynthetic gene clusters (BGCs) for eliciting novel secondary metabolite production are discussed. The study also reviews the classical conservative techniques (chemical/biological/physical elicitation, co-culturing) as well as modern microbiology tools (e.g. next generation sequencing) that are being gainfully employed to uncover the vast hidden scaffolds for novel secondary metabolites produced by these endophytes, which would subsequently herald a revolution in drug engineering. The potential role of these endophytes in the agro-environment as promising biological candidates for inhibition of phytopathogens and the way forward to thoroughly exploit this unique microbial community by inducing expression of cryptic BGCs for encoding unseen products with novel therapeutic properties are also discussed.

Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.

PubMed

Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L

2015-05-15

The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10⁻⁶) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.
Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma

PubMed Central

Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F.; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D.; Ober, Carole; Nicolae, Dan L.; Barnes, Kathleen C.; London, Stephanie J.; Gilliland, Frank; Weiss, Scott T.; Raby, Benjamin A.; Cohn, Lauren

2015-01-01

Rationale: The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. Objectives: We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Methods: Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Measurements and Main Results: Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10⁻⁶) and hospitalization (P = 0.01), respectively. Conclusions: There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma. PMID:25763605
Analysis of genetic association using hierarchical clustering and cluster validation indices.

PubMed

Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

2017-10-01

It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes, based on some criteria of similarity, is an important task. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured by its ability to detect co-regulated sets compared with a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
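Illustrative note (not from the record above): the abstract does not say which validation indices are used, so the sketch below substitutes a simple stand-in, the mean pairwise Pearson correlation within a candidate group, and ranks hypothetical candidate groups by it. The gene names, expression values, and the `group_score` helper are invented for illustration. In the described method the candidates would come from hierarchical clustering, which is omitted here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_score(expr, group):
    """Stand-in validation index: mean pairwise correlation inside one group."""
    pairs = list(combinations(group, 2))
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)


# Hypothetical expression values over four experiments.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

# Hypothetical candidate groups (assumed to come from a clustering step).
candidates = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneB", "geneD")]

# Rank candidates from most to least coherent.
ranked = sorted(candidates, key=lambda g: group_score(expr, g), reverse=True)
```

Scoring each group on its own, rather than scoring a whole partition, is what lets groups produced by different clustering variants be compared on a single scale.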
The Sound of Silence: Activating Silent Biosynthetic Gene Clusters in Marine Microorganisms.

PubMed

Reen, F Jerry; Romano, Stefano; Dobson, Alan D W; O'Gara, Fergal

2015-07-31

Unlocking the rich harvest of marine microbial ecosystems has the potential to both safeguard the existence of our species for the future, while also presenting significant lifestyle benefits for commercial gain. However, while significant advances have been made in the field of marine biodiscovery, leading to the introduction of new classes of therapeutics for clinical medicine, cosmetics and industrial products, much of what this natural ecosystem has to offer is locked in, and essentially hidden from our screening methods. Releasing this silent potential represents a significant technological challenge, the key to which is a comprehensive understanding of what controls these systems. Heterologous expression systems have been successful in awakening a number of these cryptic marine biosynthetic gene clusters (BGCs). However, this approach is limited by the typically large size of the encoding sequences. More recently, focus has shifted to the regulatory proteins associated with each BGC, many of which are signal responsive raising the possibility of exogenous activation. Abundant among these are the LysR-type family of transcriptional regulators, which are known to control production of microbial aromatic systems. Although the environmental signals that activate these regulatory systems remain unknown, it offers the exciting possibility of evoking mimic molecules and synthetic expression systems to drive production of potentially novel natural products in microorganisms. Success in this field has the potential to provide a quantum leap forward in medical and industrial bio-product development. To achieve these new endpoints, it is clear that the integrated efforts of bioinformaticians and natural product chemists will be required as we strive to uncover new and potentially unique structures from silent or cryptic marine gene clusters.
The Sound of Silence: Activating Silent Biosynthetic Gene Clusters in Marine Microorganisms

PubMed Central

Reen, F. Jerry; Romano, Stefano; Dobson, Alan D.W.; O’Gara, Fergal

2015-01-01

Unlocking the rich harvest of marine microbial ecosystems has the potential to both safeguard the existence of our species for the future, while also presenting significant lifestyle benefits for commercial gain. However, while significant advances have been made in the field of marine biodiscovery, leading to the introduction of new classes of therapeutics for clinical medicine, cosmetics and industrial products, much of what this natural ecosystem has to offer is locked in, and essentially hidden from our screening methods. Releasing this silent potential represents a significant technological challenge, the key to which is a comprehensive understanding of what controls these systems. Heterologous expression systems have been successful in awakening a number of these cryptic marine biosynthetic gene clusters (BGCs). However, this approach is limited by the typically large size of the encoding sequences. More recently, focus has shifted to the regulatory proteins associated with each BGC, many of which are signal responsive raising the possibility of exogenous activation. Abundant among these are the LysR-type family of transcriptional regulators, which are known to control production of microbial aromatic systems. Although the environmental signals that activate these regulatory systems remain unknown, it offers the exciting possibility of evoking mimic molecules and synthetic expression systems to drive production of potentially novel natural products in microorganisms. Success in this field has the potential to provide a quantum leap forward in medical and industrial bio-product development. To achieve these new endpoints, it is clear that the integrated efforts of bioinformaticians and natural product chemists will be required as we strive to uncover new and potentially unique structures from silent or cryptic marine gene clusters. PMID:26264003
Application of High-Density DNA Resequencing Microarray for Detection and Characterization of Botulinum Neurotoxin-Producing Clostridia

PubMed Central

Vanhomwegen, Jessica; Berthet, Nicolas; Mazuet, Christelle; Guigon, Ghislaine; Vallaeys, Tatiana; Stamboliyska, Rayna; Dubois, Philippe; Kennedy, Giulia C.; Cole, Stewart T.; Caro, Valérie; Manuguerra, Jean-Claude; Popoff, Michel-Robert

2013-01-01

Background Clostridium botulinum and related clostridia express extremely potent toxins known as botulinum neurotoxins (BoNTs) that cause severe, potentially lethal intoxications in humans. These BoNT-producing bacteria are categorized in seven major toxinotypes (A through G) and several subtypes. The high diversity in nucleotide sequence and genetic organization of the gene cluster encoding the BoNT components poses a great challenge for the screening and characterization of BoNT-producing strains. Methodology/Principal Findings In the present study, we designed and evaluated the performance of a resequencing microarray (RMA), the PathogenID v2.0, combined with an automated data approach for the simultaneous detection and characterization of BoNT-producing clostridia. The unique design of the PathogenID v2.0 array allows the simultaneous detection and characterization of 48 sequences targeting the BoNT gene cluster components. This approach allowed successful identification and typing of representative strains of the different toxinotypes and subtypes, as well as the neurotoxin-producing C. botulinum strain in a naturally contaminated food sample. Moreover, the method allowed fine characterization of the different neurotoxin gene cluster components of all studied strains, including genomic regions exhibiting up to 24.65% divergence with the sequences tiled on the arrays. Conclusions/Significance The severity of the disease demands rapid and accurate means for performing risk assessments of BoNT-producing clostridia and for tracing potential sources of contamination in outbreak situations. The RMA approach constitutes an essential higher echelon component in a diagnostics and surveillance pipeline. In addition, it is an important asset for characterising potential outbreak-related strains, as well as environmental isolates, in order to obtain a better picture of the molecular epidemiology of BoNT-producing clostridia. PMID:23818983
Variability among Cucurbitaceae species (melon, cucumber and watermelon) in a genomic region containing a cluster of NBS-LRR genes.

PubMed

Morata, Jordi; Puigdomènech, Pere

2017-02-08

Cucurbitaceae species contain a significantly lower number of genes coding for proteins with similarity to plant resistance genes belonging to the NBS-LRR family than other plant species of similar genome size. A large proportion of these genes are organized in clusters that appear to be hotspots of variability. The genomes of the Cucurbitaceae species measured until now are intermediate in size (between 350 and 450 Mb) and they apparently have not undergone any genome duplications besides those at the origin of eudicots. The cluster containing the largest number of NBS-LRR genes has previously been analyzed in melon and related species and showed a high degree of interspecific and intraspecific variability. It was of interest to study whether similar behavior occurred in other clusters of the same family of genes. The cluster of NBS-LRR genes located in melon chromosome 9 was analyzed and compared with the syntenic regions in other cucurbit genomes. This is the second largest cluster within this species and it contains nine sequences with an NBS-LRR annotation including two genes, Fom1 and Prv, providing resistance against Fusarium and Papaya ring-spot virus (PRSV). The variability within the melon species appears to consist essentially of single nucleotide polymorphisms. Clusters of similar genes are present in the syntenic regions of the two species of Cucurbitaceae that were sequenced, cucumber and watermelon. Most of the genes in the syntenic clusters can be aligned between species and a hypothesis of generation of the cluster is proposed. The number of genes in the watermelon cluster is similar to that in melon while a higher number of genes (12) is present in cucumber, a species with a smaller genome than melon. After comparing genome resequencing data of 115 cucumber varieties, deletion of a group of genes is observed in a group of varieties of Indian origin. 
Clusters of genes coding for NBS-LRR proteins in cucurbits appear to have specific variability in different regions of the genome and between different species. This observation is in favour of considering that the adaptation of plant species to changing environments is based upon the variability that may occur at any location in the genome and that has been produced by specific mechanisms of sequence variation acting on plant genomes. This information could be useful both to understand the evolution of species and for plant breeding.
Lactobacillus buchneri genotyping on the basis of clustered regularly interspaced short palindromic repeat (CRISPR) locus diversity.

PubMed

Briner, Alexandra E; Barrangou, Rodolphe

2014-02-01

Clustered regularly interspaced short palindromic repeats (CRISPR) in combination with associated sequences (cas) constitute the CRISPR-Cas immune system, which takes up DNA from invasive genetic elements as novel "spacers" that provide a genetic record of immunization events. We investigated the potential of CRISPR-based genotyping of Lactobacillus buchneri, a species relevant for commercial silage, bioethanol, and vegetable fermentations. Upon investigating the occurrence and diversity of CRISPR-Cas systems in Lactobacillus buchneri genomes, we observed a ubiquitous occurrence of CRISPR arrays containing a 36-nucleotide (nt) type II-A CRISPR locus adjacent to four cas genes, including the universal cas1 and cas2 genes and the type II signature gene cas9. Comparative analysis of CRISPR spacer content in 26 L. buchneri pickle fermentation isolates associated with spoilage revealed 10 unique locus genotypes that contained between 9 and 29 variable spacers. We observed a set of conserved spacers at the ancestral end, reflecting a common origin, as well as leader-end polymorphisms, reflecting recent divergence. Some of these spacers showed perfect identity with phage sequences, and many spacers showed homology to Lactobacillus plasmid sequences. Following a comparative analysis of sequences immediately flanking protospacers that matched CRISPR spacers, we identified a novel putative protospacer-adjacent motif (PAM), 5'-AAAA-3'. Overall, these findings suggest that type II-A CRISPR-Cas systems are valuable for genotyping of L. buchneri.
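Illustrative note (not from the record above): spacer-based genotyping can be pictured as grouping isolates by identical spacer arrays and then looking for the run of spacers shared at the ancestral end. The sketch below assumes arrays are listed leader end first; the isolate names, spacer identifiers, and both helper functions are invented for illustration and are not the authors' pipeline.

```python
from collections import defaultdict


def genotype_isolates(arrays):
    """Group isolates with identical spacer arrays; each group is one locus genotype."""
    groups = defaultdict(list)
    for isolate, spacers in arrays.items():
        groups[tuple(spacers)].append(isolate)
    return dict(groups)


def shared_ancestral_spacers(arrays):
    """Return the spacers conserved at the ancestral (last-listed) end of every array."""
    reversed_arrays = [list(reversed(spacers)) for spacers in arrays.values()]
    shared = []
    # Walk from the ancestral end until the arrays first disagree.
    for column in zip(*reversed_arrays):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return list(reversed(shared))


# Hypothetical spacer arrays, leader end first (assumption).
arrays = {
    "isolate_A": ["s9", "s8", "s3", "s2", "s1"],
    "isolate_B": ["s7", "s3", "s2", "s1"],
    "isolate_C": ["s9", "s8", "s3", "s2", "s1"],
}

genotypes = genotype_isolates(arrays)
ancestral = shared_ancestral_spacers(arrays)
```

In this toy data, isolates A and C share one genotype, and all three arrays end in the same three spacers, mirroring the conserved ancestral end and leader-end polymorphism described in the abstract.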
Divergent homologs of the predicted small RNA BpCand697 in Burkholderia spp.

NASA Astrophysics Data System (ADS)

Damiri, Nadzirah; Mohd-Padil, Hirzahida; Firdaus-Raih, Mohd

2015-09-01

The small RNA (sRNA) gene candidate, BpCand697, was previously reported to be unique to Burkholderia spp. and is encoded at the 3' non-coding region of a putative AraC family transcription regulator gene. This study demonstrates the conservation of the BpCand697 sequence across 32 Burkholderia spp. including B. pseudomallei, B. mallei, B. thailandensis and Burkholderia sp. by integrating both sequence homology and secondary structural analyses of BpCand697 within the dataset. The sequence divergence of BpCand697 was also used as a discriminator for clustering the dataset according to the potential virulence of Burkholderia spp., showing that B. thailandensis was clearly separated from the virulent cluster of B. pseudomallei and B. mallei. Finally, differential expression of the co-transcript of BpCand697 and its flanking gene, bpsl2391, was detected in Burkholderia pseudomallei D286 after growth under two different culture conditions using nutrient-rich and minimal media. It is hypothesized that the differential expression of the BpCand697-bpsl2391 co-transcript between the two standard prepared media might correlate with nutrient availability in the culture media, suggesting that the physical co-localization of BpCand697 in B. pseudomallei D286 might be directly or indirectly involved with the transcript regulation of bpsl2391 under the selected in vitro culture conditions.
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.

PubMed

Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita

2016-04-01

Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
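Illustrative note (not from the record above): the abstract does not spell out how the rounds of MST are combined, so the sketch below assumes one plausible reading, in which each round builds a minimum spanning tree from edges not used in earlier rounds and the neighborhood graph is the union of all rounds. The points, the `k_mst_graph` name, and the use of Euclidean distance are assumptions; the eigenanalysis step is omitted.

```python
import math
from itertools import combinations


def mst_edges(n, edges):
    """Kruskal's algorithm over (weight, i, j) edges; returns a spanning tree or forest."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen


def k_mst_graph(points, k):
    """Union of k successive MSTs, each built from edges unused by earlier rounds (assumed construction)."""
    n = len(points)
    remaining = [
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    ]
    graph = set()
    for _ in range(k):
        round_edges = mst_edges(n, remaining)
        graph.update((i, j) for _, i, j in round_edges)
        used = set(round_edges)
        remaining = [e for e in remaining if e not in used]
    return graph


# Hypothetical two-dimensional "profiles" forming two tight groups.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
graph_1 = k_mst_graph(profiles, 1)
graph_2 = k_mst_graph(profiles, 2)
```

A single MST connects each point through only its nearest links; adding further rounds thickens the graph so that dense regions gain more internal edges, which is the property a subsequent spectral step could exploit.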
High-throughput platform for the discovery of elicitors of silent bacterial gene clusters.

PubMed

Seyedsayamdost, Mohammad R

2014-05-20

Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons prompting terms such as "cryptic" or "silent" to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate a prominent activation of these two clusters, but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria.
A conserved gene cluster as a putative functional unit in insect innate immunity.

PubMed

Somogyi, Kálmán; Sipos, Botond; Pénzes, Zsolt; Andó, István

2010-11-05

The Nimrod gene superfamily is an important component of the innate immune response. The majority of its member genes are located in close proximity within the Drosophila melanogaster genome and they lie in a larger conserved cluster ("Nimrod cluster"), made up of non-related groups (families, superfamilies) of genes. This cluster has been a part of the Arthropod genomes for about 300-350 million years. The available data suggest that the Nimrod cluster is a functional module of the insect innate immune response. Copyright © 2010 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Cloning and Characterization of the Pyrrolomycin Biosynthetic Gene Clusters from Actinosporangium vitaminophilum ATCC 31673 and Streptomyces sp. Strain UC 11065

PubMed Central

Zhang, Xiujun; Parry, Ronald J.

2007-01-01

The pyrrolomycins are a family of polyketide antibiotics, some of which contain a nitro group. To gain insight into the nitration mechanism associated with the formation of these antibiotics, the pyrrolomycin biosynthetic gene cluster from Actinosporangium vitaminophilum was cloned. Sequencing of ca. 56 kb of A. vitaminophilum DNA revealed 35 open reading frames (ORFs). Sequence analysis revealed a clear relationship between some of these ORFs and the biosynthetic gene cluster for pyoluteorin, a structurally related antibiotic. Since a gene transfer system could not be devised for A. vitaminophilum, additional proof for the identity of the cloned gene cluster was sought by cloning the pyrrolomycin gene cluster from Streptomyces sp. strain UC 11065, a transformable pyrrolomycin producer. Sequencing of ca. 26 kb of UC 11065 DNA revealed the presence of 17 ORFs, 15 of which exhibit strong similarity to ORFs in the A. vitaminophilum cluster as well as a nearly identical organization. Single-crossover disruption of two genes in the UC 11065 cluster abolished pyrrolomycin production in both cases. These results confirm that the genetic locus cloned from UC 11065 is essential for pyrrolomycin production, and they also confirm that the highly similar locus in A. vitaminophilum encodes pyrrolomycin biosynthetic genes. Sequence analysis revealed that both clusters contain genes encoding the two components of an assimilatory nitrate reductase. This finding suggests that nitrite is required for the formation of the nitrated pyrrolomycins. However, sequence analysis did not provide additional insights into the nitration process, suggesting the operation of a novel nitration mechanism. PMID:17158935
pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data.

PubMed

Cao, Huojun; Amendt, Brad A

2016-11-01

Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Whole genome pair-wise similarity was calculated from expression patterns in 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time compared with the original affinity propagation program that uses a full similarity matrix. This python package will be useful for many applications where dataset(s) are too large to use a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.
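Illustrative note (not from the record above, and not the pySAPC API): the memory saving of a sparse similarity matrix comes from storing only gene pairs whose similarity passes a cutoff. The sketch below builds such a structure as a plain dictionary using Pearson correlation; the gene names, expression values, threshold, and `sparse_similarity` helper are all hypothetical, and the affinity propagation step itself is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sparse_similarity(profiles, threshold):
    """Keep only gene pairs whose correlation reaches the threshold."""
    sparse = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            sparse[(a, b)] = r
    return sparse


# Hypothetical expression profiles across four developmental stages.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.2, 2.1, 0.8],
}

# Only 2 of the 6 possible pairs survive the cutoff in this toy example.
sparse = sparse_similarity(profiles, threshold=0.9)
```

With a genome-scale gene list the full matrix grows quadratically, whereas the retained entries grow only with the number of strongly similar pairs.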
Clusters of antibiotic resistance genes enriched together stay together in swine agriculture

DOE PAGES

Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; ...

2016-04-12

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. 
The use of a single antibiotic could select for an entire suite of resistance genes if they are genetically linked. No links to bacterial membership were observed for these clusters of resistance genes. These findings urge deeper understanding of colocalization of resistance genes and mobile genetic elements in resistance islands and their distribution throughout antibiotic-exposed microbiomes. In addition, as governments seek to combat the rise in antibiotic resistance, a balance is sought between ensuring proper animal health and welfare and preserving medically important antibiotics for therapeutic use. Metagenomic and genomic monitoring will be critical to determine if resistance genes can be reduced in animal microbiomes, or if these gene clusters will continue to be coselected by antibiotics not deemed medically important for human health but used for growth promotion or by medically important antibiotics used therapeutically.
Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

PubMed

Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

2016-04-12

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. 
The use of a single antibiotic could select for an entire suite of resistance genes if they are genetically linked. No links to bacterial membership were observed for these clusters of resistance genes. These findings urge deeper understanding of colocalization of resistance genes and mobile genetic elements in resistance islands and their distribution throughout antibiotic-exposed microbiomes. As governments seek to combat the rise in antibiotic resistance, a balance is sought between ensuring proper animal health and welfare and preserving medically important antibiotics for therapeutic use. Metagenomic and genomic monitoring will be critical to determine if resistance genes can be reduced in animal microbiomes, or if these gene clusters will continue to be coselected by antibiotics not deemed medically important for human health but used for growth promotion or by medically important antibiotics used therapeutically. Copyright © 2016 Johnson et al.
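The co-occurrence idea in this abstract can be illustrated with a short sketch. It is not the authors' analysis: the gene labels and abundance values below are hypothetical, and Pearson correlation of log-scaled relative abundances (gene copies per 16S rRNA copy) is only one plausible way to flag alleles whose abundances rise and fall together across samples.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cooccurring_pairs(abundance, min_r=0.9):
    """Return gene pairs whose abundances correlate at or above min_r."""
    genes = sorted(abundance)
    pairs = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(abundance[g], abundance[h])
            if r >= min_r:
                pairs.append((g, h, round(r, 3)))
    return pairs

# Hypothetical log10 relative abundances (gene copies per 16S rRNA copy)
# in five samples; the values are invented for illustration only.
abundance = {
    "resistance_gene_1": [-3.0, -2.5, -2.0, -1.5, -1.0],
    "integrase": [-3.2, -2.6, -2.1, -1.4, -1.1],
    "unrelated_gene": [-2.0, -3.0, -1.0, -2.5, -2.2],
}
pairs = cooccurring_pairs(abundance)
print(pairs)
```

Only the resistance gene and the integrase pass the threshold here, which is the pattern one would expect if the two alleles were carried on the same genetic element. Correlation alone cannot establish colocalization.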
TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

PubMed

Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun

2017-12-01

Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two-dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression patterns in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. Contact: sunkim.bioinfo@snu.ac.kr.
Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
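Step (i) of the abstract, clustering time-condition concatenated vectors, can be sketched roughly as follows. This is not the TimesVector implementation: the toy genes, the cosine threshold, and the greedy "leader" grouping are all assumptions chosen to keep the example small.

```python
import math

def concat_vector(profiles):
    """Concatenate per-condition time series, in a fixed condition order."""
    vec = []
    for condition in sorted(profiles):
        vec.extend(profiles[condition])
    return vec

def unit(vec):
    """Scale a vector to unit length so only its shape matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))

def leader_clusters(vectors, threshold=0.95):
    """Greedy grouping: a gene joins the first cluster whose leader it resembles."""
    clusters = []  # list of (leader_vector, [gene names])
    for gene, vec in vectors.items():
        for leader, members in clusters:
            if cosine(leader, vec) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append((vec, [gene]))
    return [members for _, members in clusters]

# Hypothetical toy data: 3 genes, 2 conditions, 4 time points each.
genes = {
    "geneA": {"control": [1, 2, 3, 4], "treated": [1, 2, 3, 4]},
    "geneB": {"control": [2, 4, 6, 8], "treated": [2, 4, 6, 8]},
    "geneC": {"control": [1, 2, 3, 4], "treated": [4, 3, 2, 1]},
}
vectors = {g: concat_vector(p) for g, p in genes.items()}
groups = leader_clusters(vectors)
print(groups)
```

geneA and geneB share the same shape in both conditions and land together, while geneC, which reverses its trend under treatment, forms its own group. That is the kind of condition-specific pattern the abstract describes.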
ORGANIZATION OF THE nif GENES OF THE NONHETEROCYSTOUS CYANOBACTERIUM TRICHODESMIUM SP. IMS101.

PubMed

Dominic, Benny; Zani, Sabino; Chen, Yi-Bu; Mellon, Mark T; Zehr, Jonathan P

2000-08-26

An approximately 16-kb fragment of the Trichodesmium sp. IMS101 (a nonheterocystous filamentous cyanobacterium) "conventional" nif gene cluster was cloned and sequenced. The gene organization of the Trichodesmium and Anabaena variabilis vegetative (nif2) nitrogenase gene clusters spanning the region from nifB to nifW is similar except for the absence of two open reading frames (ORF3 and ORF1) in Trichodesmium. The Trichodesmium nifEN genes encode a fused NifEN polypeptide that does not appear to be processed into individual NifE and NifN polypeptides. Fused nifEN genes were previously found in the A. variabilis nif2 genes, but we have found that fused nifEN genes are widespread in the nonheterocystous cyanobacteria. Although the gene organization of the nonheterocystous filamentous Trichodesmium nif gene cluster is very similar to that of the A. variabilis vegetative nif2 gene cluster, phylogenetic analysis of nif sequences does not support close relatedness of Trichodesmium and A. variabilis vegetative (nif2) nitrogenase genes.
A mixture model-based approach to the clustering of microarray expression data.

PubMed

McLachlan, G J; Bean, R W; Peel, D

2002-03-01

This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
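The gene-screening step, ranking genes by the likelihood ratio statistic for one versus two mixture components, can be sketched as below. Two loud assumptions: the sketch fits normal mixtures with a basic EM loop rather than the mixtures of t distributions used by EMMIX-GENE, and the expression values are hypothetical. It is meant only to show what the statistic measures.

```python
import math

def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def loglik_one(xs):
    """Maximised log-likelihood of a single normal component."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum(normal_logpdf(x, mu, var) for x in xs)

def mixture_density(x, pi, mu, var):
    """Density of a two-component normal mixture at x."""
    return sum(pi[k] * math.exp(normal_logpdf(x, mu[k], var[k])) for k in range(2))

def loglik_two(xs, iters=200, min_var=1e-3):
    """Log-likelihood of a two-component normal mixture fitted by EM."""
    xs = sorted(xs)
    n, half = len(xs), len(xs) // 2
    grand = sum(xs) / n
    spread = max(sum((x - grand) ** 2 for x in xs) / n, min_var)
    pi, var = [0.5, 0.5], [spread, spread]
    # Initialise the two means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            w = [pi[k] * math.exp(normal_logpdf(x, mu[k], var[k])) for k in range(2)]
            total = sum(w)
            resp.append([wk / total for wk in w] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return sum(math.log(max(mixture_density(x, pi, mu, var), 1e-300)) for x in xs)

def lrt_statistic(xs):
    """-2 log(lambda) for one versus two components."""
    return 2.0 * (loglik_two(xs) - loglik_one(xs))

# Hypothetical expression values for two genes across ten tissues.
genes = {
    "gene_unimodal": [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.05, -0.05, 0.15],
    "gene_bimodal": [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 5.2, 4.9, 5.1, 4.8],
}
ranked = sorted(genes, key=lambda g: lrt_statistic(genes[g]), reverse=True)
print(ranked)
```

A gene whose values split into two well-separated groups of tissues yields a large statistic and is kept as potentially informative for clustering. In the abstract this threshold is combined with a minimum cluster size, which the sketch omits.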

Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters.

PubMed

Zhang, Huixian; Ravi, Vydianathan; Tay, Boon-Hui; Tohari, Sumanty; Pillai, Nisha E; Prasad, Aravind; Lin, Qiang; Brenner, Sydney; Venkatesh, Byrappa

2017-08-22

ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of developmental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, contains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lampreys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homologs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was acquired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas specification and insulin production similar to the Pdx of jawed vertebrates.
VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria.

PubMed

Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu

2017-01-10

VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as the genetic contexts related to the transfer of these traits, in newly sequenced pathogenic bacterial genomes. The backend database, MobilomeDB, was first built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or a varied set of bacterial genomes. VRprofile might help meet the increasing demand for re-annotation of bacterial variable regions, and aid in the real-time definition of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Fragmentation of an aflatoxin-like gene cluster in a forest pathogen

USDA-ARS's Scientific Manuscript database

Secondary metabolic pathway genes are typically clustered in fungi. An exception to this paradigm is seen for genes required for the production of dothistromin, an aflatoxin-like virulence factor produced by the pine needle pathogen Dothistroma septosporum. In contrast to the tight clustering of gen...
Genome mining-directed activation of a silent angucycline biosynthetic gene cluster in Streptomyces chattanoogensis.

PubMed

Zhou, Zhenxing; Xu, Qingqing; Bu, Qingting; Guo, Yuanyang; Liu, Shuiping; Liu, Yu; Du, Yiling; Li, Yongquan

2015-02-09

Genomic sequencing of actinomycetes has revealed the presence of numerous gene clusters seemingly capable of natural product biosynthesis, yet most clusters are cryptic under laboratory conditions. Bioinformatics analysis of the completely sequenced genome of Streptomyces chattanoogensis L10 (CGMCC 2644) revealed a silent angucycline biosynthetic gene cluster. The overexpression of a pathway-specific activator gene under the constitutive ermE* promoter successfully triggered the expression of the angucycline biosynthetic genes. Two novel members of the angucycline antibiotic family, chattamycins A and B, were further isolated and elucidated. Biological activity assays demonstrated that chattamycin B possesses good antitumor activities against human cancer cell lines and moderate antibacterial activities. The results presented here provide a feasible method to activate silent angucycline biosynthetic gene clusters to discover potential new drug leads. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The ergot alkaloid gene cluster: functional analyses and evolutionary aspects.

PubMed

Lorenz, Nicole; Haarmann, Thomas; Pazoutová, Sylvie; Jung, Manfred; Tudzynski, Paul

2009-01-01

Ergot alkaloids and their derivatives have been traditionally used as therapeutic agents in migraine and blood pressure regulation and as aids in childbirth and abortion. Their production in submerged culture is a long-established biotechnological process. Ergot alkaloids are produced mainly by members of the genus Claviceps, with Claviceps purpurea as the best-investigated species concerning the biochemistry of ergot alkaloid synthesis (EAS). Genes encoding enzymes involved in EAS have been shown to be clustered; functional analyses of EAS cluster genes have allowed specific functions to be assigned to several gene products. Various Claviceps species differ with respect to their host specificity and their alkaloid content; comparison of the ergot alkaloid clusters in these species (and of clavine alkaloid clusters in other genera) yields interesting insights into the evolution of cluster structure. This review focuses on recently published and also yet unpublished data on the structure and evolution of the EAS gene cluster and on the function and regulation of cluster genes. These analyses also have significant biotechnological implications: the characterization of non-ribosomal peptide synthetases (NRPS) involved in the synthesis of the peptide moiety of ergopeptines opened interesting perspectives for the synthesis of ergot alkaloids; on the other hand, defined mutants could be generated producing interesting intermediates or only single peptide alkaloids (instead of the alkaloid mixtures usually produced by industrial strains).
Scoring clustering solutions by their biological relevance.

PubMed

Gat-Viks, I; Sharan, R; Shamir, R

2003-12-12

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
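For context, the silhouette measure that the abstract names as one of the extant baselines can be sketched as follows. This is the comparison measure, not the authors' knowledge-based score. The points and labelings are hypothetical, and the function assumes at least two clusters.

```python
import math

def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_silhouette(points, labels):
    """Average silhouette width of a clustering (requires >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclid(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclid(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical expression vectors for four genes and two candidate solutions.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
print(good > bad)
```

The silhouette uses only the expression data, so it rewards geometric compactness. The method in the abstract instead scores a solution against prior biological attributes, which is why the two can rank clustering solutions differently.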
Comprehensive Expression Map of Transcription Regulators in the Adult Zebrafish Telencephalon Reveals Distinct Neurogenic Niches

PubMed Central

Diotel, Nicolas; Rodriguez Viales, Rebecca; Armant, Olivier; März, Martin; Ferg, Marco; Rastegar, Sepand; Strähle, Uwe

2015-01-01

The zebrafish has become a model to study adult vertebrate neurogenesis. In particular, the adult telencephalon has been an intensely studied structure in the zebrafish brain. Differential expression of transcriptional regulators (TRs) is a key feature of development and tissue homeostasis. Here we report an expression map of 1,202 TR genes in the telencephalon of adult zebrafish. Our results are summarized in a database with search and clustering functions to identify genes expressed in particular regions of the telencephalon. We classified 562 genes into 13 distinct patterns, including genes expressed in the proliferative zone. The remaining 640 genes displayed unique and complex patterns of expression and could thus not be grouped into distinct classes. The neurogenic ventricular regions express overlapping but distinct sets of TR genes, suggesting regional differences in the neurogenic niches in the telencephalon. In summary, the small telencephalon of the zebrafish shows a remarkable complexity in TR gene expression. The adult zebrafish telencephalon has become a model to study neurogenesis. We established the expression pattern of more than 1200 transcription regulators (TR) in the adult telencephalon. The neurogenic regions express overlapping but distinct sets of TR genes suggesting regional differences in the neurogenic potential. J. Comp. Neurol. 523:1202–1221, 2015. © 2015 Wiley Periodicals, Inc. PMID:25556858
Comprehensive expression map of transcription regulators in the adult zebrafish telencephalon reveals distinct neurogenic niches.

PubMed

Diotel, Nicolas; Rodriguez Viales, Rebecca; Armant, Olivier; März, Martin; Ferg, Marco; Rastegar, Sepand; Strähle, Uwe

2015-06-01

The zebrafish has become a model to study adult vertebrate neurogenesis. In particular, the adult telencephalon has been an intensely studied structure in the zebrafish brain. Differential expression of transcriptional regulators (TRs) is a key feature of development and tissue homeostasis. Here we report an expression map of 1,202 TR genes in the telencephalon of adult zebrafish. Our results are summarized in a database with search and clustering functions to identify genes expressed in particular regions of the telencephalon. We classified 562 genes into 13 distinct patterns, including genes expressed in the proliferative zone. The remaining 640 genes displayed unique and complex patterns of expression and could thus not be grouped into distinct classes. The neurogenic ventricular regions express overlapping but distinct sets of TR genes, suggesting regional differences in the neurogenic niches in the telencephalon. In summary, the small telencephalon of the zebrafish shows a remarkable complexity in TR gene expression. The adult zebrafish telencephalon has become a model to study neurogenesis. We established the expression pattern of more than 1200 transcription regulators (TR) in the adult telencephalon. The neurogenic regions express overlapping but distinct sets of TR genes suggesting regional differences in the neurogenic potential. © 2015 Wiley Periodicals, Inc.
De novo sequencing and analysis of the transcriptome of Panax ginseng in the leaf-expansion period.

PubMed

Liu, Shichao; Wang, Siming; Liu, Meichen; Yang, Fei; Zhang, Hui; Liu, Shiyang; Wang, Qun; Zhao, Yu

2016-08-01

Panax ginseng, a traditional Chinese medicine, is used worldwide for its variety of health benefits and its treatment efficacy. However, it is difficult to cultivate due to its vulnerability to environmental stresses. The present study provided the first report, to the best of our knowledge, of transcriptome analysis of ginseng at the leaf‑expansion stage. Using the Illumina sequencing platform, >40,000,000 high‑quality paired‑end reads were obtained and assembled into 100,533 unique sequences. When the sequences were searched against the publicly available National Center for Biotechnology Information protein database using The Basic Local Alignment Search Tool, 61,599 sequences exhibited similarity to known proteins. Functional annotation and classification, including use of the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases, revealed that the activated genes in ginseng were predominantly ribonuclease‑like storage genes, environmental stress genes, pathogenesis-related genes and other antioxidant genes. A number of candidate genes in environmental stress‑associated pathways were also identified. These novel data provide useful information on the growth and development stages of ginseng, and serve as an important public information platform for further understanding of the molecular mechanisms and functional genomics of ginseng.
Reverse engineering and analysis of large genome-scale gene networks

PubMed Central

Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

2013-01-01

Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
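The core idea of an MI-based network edge with a permutation test can be sketched in a few lines. The sketch deliberately swaps in simple equal-width binning for the B-spline estimator described in the abstract and runs serially, so it illustrates the statistical idea rather than TINGe itself. The expression values, bin count, and number of permutations are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, bins=3):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Plug-in MI estimate (in nats) from binned expression values."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log((c * n) / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )

def permutation_pvalue(x, y, n_perm=200, seed=0, bins=3):
    """Estimate how often shuffled data reach the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y, bins)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled, bins) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical expression of two perfectly dependent genes in 12 samples.
gene_x = [float(v) for v in range(12)]
gene_y = [2 * v + 1 for v in gene_x]
mi, p_value = permutation_pvalue(gene_x, gene_y)
print(round(mi, 4), p_value < 0.05)
```

An edge would be kept in the inferred network only when its MI is unlikely under permutation. Repeating this for every gene pair is what makes whole-genome reconstruction expensive and motivates the parallel algorithms the abstract describes.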
Pathway Distiller - multisource biological pathway consolidation

PubMed Central

2012-01-01

Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636
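The notion of consolidating redundant pathways from different databases into "pathway concepts" can be sketched with a single similarity measure. The abstract says the de novo method uses several measurements of pathway similarity without naming them here, so the Jaccard index, the 0.5 threshold, the single-linkage grouping, and the pathway names below are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def consolidate(pathways, threshold=0.5):
    """Group pathways linked by similarity >= threshold (single linkage)."""
    names = list(pathways)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard(pathways[p], pathways[q]) >= threshold:
                parent[find(q)] = find(p)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pathway definitions from two imaginary source databases.
pathways = {
    "db1:apoptosis": {"TP53", "BAX", "CASP3", "CASP9"},
    "db2:programmed_cell_death": {"TP53", "BAX", "CASP3", "BCL2"},
    "db1:glycolysis": {"HK1", "PFKM", "PKM"},
}
concepts = consolidate(pathways)
print(concepts)
```

The two overlapping cell-death pathways collapse into one concept while glycolysis stays separate. Because the grouping depends only on pathway membership, it is independent of any particular experiment, matching the "static" clusters the abstract attributes to the de novo method.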
Pathway Distiller - multisource biological pathway consolidation.

PubMed

Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

2012-01-01

One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebhaber, S.A.; Weiss, I.; Cash, F.E.

Synthesis of normal human hemoglobin A, α₂β₂, is based upon balanced expression of genes in the α-globin gene cluster on chromosome 16 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5′ to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located approximately 30 kilobases 5′ from the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes.
Discovery of a Phosphonoacetic Acid Derived Natural Product by Pathway Refactoring.

PubMed

Freestone, Todd S; Ju, Kou-San; Wang, Bin; Zhao, Huimin

2017-02-17

The activation of silent natural product gene clusters is a synthetic biology problem of great interest. As the rate at which gene clusters are identified outpaces the discovery rate of new molecules, this unknown chemical space is rapidly growing, as too are the rewards for developing technologies to exploit it. One class of natural products that has been underrepresented is phosphonic acids, which have important medical and agricultural uses. Hundreds of phosphonic acid biosynthetic gene clusters have been identified encoding for unknown molecules. Although methods exist to elicit secondary metabolite gene clusters in native hosts, they require the strain to be amenable to genetic manipulation. One method to circumvent this is pathway refactoring, which we implemented in an effort to discover new phosphonic acids from a gene cluster from Streptomyces sp. strain NRRL F-525. By reengineering this cluster for expression in the production host Streptomyces lividans, utility of refactoring is demonstrated with the isolation of a novel phosphonic acid, O-phosphonoacetic acid serine, and the characterization of its biosynthesis. In addition, a new biosynthetic branch point is identified with a phosphonoacetaldehyde dehydrogenase, which was used to identify additional phosphonic acid gene clusters that share phosphonoacetic acid as an intermediate.
The intact dupA cluster is a more reliable Helicobacter pylori virulence marker than dupA alone.

PubMed

Jung, Sung Woo; Sugimoto, Mitsushige; Shiota, Seiji; Graham, David Y; Yamaoka, Yoshio

2012-01-01

The duodenal ulcer promoting (dupA) gene, located in the plasticity region of Helicobacter pylori, is associated with duodenal ulcer development. dupA was predicted to form a type IV secretory system (T4SS) with vir genes around dupA (dupA cluster). We investigated the prevalence of dupA and dupA clusters and clarified associations between the dupA cluster status and clinical outcomes in the U.S. population. In all, 245 H. pylori strains were examined using PCR to evaluate the status of dupA and the adjacent vir genes predicted to form T4SS, in addition to the status of cag pathogenicity island (PAI). The associations between dupA cluster status and interleukin-8 (IL-8) and IL-12 production were also examined. The presence of dupA and all adjacent vir genes was defined as a complete dupA cluster. Many variations related to the status of dupA and dupA cluster genes were identified. Concurrent H. pylori infection and the presence of a complete dupA cluster increases duodenal ulcer risk compared to H. pylori infection with incomplete dupA cluster or without the dupA gene, independent of the cag PAI status (adjusted odds ratio, 2.13; 95% confidence interval, 1.13 to 4.03). Gastric mucosal IL-8 levels were also significantly higher in the complete dupA cluster group than in other groups (P=0.01). In conclusion, although the causal relationship between the dupA cluster and duodenal ulcer development is not proved, the presence of a complete dupA cluster, but not dupA alone, is associated with duodenal ulcer development.
The Intact dupA Cluster Is a More Reliable Helicobacter pylori Virulence Marker than dupA Alone

PubMed Central

Jung, Sung Woo; Sugimoto, Mitsushige; Shiota, Seiji; Graham, David Y.

2012-01-01

The duodenal ulcer promoting (dupA) gene, located in the plasticity region of Helicobacter pylori, is associated with duodenal ulcer development. dupA was predicted to form a type IV secretory system (T4SS) with vir genes around dupA (dupA cluster). We investigated the prevalence of dupA and dupA clusters and clarified associations between the dupA cluster status and clinical outcomes in the U.S. population. In all, 245 H. pylori strains were examined using PCR to evaluate the status of dupA and the adjacent vir genes predicted to form T4SS, in addition to the status of cag pathogenicity island (PAI). The associations between dupA cluster status and interleukin-8 (IL-8) and IL-12 production were also examined. The presence of dupA and all adjacent vir genes was defined as a complete dupA cluster. Many variations related to the status of dupA and dupA cluster genes were identified. Concurrent H. pylori infection and the presence of a complete dupA cluster increases duodenal ulcer risk compared to H. pylori infection with incomplete dupA cluster or without the dupA gene, independent of the cag PAI status (adjusted odds ratio, 2.13; 95% confidence interval, 1.13 to 4.03). Gastric mucosal IL-8 levels were also significantly higher in the complete dupA cluster group than in other groups (P = 0.01). In conclusion, although the causal relationship between the dupA cluster and duodenal ulcer development is not proved, the presence of a complete dupA cluster, but not dupA alone, is associated with duodenal ulcer development. PMID:22038914
Fast gene ontology based clustering for microarray experiments.

PubMed

Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

2008-11-21

Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
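The idea of grouping genes by the similarity of their Gene Ontology annotations can be illustrated with a minimal Python sketch. It is not the authors' R package: the Jaccard overlap of GO term sets is a simple stand-in for a proper semantic similarity measure, the average-linkage merging rule and the threshold are assumptions, and the gene names and annotations are hypothetical toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two GO term sets (stand-in for semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genes(annotations, threshold):
    """Average-linkage agglomerative clustering of genes.

    Repeatedly merges the two most similar clusters until no pair has an
    average pairwise similarity of at least `threshold`.
    """
    clusters = [[g] for g in sorted(annotations)]

    def avg_sim(c1, c2):
        sims = [jaccard(annotations[x], annotations[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy annotations (gene -> set of GO term IDs), for illustration only.
toy = {
    "geneA": {"GO:0007049", "GO:0051301"},
    "geneB": {"GO:0007049", "GO:0051301", "GO:0000278"},
    "geneC": {"GO:0006954"},
    "geneD": {"GO:0006954", "GO:0006955"},
}
result = cluster_genes(toy, threshold=0.4)
print(result)
```

In practice the merge history would be kept to draw the dendrogram next to an expression heat map; this sketch only returns the final groups.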
Genetic analysis reveals the identity of the photoreceptor for phototaxis in hormogonium filaments of Nostoc punctiforme.

PubMed

Campbell, Elsie L; Hagen, Kari D; Chen, Rui; Risser, Douglas D; Ferreira, Daniela P; Meeks, John C

2015-02-15

In cyanobacterial Nostoc species, substratum-dependent gliding motility is confined to specialized nongrowing filaments called hormogonia, which differentiate from vegetative filaments as part of a conditional life cycle and function as dispersal units. Here we confirm that Nostoc punctiforme hormogonia are positively phototactic to white light over a wide range of intensities. N. punctiforme contains two gene clusters (clusters 2 and 2i), each of which encodes modular cyanobacteriochrome-methyl-accepting chemotaxis proteins (MCPs) and other proteins that putatively constitute a basic chemotaxis-like signal transduction complex. Transcriptional analysis established that all genes in clusters 2 and 2i, plus two additional clusters (clusters 1 and 3) with genes encoding MCPs lacking cyanobacteriochrome sensory domains, are upregulated during the differentiation of hormogonia. Mutational analysis determined that only genes in cluster 2i are essential for positive phototaxis in N. punctiforme hormogonia; here these genes are designated ptx (for phototaxis) genes. The cluster is unusual in containing complete or partial duplicates of genes encoding proteins homologous to the well-described chemotaxis elements CheY, CheW, MCP, and CheA. The cyanobacteriochrome-MCP gene (ptxD) lacks transmembrane domains and has 7 potential binding sites for bilins. The transcriptional start site of the ptx genes does not resemble a sigma 70 consensus recognition sequence; moreover, it is upstream of two genes encoding gas vesicle proteins (gvpA and gvpC), which also are expressed only in the hormogonium filaments of N. punctiforme. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Identifying a gene expression signature of cluster headache in blood

PubMed Central

Eising, Else; Pelzer, Nadine; Vijfhuizen, Lisanne S.; Vries, Boukje de; Ferrari, Michel D.; ‘t Hoen, Peter A. C.; Terwindt, Gisela M.; van den Maagdenberg, Arn M. J. M.

2017-01-01

Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Gene expression data were analysed by gene and by module of co-expressed genes with particular attention to previously implicated disease pathways including hypocretin dysregulation. Only moderate gene expression differences were identified and no associations were found with previously reported pathogenic mechanisms. At the level of functional gene sets, associations were observed for genes involved in several brain-related mechanisms such as GABA receptor function and voltage-gated channels. In addition, genes and modules of co-expressed genes showed a role for intracellular signalling cascades, mitochondria and inflammation. Although larger study samples may be required to identify the full range of involved pathways, these results indicate a role for mitochondria, intracellular signalling and inflammation in cluster headache. PMID:28074859
Identification and characterization of the ergochrome gene cluster in the plant pathogenic fungus Claviceps purpurea.

PubMed

Neubauer, Lisa; Dopstadt, Julian; Humpf, Hans-Ulrich; Tudzynski, Paul

2016-01-01

Claviceps purpurea is a phytopathogenic fungus infecting a broad range of grasses including economically important cereal crop plants. The infection cycle ends with the formation of the typical purple-black pigmented sclerotia containing the toxic ergot alkaloids. Besides these ergot alkaloids little is known about the secondary metabolism of the fungus. Red anthraquinone derivatives and yellow xanthone dimers (ergochromes) have been isolated from sclerotia and described as ergot pigments, but the corresponding gene cluster has remained unknown. Fungal pigments gain increasing interest for example as environmentally friendly alternatives to existing dyes. Furthermore, several pigments show biological activities and may have some pharmaceutical value. This study identified the gene cluster responsible for the synthesis of the ergot pigments. Overexpression of the cluster-specific transcription factor led to activation of the gene cluster and to the production of several known ergot pigments. Knock out of the cluster key enzyme, a nonreducing polyketide synthase, clearly showed that this cluster is responsible for the production of red anthraquinones as well as yellow ergochromes. Furthermore, a tentative biosynthetic pathway for the ergot pigments is proposed. By changing the culture conditions, pigment production was activated in axenic culture so that high concentration of phosphate and low concentration of sucrose induced pigment syntheses. This is the first functional analysis of a secondary metabolite gene cluster in the ergot fungus besides that for the classical ergot alkaloids. We demonstrated that this gene cluster is responsible for the typical purple-black color of the ergot sclerotia and showed that the red and yellow ergot pigments are products of the same biosynthetic pathway. Activation of the gene cluster in axenic culture opened up new possibilities for biotechnological applications like the dye production or the development of new pharmaceuticals.

Clustering of time-course gene expression profiles using normal mixture models with autoregressive random effects

PubMed Central

2012-01-01

Background Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found not to be sufficiently adequate to model the complexity of the time-course data, partly due to their ignoring the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enables improved modelling and consequent clustering of time-course data. PMID:23151154
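For orientation, a first-order autoregressive random effect is commonly written as below. The notation is assumed rather than taken from the abstract (u_{gt} is the random effect of gene g at time point t, with stationary AR(1) errors), and the paper's exact parameterization may differ.

```latex
% Assumed notation: u_{gt} = random effect of gene g at time t (stationary AR(1)).
u_{gt} = \rho\, u_{g,t-1} + \varepsilon_{gt},
\qquad \varepsilon_{gt} \sim N(0, \sigma^2), \quad |\rho| < 1,
\\[4pt]
\operatorname{Cov}(u_{gs}, u_{gt}) = \frac{\sigma^2}{1 - \rho^2}\, \rho^{\,|s - t|}.
```

The covariance decays geometrically with the lag |s - t|, which is how such a model captures dependence between expression measurements at nearby time points.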
A custom correlation coefficient (CCC) approach for fast identification of multi-SNP association patterns in genome-wide SNPs data.

PubMed

Climer, Sharlee; Yang, Wei; de las Fuentes, Lisa; Dávila-Román, Victor G; Gu, C Charles

2014-11-01

Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but in no cases, and the other in 3% of controls but 20% of cases. These findings suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets. © 2014 WILEY PERIODICALS, INC.
A custom correlation coefficient (CCC) approach for fast identification of multi-SNP association patterns in genome-wide SNPs data

PubMed Central

Climer, Sharlee; Yang, Wei; de las Fuentes, Lisa; Dávila-Román, Victor G.; Gu, C. Charles

2014-01-01

Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of Custom Correlation Coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (6 genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but in no cases, and the other in 3% of controls but 20% of cases. These findings suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets. PMID:25168954
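The three-step process can be sketched in Python under strong simplifying assumptions. The abstract does not give the CCC formula, so an ordinary Pearson correlation is swapped in here. Clusters are taken to be connected components of the thresholded correlation graph, and a cluster is counted as present in an individual who carries at least one minor allele at every SNP in it. The SNP names, genotypes (coded 0/1/2) and threshold are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation; a stand-in for CCC, which the abstract does not define."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def snp_clusters(genotypes, threshold):
    """Steps 1-2: link SNP pairs with |r| >= threshold, return connected components."""
    names = sorted(genotypes)
    neighbours = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if abs(pearson(genotypes[a], genotypes[b])) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, clusters = set(), []
    for n in names:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:  # depth-first traversal of one component
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(neighbours[cur] - comp)
        seen |= comp
        if len(comp) > 1:  # ignore singleton SNPs
            clusters.append(sorted(comp))
    return clusters


def carrier_frequency(genotypes, cluster, individuals):
    """Step 3: fraction of individuals with a minor allele at every SNP in the cluster."""
    carriers = [i for i in individuals if all(genotypes[s][i] > 0 for s in cluster)]
    return len(carriers) / len(individuals)


# Hypothetical toy data: individuals 0-2 are cases, 3-5 are controls.
toy = {
    "snp1": [2, 1, 1, 0, 0, 0],
    "snp2": [2, 1, 1, 0, 0, 0],
    "snp3": [0, 1, 0, 1, 0, 1],
}
clusters = snp_clusters(toy, threshold=0.8)
case_freq = carrier_frequency(toy, clusters[0], [0, 1, 2])
control_freq = carrier_frequency(toy, clusters[0], [3, 4, 5])
print(clusters, case_freq, control_freq)
```

A real analysis would add a statistical test on the case and control frequencies; that step is omitted here.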
Genetic diversity, seasonality and transmission network of human metapneumovirus: identification of a unique sub-lineage of the fusion and attachment genes

PubMed Central

Chow, Wei Zhen; Chan, Yoke Fun; Oong, Xiang Yong; Ng, Liang Jie; Nor’E, Siti Sarah; Ng, Kim Tien; Chan, Kok Gan; Hanafi, Nik Sherina; Pang, Yong Kek; Kamarulzaman, Adeeba; Tee, Kok Keng

2016-01-01

Human metapneumovirus (HMPV) is an important viral respiratory pathogen worldwide. Current knowledge regarding the genetic diversity, seasonality and transmission dynamics of HMPV among adults and children living in tropical climate remains limited. HMPV prevailed at 2.2% (n = 86/3,935) among individuals presented with acute respiratory tract infections in Kuala Lumpur, Malaysia between 2012 and 2014. Seasonal peaks were observed during the northeast monsoon season (November–April) and correlated with higher relative humidity and number of rainy days (P < 0.05). Phylogenetic analysis of the fusion and attachment genes identified the co-circulation of three known HMPV sub-lineages, A2b and B1 (30.2% each, 26/86) and B2 (20.9%, 18/86), with genotype shift from sub-lineage B1 to A2b observed in 2013. Interestingly, a previously unrecognized sub-lineage of A2 was identified in 18.6% (16/86) of the population. Using a custom script for network construction based on the TN93 pairwise genetic distance, we identified up to nine HMPV transmission clusters circulating as multiple sub-epidemics. Although no apparent major outbreak was observed, the increased frequency of transmission clusters (dyads) during seasonal peaks suggests the potential roles of transmission clusters in driving the spread of HMPV. Our findings provide essential information for therapeutic research, prevention strategies, and disease outbreak monitoring of HMPV. PMID:27279080
New natural products isolated from Metarhizium robertsii ARSEF 23 by chemical screening and identification of the gene cluster through engineered biosynthesis in Aspergillus nidulans A1145.

PubMed

Kato, Hiroki; Tsunematsu, Yuta; Yamamoto, Tsuyoshi; Namiki, Takuya; Kishimoto, Shinji; Noguchi, Hiroshi; Watanabe, Kenji

2016-07-01

To rapidly identify novel natural products and their associated biosynthetic genes from underutilized and genetically difficult-to-manipulate microbes, we developed a method that uses (1) chemical screening to isolate novel microbial secondary metabolites, (2) bioinformatic analyses to identify a potential biosynthetic gene cluster and (3) heterologous expression of the genes in a convenient host to confirm the identity of the gene cluster and the proposed biosynthetic mechanism. The chemical screen was achieved by searching known natural product databases with data from liquid chromatographic and high-resolution mass spectrometric analyses collected on the extract from a target microbe culture. Using this method, we were able to isolate two new meroterpenes, subglutinols C (1) and D (2), from an entomopathogenic filamentous fungus Metarhizium robertsii ARSEF 23. Bioinformatics analysis of the genome allowed us to identify a gene cluster likely to be responsible for the formation of subglutinols. Heterologous expression of three genes from the gene cluster encoding a polyketide synthase, a prenyltransferase and a geranylgeranyl pyrophosphate synthase in Aspergillus nidulans A1145 afforded an α-pyrone-fused uncyclized diterpene, the expected intermediate of the subglutinol biosynthesis, thereby confirming the gene cluster to be responsible for the subglutinol biosynthesis. These results indicate the usefulness of our methodology in isolating new natural products and identifying their associated biosynthetic gene cluster from microbes that are not amenable to genetic manipulation. Our method should facilitate the natural product discovery efforts by expediting the identification of new secondary metabolites and their associated biosynthetic genes from a wider source of microbes.
The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

PubMed Central

Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

2013-01-01

The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role of secondary metabolite gene clusters and their metabolites in fungal biology. PMID:23818858
Outcome-Driven Cluster Analysis with Application to Microarray Data.

PubMed

Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

2015-01-01

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
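The model described in words above can be written compactly as follows. The symbols are assumptions introduced for illustration (i indexes observations, j genes, c(j) the cluster of gene j, and C the number of clusters). The abstract does not specify an intercept, a noise term or a link function for the outcome, so none is shown.

```latex
% Assumed notation: x_{ij} = expression of gene j in observation i,
% u_{ic} = random effect of cluster c in observation i, y_i = outcome.
x_{ij} = u_{i,\,c(j)} + \varepsilon_{ij},
\qquad
y_i = \sum_{c=1}^{C} \beta_c\, u_{ic}.
```

Genes in the same cluster share the random effect u_{ic}, which makes them correlated with each other, and the same u_{ic} enters the outcome through \beta_c, so the cluster assignment is informed by the outcome.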
First freshwater member ever reported for the family Bathycoccaceae (Chlorophyta; Archaeplastida) from Argentinean Patagonia revealed by environmental DNA survey.

PubMed

Lara, Enrique; Fernández, Leonardo D; Schiaffino, M Romina; Izaguirre, Irina

2017-08-01

We characterized molecularly the first freshwater member ever reported for the family Bathycoccaceae in Lake Musters (Argentinean Patagonia). Members of this family are extremely numerous and play a key ecological role in marine systems as primary producers. We cloned a fragment comprising the SSU rRNA gene+ITS region from environmental DNA using specific mamiellophyte primers. The unique SSU rRNA gene sequence obtained clustered robustly with Bathycoccus prasinos. Analysis of the two-dimensional structure of the ITS region showed the presence of a typical supplementary helix in the ITS-2 region, a synapomorphy of Bathycoccaceae, which confirmed further its phylogenetic placement. We finally discuss the possible causes for the presence of this organism in Lake Musters. Copyright © 2017 Elsevier GmbH. All rights reserved.
Structure and genetics of the O-specific polysaccharide of Escherichia coli O27.

PubMed

Perepelov, Andrei V; Chen, Tingting; Senchenkova, Sofya N; Filatov, Andrei V; Song, Jingjie; Shashkov, Alexander S; Liu, Bin; Knirel, Yuriy A

2018-02-01

The O-specific polysaccharide (O-antigen) is a part of the lipopolysaccharide on the cell surface of Gram-negative bacteria. The O-polysaccharide was obtained by mild acid hydrolysis of the lipopolysaccharide of Escherichia coli O27 and studied by sugar analysis and Smith degradation along with 1H and 13C NMR spectroscopy. The following structure of the branched hexasaccharide repeating unit was established, which is unique among known structures of bacterial polysaccharides (structure diagram not reproduced), where GlcA is non-stoichiometrically O-acetylated at position 3 (∼22%) or 4 (∼37%). Functions of genes in the O-antigen gene cluster of E. coli O27 were tentatively assigned by comparison with sequences in the available databases and found to be consistent with the O-polysaccharide structure. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Bioinformatics Facility for NASA

NASA Technical Reports Server (NTRS)

Schweighofer, Karl; Pohorille, Andrew

2006-01-01

Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.
The fish pathogen Yersinia ruckeri produces holomycin and uses an RNA methyltransferase for self-resistance.

PubMed

Qin, Zhiwei; Baker, Alexander Thomas; Raab, Andrea; Huang, Sheng; Wang, Tiehui; Yu, Yi; Jaspars, Marcel; Secombes, Christopher J; Deng, Hai

2013-05-24

Holomycin and its derivatives belong to a class of broad-spectrum antibacterial natural products containing a rare dithiolopyrrolone heterobicyclic scaffold. The antibacterial mechanism of dithiolopyrrolone compounds has been attributed to the inhibition of bacterial RNA polymerase activities, although the exact mode of action has not been established in vitro. Some dithiolopyrrolone derivatives display potent anticancer activities. Recently the biosynthetic gene cluster of holomycin has been identified and characterized in Streptomyces clavuligerus. Here we report that the fish pathogen Yersinia ruckeri is a holomycin producer, as evidenced through genome mining, chemical isolation, and structural elucidation as well as genetic manipulation. We also identified a unique regulatory gene hom15 at one end of the gene cluster encoding a cold-shock-like protein that likely regulates the production of holomycin at low cultivation temperatures. Inactivation of hom15 resulted in a significant loss of holomycin production. Finally, gene disruption of an RNA methyltransferase gene hom12 resulted in the sensitivity of the mutant toward holomycin. A complementation experiment of hom12 restored the resistance against holomycin. Although the wild-type Escherichia coli BL21(DE3) Gold is susceptible to holomycin, the mutant harboring hom12 showed tolerance toward holomycin. High resolution liquid chromatography (LC)-ESI/MS analysis of digested RNA fragments demonstrated that the wild-type Y. ruckeri and E. coli harboring hom12 contain a methylated RNA fragment, whereas the mutated Y. ruckeri and the wild-type E. coli only contain normal non-methylated RNA fragments. Taken together, our results strongly suggest that this putative RNA methyltransferase Hom12 is the self-resistance protein that methylates the RNA of Y. ruckeri to reduce the cytotoxic effect of holomycin during holomycin production.
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages

PubMed Central

Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages.

PubMed

Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Comprehensive Genomic Characterization of Upper Tract Urothelial Carcinoma.

PubMed

Moss, Tyler J; Qi, Yuan; Xi, Liu; Peng, Bo; Kim, Tae-Beom; Ezzedine, Nader E; Mosqueda, Maribel E; Guo, Charles C; Czerniak, Bogdan A; Ittmann, Michael; Wheeler, David A; Lerner, Seth P; Matin, Surena F

2017-10-01

Upper urinary tract urothelial cancer (UTUC) may have unique etiologic and genomic factors compared to bladder cancer. To characterize the genomic landscape of UTUC and provide insights into its biology using comprehensive integrated genomic analyses. We collected 31 untreated snap-frozen UTUC samples from two institutions and carried out whole-exome sequencing (WES) of DNA, RNA sequencing (RNAseq), and protein analysis. Adjusting for batch effects, consensus mutation calls from independent pipelines identified DNA mutations, gene expression clusters using unsupervised consensus hierarchical clustering (UCHC), and protein expression levels that were correlated with relevant clinical variables, The Cancer Genome Atlas, and other published data. WES identified mutations in FGFR3 (74.1%; 92% low-grade, 60% high-grade), KMT2D (44.4%), PIK3CA (25.9%), and TP53 (22.2%). APOBEC and CpG were the most common mutational signatures. UCHC of RNAseq data segregated samples into four molecular subtypes with the following characteristics. Cluster 1: no PIK3CA mutations, nonsmokers, high-grade
Cancer Detection in Microarray Data Using a Modified Cat Swarm Optimization Clustering Approach

PubMed

M, Pandi; R, Balamurugan; N, Sadhasivam

2017-12-29

Objective: A better understanding of functional genomics can be obtained by extracting patterns hidden in gene expression data. This could have paramount implications for cancer diagnosis, gene treatments and other domains. Clustering may reveal natural structures and identify interesting patterns in underlying data. The main objective of this research was to derive a heuristic approach to the detection of highly co-expressed genes related to cancer from gene expression data with minimum Mean Squared Error (MSE). Methods: A modified CSO algorithm using Harmony Search (MCSO-HS) for clustering cancer gene expression data was applied. Experimental results were analyzed using two cancer gene expression benchmark datasets, namely for leukaemia and for breast cancer. Results: In terms of MSE, MCSO-HS improved on HS and CSO by 13% and 9%, respectively, with the leukaemia dataset, and by 22% and 17%, respectively, with the breast cancer dataset. Conclusion: The results showed MCSO-HS to outperform HS and CSO with both benchmark datasets. To validate the clustering results, this work was tested with internal and external cluster validation indices. This work also points to biological validation of clusters with gene ontology in terms of function, process and component. Creative Commons Attribution License
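The abstract does not define how MSE is computed for a clustering. A common reading, assumed here, is the mean squared Euclidean distance between each expression profile and the centroid of its assigned cluster. The sketch below illustrates that objective on made-up toy data; it is not the MCSO-HS algorithm itself, and the function name `clustering_mse` is hypothetical.

```python
def clustering_mse(points, labels):
    """Mean squared Euclidean distance from each point to its cluster centroid.

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    # Group the points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        label: [sum(coords) / len(members) for coords in zip(*members)]
        for label, members in clusters.items()
    }

    # Accumulate squared distances to the assigned centroid.
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
    return total / len(points)


# Toy two-dimensional "expression profiles" (invented for illustration).
profiles = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]

good_mse = clustering_mse(profiles, [0, 0, 1, 1])  # groups nearby profiles
bad_mse = clustering_mse(profiles, [0, 1, 0, 1])   # mixes distant profiles

print(good_mse, bad_mse)
```

Under this definition a search heuristic such as CSO or HS would compare candidate label assignments by this score and prefer the lower value, as the toy comparison between the two assignments shows.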
Structure-related clustering of gene expression fingerprints of thp-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

PubMed

Wan, B; Yarbrough, J W; Schultz, T W

2008-01-01

This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes, such as cell cycle, metabolism, and protein binding, and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
A Zn(II)2Cys6 DNA binding protein regulates the sirodesmin PL biosynthetic gene cluster in Leptosphaeria maculans

PubMed Central

Fox, Ellen M.; Gardiner, Donald M.; Keller, Nancy P.; Howlett, Barbara J.

2008-01-01

A gene, sirZ, encoding a Zn(II)2Cys6 DNA binding protein is present in a cluster of genes responsible for the biosynthesis of the epipolythiodioxopiperazine (ETP) toxin, sirodesmin PL in the ascomycete plant pathogen, Leptosphaeria maculans. RNA-mediated silencing of sirZ gives rise to transformants that produce only residual amounts of sirodesmin PL and display a decrease in the transcription of several sirodesmin PL biosynthetic genes. This indicates that SirZ is a major regulator of this gene cluster. Proteins similar to SirZ are encoded in the gliotoxin biosynthetic gene cluster of Aspergillus fumigatus (gliZ) and in an ETP-like cluster in Penicillium lilacinoechinulatum (PlgliZ). Despite its high level of sequence similarity to gliZ, PlgliZ is unable to complement the gliotoxin-deficiency of a mutant of gliZ in A. fumigatus. Putative binding sites for these regulatory proteins in the promoters of genes in these clusters were predicted using bioinformatic analysis. These sites are similar to those commonly bound by other proteins with Zn(II)2Cys6 DNA binding domains. PMID:18023597
Evidence against the selfish operon theory.

PubMed

Pál, Csaba; Hurst, Laurence D

2004-06-01

According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.
Molecular epidemiology is becoming complex under the dynamic HIV prevalence: The perspective from Harbin, China.

PubMed

Shao, Bing; Song, Bo; Cao, Lijun; Du, Juan; Sun, Dongying; Lin, Yuanlong; Wang, Binyou; Wang, Fuxiang; Wang, Sunran

2016-05-01

In contrast to most areas of China, in Harbin city HIV transmission via men who have sex with men (MSM) is increasing rapidly and has become the main route of HIV transmission. The purpose of the current study was to elaborate the molecular epidemiologic characteristics of the new HIV epidemic. Eighty-one HIV-1 gag gene sequences (HXB2:806-1861) from local HIV infections were isolated; CRF01_AE predominated among HIV infections (71.6%), followed by subtype B (16.5%), CRF07_BC (6.2%), and unique recombinant strains (URFs; 6.2%). URFs were most often identified in the MSM population, and consisted of a recombination of CRF01_AE with subtype B or CRF07_BC. Six clusters were formed in this analysis; clusters I and II mainly circulated in southwest China. Clusters III and IV mainly circulated in southwest, southeast, and central China. Clusters V and VI mainly circulated in north and northeast China. Clusters III and IV may facilitate the transmission of the CRF01_AE strain from the southwest to the north and northeast regions of China. HIV subtypes are becoming diverse with the persistent epidemic in this geographic region. In brief, our results indicate that the molecular epidemiology of HIV is becoming more complex. Thus, timely molecular epidemiologic supervision of HIV is necessary, especially for the MSM population. © 2015 Wiley Periodicals, Inc.
Synchronized dynamics of bacterial niche-specific functions during biofilm development in a cold seep brine pool.

PubMed

Zhang, Weipeng; Wang, Yong; Bougouffa, Salim; Tian, Renmao; Cao, Huiluo; Li, Yongxin; Cai, Lin; Wong, Yue Him; Zhang, Gen; Zhou, Guowei; Zhang, Xixiang; Bajic, Vladimir B; Al-Suwailem, Abdulaziz; Qian, Pei-Yuan

2015-10-01

The biology of biofilms in deep-sea environments has barely been explored. Here, biofilms were developed at the brine pool (characterized by limited carbon sources) and the normal bottom water adjacent to Thuwal cold seeps. Comparative metagenomics based on 50 Gb datasets identified polysaccharide degradation, nitrate reduction and proteolysis as enriched functional categories for brine biofilms. The genomes of two dominant species in the brine biofilms, a novel Deltaproteobacterium and a novel Epsilonproteobacterium, were reconstructed. Despite rather small genome sizes, the Deltaproteobacterium possessed enhanced polysaccharide fermentation pathways, whereas the Epsilonproteobacterium was a versatile nitrogen reactor possessing nar, nap and nif gene clusters. These metabolic functions, together with specific regulatory and hypersaline-tolerant genes, made the two bacteria unique compared with their close relatives, including those from hydrothermal vents. Moreover, these functions were regulated by biofilm development, as both the abundance and the expression level of key functional genes were higher in later stage biofilms, and co-occurrences between the two dominant bacteria were demonstrated. Collectively, unique mechanisms were revealed: (i) polysaccharide fermentation and proteolysis interacted with nitrogen cycling to form a complex chain for energy generation, and (ii) remarkably exploiting and organizing niche-specific functions would be an important strategy for biofilm-dependent adaptation to the extreme conditions. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

Molecular Analysis of an Outbreak of Lethal Postpartum Sepsis Caused by Streptococcus pyogenes

PubMed Central

Turner, Claire E.; Dryden, Matthew; Holden, Matthew T. G.; Davies, Frances J.; Lawrenson, Richard A.; Farzaneh, Leili; Bentley, Stephen D.; Efstratiou, Androulla

2013-01-01

Sepsis is now the leading direct cause of maternal death in the United Kingdom, and Streptococcus pyogenes is the leading pathogen. We combined conventional and genomic analyses to define the duration and scale of a lethal outbreak. Two postpartum deaths caused by S. pyogenes occurred within 24 h; one was characterized by bacteremia and shock and the other by hemorrhagic pneumonia. The women gave birth within minutes of each other in the same maternity unit 2 days earlier. Seven additional infections in health care and household contacts were subsequently detected and treated. All cluster-associated S. pyogenes isolates were genotype emm1 and were initially indistinguishable from other United Kingdom emm1 isolates. Sequencing of the virulence gene sic revealed that all outbreak isolates had the same unique sic type. Genome sequencing confirmed that the cluster was caused by a unique S. pyogenes clone. Transmission between patients occurred on a single day and was associated with casual contact only. A single isolate from one patient demonstrated a sequence change in sic consistent with longer infection duration. Transmission to health care workers was traced to single clinical contacts with index cases. The last case was detected 18 days after the first case. Following enhanced surveillance, the outbreak isolate was not detected again. Mutations in bacterial regulatory genes played no detectable role in this outbreak, illustrating the intrinsic ability of emm1 S. pyogenes to spread while retaining virulence. This fast-moving outbreak highlights the potential of S. pyogenes to cause a range of diseases in the puerperium with rapid transmission, underlining the importance of immediate recognition and response by clinical infection and occupational health teams. PMID:23616448
Missing link in the evolution of Hox clusters.

PubMed

Ogishima, Soichi; Tanaka, Hiroshi

2007-01-31

The Hox cluster has key roles in regulating the patterning of the antero-posterior axis in a metazoan embryo. It consists of the anterior, central and posterior genes; the central genes have been identified only in bilaterians, but not in cnidarians, and are responsible for achieving morphological complexity in bilaterian development. However, their evolutionary history has not been revealed, that is, there has been a "missing link". Here we show the evolutionary history of Hox clusters of 18 bilaterians and 2 cnidarians by using a new method, "motif-based reconstruction", examining the gain/loss processes of evolutionarily conserved sequences, "motifs", outside the homeodomain. We successfully identified the missing link in the evolution of Hox clusters between the cnidarian-bilaterian ancestor and the bilaterians as the ancestor of the central genes, which we call the proto-central gene. Exploring the gene corresponding to the proto-central gene, we found that one of the acoela Hox genes has the same motif repertoire as that of the proto-central gene. This interesting finding suggests that the acoela Hox cluster corresponds with the missing link in the evolution of the Hox cluster between the cnidarian-bilaterian ancestor and the bilaterians. Our findings suggested that motif gains/diversifications led to the explosive diversity of the bilaterian body plan.
Heterochromatin influences the secondary metabolite profile in the plant pathogen Fusarium graminearum

PubMed Central

Reyes-Dominguez, Yazmid; Boedi, Stefan; Sulyok, Michael; Wiesenberger, Gerlinde; Stoppacher, Norbert; Krska, Rudolf; Strauss, Joseph

2012-01-01

Chromatin modifications and heterochromatic marks have been shown to be involved in the regulation of secondary metabolism gene clusters in the fungal model system Aspergillus nidulans. We examine here the role of HEP1, the heterochromatin protein homolog of Fusarium graminearum, for the production of secondary metabolites. Deletion of Hep1 in a PH-1 background strongly influences expression of genes required for the production of aurofusarin and the main trichothecene metabolite DON. In the Hep1 deletion strains AUR genes are highly up-regulated and aurofusarin production is greatly enhanced, suggesting a repressive role for heterochromatin on gene expression of this cluster. Unexpectedly, gene expression and metabolites are lower for the trichothecene cluster, suggesting a positive function of Hep1 for DON biosynthesis. However, analysis of histone modifications in chromatin of AUR and DON gene promoters reveals that in both gene clusters the H3K9me3 heterochromatic mark is strongly reduced in the Hep1 deletion strain. This, and the finding that a DON-cluster flanking gene is up-regulated, suggests that the DON biosynthetic cluster is repressed by HEP1 directly and indirectly. Results from this study point to a conserved mode of secondary metabolite (SM) biosynthesis regulation in fungi by chromatin modifications and the formation of facultative heterochromatin. PMID:22100541
Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features

PubMed Central

2011-01-01

Background Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Methods Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published gene lists and independent datasets. Results Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. Conclusion This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer. PMID:22044755
Characterization of Antimicrobial Resistance Patterns and Detection of Virulence Genes in Campylobacter Isolates in Italy

PubMed Central

Di Giannatale, Elisabetta; Di Serafino, Gabriella; Zilli, Katiuscia; Alessiani, Alessandra; Sacchini, Lorena; Garofolo, Giuliano; Aprea, Giuseppe; Marotta, Francesca

2014-01-01

Campylobacter has developed resistance to several antimicrobial agents over the years, including macrolides, quinolones and fluoroquinolones, becoming a significant public health hazard. A total of 145 strains derived from raw milk, chicken faeces, chicken carcasses, cattle faeces and human faeces collected from various Italian regions, were screened for antimicrobial susceptibility, molecular characterization (SmaI pulsed-field gel electrophoresis) and detection of virulence genes (sequencing and DNA microarray analysis). The prevalence of C. jejuni and C. coli was 62.75% and 37.24% respectively. Antimicrobial susceptibility revealed a high level of resistance for ciprofloxacin (62.76%), tetracycline (55.86%) and nalidixic acid (55.17%). Genotyping of Campylobacter isolates using PFGE revealed a total of 86 unique SmaI patterns. Virulence gene profiles were determined using a new microbial diagnostic microarray composed of 70-mer oligonucleotide probes targeting genes implicated in Campylobacter pathogenicity. Correspondence between PFGE and microarray clusters was observed. Comparisons of PFGE and virulence profiles reflected the high genetic diversity of the strains examined, leading us to speculate different degrees of pathogenicity inside Campylobacter populations. PMID:24556669
Genome-wide analysis of the genetic regulation of gene expression in human neutrophils

PubMed Central

Andiappan, Anand Kumar; Melchiotti, Rossella; Poh, Tuang Yeow; Nah, Michelle; Puan, Kia Joo; Vigano, Elena; Haase, Doreen; Yusof, Nurhashikin; San Luis, Boris; Lum, Josephine; Kumar, Dilip; Foo, Shihui; Zhuang, Li; Vasudev, Anusha; Irwanto, Astrid; Lee, Bernett; Nardin, Alessandra; Liu, Hong; Zhang, Furen; Connolly, John; Liu, Jianjun; Mortellaro, Alessandra; Wang, De Yun; Poidinger, Michael; Larbi, Anis; Zolezzi, Francesca; Rotzschke, Olaf

2015-01-01

Neutrophils are an abundant immune cell type involved in both antimicrobial defence and autoimmunity. The regulation of their gene expression, however, is still largely unknown. Here we report an eQTL study on isolated neutrophils from 114 healthy individuals of Chinese ethnicity, identifying 21,210 eQTLs on 832 unique genes. Unsupervised clustering analysis of these eQTLs confirms their role in inflammatory responses and immunological diseases but also indicates strong involvement in dermatological pathologies. One of the strongest eQTLs identified (rs2058660) is also the tagSNP of a linkage block reported to affect leprosy and Crohn's disease in opposite directions. In a functional study, we can link the C allele with low expression of the β-chain of IL18-receptor (IL18RAP). In neutrophils, this results in a reduced responsiveness to IL-18, detected at both the RNA and protein levels. Thus, the polymorphic regulation of human neutrophils can impact beneficial as well as pathological inflammatory responses. PMID:26259071
Plastid genome sequence of an ornamental and editable fruit tree of Rosaceae, Prunus mume.

PubMed

Wang, Shuo; Gao, Cheng-Wen; Gao, Li-Zhi

2016-11-01

Here we assembled and analyzed the complete chloroplast genome of Prunus mume, a popular ornamental and edible fruit tree of Rosaceae. The cp genome is a circular DNA molecule of 157 712 bp with a typical quadripartite structure consisting of two inverted repeat regions (IRa and IRb) of 26 394 bp each, separated by large (LSC) and small (SSC) single-copy regions of 85 861 and 19 063 bp, respectively. It encoded 112 unique genes, 19 of which were duplicated in the IR regions, giving a total of 131 genes. Eighteen of these genes harbored one or two introns. GC content was 38.9%, and coding regions accounted for 51.3% of the genome. Phylogenetic analysis showed that P. mume clustered with P. persica and P. kansuensis in the genus Prunus. This newly determined chloroplast genome will enhance modern breeding programs for the purpose of genetic improvement of this valuable plant.
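The quadripartite structure described in this abstract implies simple arithmetic: the genome length should equal the large and small single-copy regions plus two copies of the inverted repeat, and the total gene count should equal the unique genes plus those duplicated in the IR regions. The short sketch below checks both using only the figures quoted in the abstract; the variable names are our own.

```python
# Region lengths in base pairs, as quoted in the abstract.
lsc_bp = 85_861       # large single-copy region
ssc_bp = 19_063       # small single-copy region
ir_bp = 26_394        # one inverted repeat (IRa or IRb)
reported_genome_bp = 157_712

# A quadripartite plastid genome is LSC + SSC + two inverted repeats.
computed_genome_bp = lsc_bp + ssc_bp + 2 * ir_bp

# Gene counts, as quoted in the abstract.
unique_genes = 112
duplicated_in_ir = 19
reported_total_genes = 131

# Genes duplicated in the IR regions are counted twice in the total.
computed_total_genes = unique_genes + duplicated_in_ir

print(computed_genome_bp == reported_genome_bp)
print(computed_total_genes == reported_total_genes)
```

Both checks agree with the reported values, so the region sizes and gene counts in the abstract are mutually consistent.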
A bioinformatics transcriptome meta-analysis highlights the importance of trophoblast differentiation in the pathology of hydatidiform moles.

PubMed

Desterke, Christophe; Slim, Rima; Candelier, Jean-Jacques

2018-05-01

Hydatidiform mole (HM) is an aberrant human pregnancy with abnormal trophoblastic development, migration/invasion of the extravillous trophoblast in the decidua. These abnormalities are established in a hypoxic environment during the first trimester of gestation. Using text mining, we identified 72 unique genes that are linked to HM (HM-linked genes) that we studied by bioinformatic analysis in publicly available transcriptomes of primary chorionic villous cells (cytotrophoblast, syncytiotrophoblast, extravillous trophoblast, and arterial and venous endothelial) isolated from normal placentas or established trophoblastic cell lines cultured under different oxygen concentrations. We show that the majority of HM-linked genes (75%) are involved in normal trophoblastic differentiation, arranged in clusters, and some of them are implicated in chorionic villous invasion or regulated by oxygen concentrations. Our analysis integrates the various aspects of the pathophysiology of HM and highlights the importance of trophoblastic differentiation in this pathology. Copyright © 2018 Elsevier Ltd. All rights reserved.
Globin gene structure in a reptile supports the transpositional model for amniote α- and β-globin gene evolution.

PubMed

Patel, Vidushi S; Ezaz, Tariq; Deakin, Janine E; Graves, Jennifer A Marshall

2010-12-01

The haemoglobin protein, required for oxygen transportation in the body, is encoded by α- and β-globin genes that are arranged in clusters. The transpositional model for the evolution of distinct α-globin and β-globin clusters in amniotes is much simpler than the previously proposed whole genome duplication model. According to this model, all jawed vertebrates share one ancient region containing α- and β-globin genes and several flanking genes in the order MPG-C16orf35-(α-β)-GBY-LUC7L that has been conserved for more than 410 million years, whereas amniotes evolved a distinct β-globin cluster by insertion of a transposed β-globin gene from this ancient region into a cluster of olfactory receptors flanked by CCKBR and RRM1. It could not be determined whether this organisation is conserved in all amniotes because of the paucity of information from non-avian reptiles. To fill in this gap, we examined globin gene organisation in a squamate reptile, the Australian bearded dragon lizard, Pogona vitticeps (Agamidae). We report here that the α-globin cluster (HBK, HBA) is flanked by C16orf35 and GBY and is located on a pair of microchromosomes, whereas the β-globin cluster is flanked by RRM1 on the 3' end and is located on the long arm of chromosome 3. However, the CCKBR gene that flanks the β-globin cluster on the 5' end in other amniotes is located on the short arm of chromosome 5 in P. vitticeps, indicating that a chromosomal break between the β-globin cluster and CCKBR occurred at least in the agamid lineage. Our data from a reptile species provide further evidence to support the transpositional model for the evolution of β-globin gene cluster in amniotes.
A quorum-sensing molecule acts as a morphogen controlling gas vesicle organelle biogenesis and adaptive flotation in an enterobacterium

PubMed Central

Ramsay, Joshua P.; Williamson, Neil R.; Spring, David R.; Salmond, George P. C.

2011-01-01

Gas vesicles are hollow intracellular proteinaceous organelles produced by aquatic Eubacteria and Archaea, including cyanobacteria and halobacteria. Gas vesicles increase buoyancy and allow taxis toward air–liquid interfaces, enabling subsequent niche colonization. Here we report a unique example of gas vesicle-mediated flotation in an enterobacterium; Serratia sp. strain ATCC39006. This strain is a member of the Enterobacteriaceae previously studied for its production of prodigiosin and carbapenem antibiotics. Genes required for gas vesicle synthesis mapped to a 16.6-kb gene cluster encoding three distinct homologs of the main structural protein, GvpA. Heterologous expression of this locus in Escherichia coli induced copious vesicle production and efficient cell buoyancy. Gas vesicle morphogenesis in Serratia enabled formation of a pellicle-like layer of highly vacuolated cells, which was dependent on oxygen limitation and the expression of ntrB/C and cheY-like regulatory genes within the gas-vesicle gene cluster. Gas vesicle biogenesis was strictly controlled by intercellular chemical signaling, through an N-acyl homoserine lactone, indicating that in this system the quorum-sensing molecule acts as a morphogen initiating organelle development. Flagella-based motility and gas vesicle morphogenesis were also oppositely regulated by the small RNA-binding protein, RsmA, suggesting environmental adaptation through physiological control of the choice between motility and flotation as alternative taxis modes. We propose that gas vesicle biogenesis in this strain represents a distinct mechanism of mobility, regulated by oxygen availability, nutritional status, the RsmA global regulatory system, and the quorum-sensing morphogen. PMID:21873216
Circumpolar Genetic Structure and Recent Gene Flow of Polar Bears: A Reanalysis

PubMed Central

Malenfant, René M.; Davis, Corey S.; Cullingham, Catherine I.; Coltman, David W.

2016-01-01

Recently, an extensive study of 2,748 polar bears (Ursus maritimus) from across their circumpolar range was published in PLOS ONE, which used microsatellites and mitochondrial haplotypes to apparently show altered population structure and a dramatic change in directional gene flow towards the Canadian Archipelago—an area believed to be a future refugium for polar bears as their southernmost habitats decline under climate change. Although this study represents a major international collaborative effort and promised to be a baseline for future genetics work, methodological shortcomings and errors of interpretation undermine some of the study’s main conclusions. Here, we present a reanalysis of this data in which we address some of these issues, including: (1) highly unbalanced sample sizes and large amounts of systematically missing data; (2) incorrect calculation of FST and of significance levels; (3) misleading estimates of recent gene flow resulting from non-convergence of the program BayesAss. In contrast to the original findings, in our reanalysis we find six genetic clusters of polar bears worldwide: the Hudson Bay Complex, the Western and Eastern Canadian Arctic Archipelago, the Western and Eastern Polar Basin, and—importantly—we reconfirm the presence of a unique and possibly endangered cluster of bears in Norwegian Bay near Canada’s expected last sea-ice refugium. Although polar bears’ abundance, distribution, and population structure will certainly be negatively affected by ongoing—and increasingly rapid—loss of Arctic sea ice, these genetic data provide no evidence of strong directional gene flow in response to recent climate change. PMID:26974333
Phylogenetic evidence for intratypic recombinant events in a novel human adenovirus C that causes severe acute respiratory infection in children

PubMed Central

Wang, Yanqun; Li, Yamin; Lu, Roujian; Zhao, Yanjie; Xie, Zhengde; Shen, Jun; Tan, Wenjie

2016-01-01

Human adenoviruses (HAdVs) are prevalent in hospitalized children with severe acute respiratory infection (SARI). Here, we report a unique recombinant HAdV strain (CBJ113) isolated from a HAdV-positive child with SARI. The whole-genome sequence was determined using Sanger sequencing and high-throughput sequencing. A phylogenetic analysis of the complete genome indicated that the CBJ113 strain shares a common origin with HAdV-C2, HAdV-C6, HAdV-C1, HAdV-C5, and HAdV-C57 and formed a novel subclade on the same branch as other HAdV-C subtypes. BootScan and single nucleotide polymorphism analyses showed that the CBJ113 genome has an intra-subtype recombinant structure and comprises gene regions mainly originating from two circulating viral strains: HAdV-1 and HAdV-2. The parental penton base, pVI, and DBP genes of the recombinant strain clustered with the HAdV-1 prototype strain, and the E1B, hexon, fiber, and 100 K genes of the recombinant clustered within the HAdV-2 subtype, meanwhile the E4orf1 and DNA polymerase genes of the recombinant shared the greatest similarity with those of HAdV-5 and HAdV-6, respectively. All of these findings provide insight into our understanding of the dynamics of the complexity of the HAdV-C epidemic. More extensive studies should address the pathogenicity and clinical characteristics of the novel recombinant. PMID:26960434
Heterologous Production of a Novel Cyclic Peptide Compound, KK-1, in Aspergillus oryzae

PubMed Central

Yoshimi, Akira; Yamaguchi, Sigenari; Fujioka, Tomonori; Kawai, Kiyoshi; Gomi, Katsuya; Machida, Masayuki; Abe, Keietsu

2018-01-01

A novel cyclic peptide compound, KK-1, was originally isolated from the plant-pathogenic fungus Curvularia clavata. It consists of 10 amino acid residues, including five N-methylated amino acid residues, and has potent antifungal activity. Recently, the genome-sequencing analysis of C. clavata was completed, and the biosynthetic genes involved in KK-1 production were predicted by using a novel gene cluster mining tool, MIDDAS-M. These genes form an approximately 75-kb cluster, which includes nine open reading frames, containing a non-ribosomal peptide synthetase (NRPS) gene. To determine whether the predicted genes were responsible for the biosynthesis of KK-1, we performed heterologous production of KK-1 in Aspergillus oryzae by introduction of the cluster genes into the genome of A. oryzae. The NRPS gene was split in two fragments and then reconstructed in the A. oryzae genome, because the gene was quite large (approximately 40 kb). The remaining seven genes in the cluster, excluding the regulatory gene kkR, were simultaneously introduced into the strain of A. oryzae in which NRPS had already been incorporated. To evaluate the heterologous production of KK-1 in A. oryzae, gene expression was analyzed by RT-PCR and KK-1 productivity was quantified by HPLC. KK-1 was produced in variable quantities by a number of transformed strains, along with expression of the cluster genes. The amount of KK-1 produced by the strain with the greatest expression of all genes was lower than that produced by the original producer, C. clavata. Therefore, expression of the cluster genes is necessary and sufficient for the heterologous production of KK-1 in A. oryzae, although there may be unknown factors limiting productivity in this species. PMID:29686660
The biosynthetic gene cluster for the cyanogenic glucoside dhurrin in Sorghum bicolor contains its co-expressed vacuolar MATE transporter

PubMed Central

Darbani, Behrooz; Motawia, Mohammed Saddik; Olsen, Carl Erik; Nour-Eldin, Hussam H.; Møller, Birger Lindberg; Rook, Fred

2016-01-01

Genomic gene clusters for the biosynthesis of chemical defence compounds are increasingly identified in plant genomes. We previously reported the independent evolution of biosynthetic gene clusters for cyanogenic glucoside biosynthesis in three plant lineages. Here we report that the gene cluster for the cyanogenic glucoside dhurrin in Sorghum bicolor additionally contains a gene, SbMATE2, encoding a transporter of the multidrug and toxic compound extrusion (MATE) family, which is co-expressed with the biosynthetic genes. The predicted localisation of SbMATE2 to the vacuolar membrane was demonstrated experimentally by transient expression of a SbMATE2-YFP fusion protein and confocal microscopy. Transport studies in Xenopus laevis oocytes demonstrate that SbMATE2 is able to transport dhurrin. In addition, SbMATE2 was able to transport non-endogenous cyanogenic glucosides, but not the anthocyanin cyanidin 3-O-glucoside or the glucosinolate indol-3-yl-methyl glucosinolate. The genomic co-localisation of a transporter gene with the biosynthetic genes producing the transported compound is discussed in relation to the role self-toxicity of chemical defence compounds may play in the formation of gene clusters. PMID:27841372
Overproduction of Ristomycin A by Activation of a Silent Gene Cluster in Amycolatopsis japonicum MG417-CF17

PubMed Central

Spohn, Marius; Kirchner, Norbert; Kulik, Andreas; Jochim, Angelika; Wolf, Felix; Muenzer, Patrick; Borst, Oliver; Gross, Harald; Wohlleben, Wolfgang

2014-01-01

The emergence of antibiotic-resistant pathogenic bacteria within the last decades is one reason for the urgent need for new antibacterial agents. A strategy to discover new anti-infective compounds is the evaluation of the genetic capacity of secondary metabolite producers and the activation of cryptic gene clusters (genome mining). One genus known for its potential to synthesize medically important products is Amycolatopsis. However, Amycolatopsis japonicum does not produce an antibiotic under standard laboratory conditions. In contrast to most Amycolatopsis strains, A. japonicum is genetically tractable with different methods. In order to activate a possible silent glycopeptide cluster, we introduced a gene encoding the transcriptional activator of balhimycin biosynthesis, the bbr gene from Amycolatopsis balhimycina (bbrAba), into A. japonicum. This resulted in the production of an antibiotically active compound. Following whole-genome sequencing of A. japonicum, 29 cryptic gene clusters were identified by genome mining. One of these gene clusters is a putative glycopeptide biosynthesis gene cluster. Using bioinformatic tools, ristomycin (syn. ristocetin), a type III glycopeptide, which has antibacterial activity and which is used for the diagnosis of von Willebrand disease and Bernard-Soulier syndrome, was deduced as a possible product of the gene cluster. Chemical analyses by high-performance liquid chromatography and mass spectrometry (HPLC-MS), tandem mass spectrometry (MS/MS), and nuclear magnetic resonance (NMR) spectroscopy confirmed the in silico prediction that the recombinant A. japonicum/pRM4-bbrAba synthesizes ristomycin A. PMID:25114137
Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages.

PubMed

Elmore, M Holly; McGary, Kriston L; Wisecaver, Jennifer H; Slot, Jason C; Geiser, David M; Sink, Stacy; O'Donnell, Kerry; Rokas, Antonis

2015-02-06

Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trace its evolution across Ascomycetes, and examine the evolutionary dynamics of its spread among lineages of the Fusarium oxysporum species complex (hereafter referred to as the FOSC), a cosmopolitan clade of purportedly clonal vascular wilt plant pathogens. Phylogenetic analysis of fungal cyanase and carbonic anhydrase genes reveals that the CCA gene cluster arose independently at least twice and is now present in three lineages, namely Cochliobolus lunatus, Oidiodendron maius, and the FOSC. Genome-wide surveys within the FOSC indicate that the CCA gene cluster varies in copy number across isolates, is always located on accessory chromosomes, and is absent in FOSC's closest relatives. Phylogenetic reconstruction of the CCA gene cluster in 163 FOSC strains from a wide variety of hosts suggests a recent history of rampant transfers between isolates. We hypothesize that the independent formation of the CCA gene cluster in different fungal lineages and its spread across FOSC strains may be associated with resistance to plant-produced cyanates or to use of cyanate fungicides in agriculture. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

A de novo transcriptome and valid reference genes for quantitative real-time PCR in Colaphellus bowringi.

PubMed

Tan, Qian-Qian; Zhu, Li; Li, Yi; Liu, Wen; Ma, Wei-Hua; Lei, Chao-Liang; Wang, Xiao-Ping

2015-01-01

The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers and undergoes reproductive diapause in soil. An understanding of the molecular mechanisms of diapause regulation, insecticide resistance, and other physiological processes is helpful for developing new management strategies for this beetle. However, the lack of genomic information and valid reference genes limits knowledge on the molecular bases of these physiological processes in this species. Using Illumina sequencing, we obtained more than 57 million sequence reads derived from C. bowringi, which were assembled into 39,390 unique sequences. A Clusters of Orthologous Groups classification was obtained for 9,048 of these sequences, covering 25 categories, and 16,951 were assigned to 255 Kyoto Encyclopedia of Genes and Genomes pathways. Eleven candidate reference gene sequences from the transcriptome were then identified through reverse transcriptase polymerase chain reaction. Among these candidate genes, EF1α, ACT1, and RPL19 proved to be the most stable reference genes for different reverse transcriptase quantitative polymerase chain reaction experiments in C. bowringi. Conversely, aTUB and GAPDH were the least stable reference genes. The abundant putative C. bowringi transcript sequences reported enrich the genomic resources of this beetle. Importantly, the larger number of gene sequences and valid reference genes provide a valuable platform for future gene expression studies, especially with regard to exploring the molecular mechanisms of different physiological processes in this species.
Gene cluster conservation provides insight into cercosporin biosynthesis and extends production to the genus Colletotrichum.

PubMed

de Jonge, Ronnie; Ebert, Malaika K; Huitt-Roehl, Callie R; Pal, Paramita; Suttle, Jeffrey C; Spanner, Rebecca E; Neubauer, Jonathan D; Jurick, Wayne M; Stott, Karina A; Secor, Gary A; Thomma, Bart P H J; Van de Peer, Yves; Townsend, Craig A; Bolton, Melvin D

2018-06-12

Species in the genus Cercospora cause economically devastating diseases in sugar beet, maize, rice, soy bean, and other major food crops. Here, we sequenced the genome of the sugar beet pathogen Cercospora beticola and found it encodes 63 putative secondary metabolite gene clusters, including the cercosporin toxin biosynthesis (CTB) cluster. We show that the CTB gene cluster has experienced multiple duplications and horizontal transfers across a spectrum of plant pathogenic fungi, including the wide-host range Colletotrichum genus as well as the rice pathogen Magnaporthe oryzae. Although cercosporin biosynthesis has been thought to rely on an eight-gene CTB cluster, our phylogenomic analysis revealed gene collinearity adjacent to the established cluster in all CTB cluster-harboring species. We demonstrate that the CTB cluster is larger than previously recognized and includes cercosporin facilitator protein, previously shown to be involved with cercosporin autoresistance, and four additional genes required for cercosporin biosynthesis, including the final pathway enzymes that install the unusual cercosporin methylenedioxy bridge. Lastly, we demonstrate production of cercosporin by Colletotrichum fioriniae, the first known cercosporin producer within this agriculturally important genus. Thus, our results provide insight into the intricate evolution and biology of a toxin critical to agriculture and broaden the production of cercosporin to another fungal genus containing many plant pathogens of important crops worldwide. Copyright © 2018 the Author(s). Published by PNAS.
Bioinformatic analysis of the nucleotide binding site-encoding disease-resistance genes in foxtail millet (Setaria italica (L.) Beauv.).

PubMed

Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G

2014-08-28

The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.
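The average cluster sizes quoted in the foxtail millet abstract follow directly from the reported gene and cluster counts. A minimal check in Python, assuming a plain arithmetic mean rounded to one decimal place (the dictionary and function names are illustrative only and do not come from the study):

```python
# Counts reported in the abstract: (genes located in clusters, number of clusters).
reported = {
    "Yugu1": (161, 34),
    "Zhang gu": (151, 28),
}


def mean_cluster_size(genes_in_clusters: int, n_clusters: int) -> float:
    """Average number of genes per cluster, rounded to one decimal place."""
    return round(genes_in_clusters / n_clusters, 1)


# Hypothetical helper structure: variety name -> mean genes per cluster.
averages = {name: mean_cluster_size(*counts) for name, counts in reported.items()}

for variety, avg in averages.items():
    print(f"{variety}: {avg} genes per cluster")
```

Both values agree with the averages stated in the abstract (4.7 and 5.4 genes per cluster).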
Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in Xanthomonas.

PubMed

Lu, Hong; Patil, Prabhu; Van Sluys, Marie-Anne; White, Frank F; Ryan, Robert P; Dow, J Maxwell; Rabinowicz, Pablo; Salzberg, Steven L; Leach, Jan E; Sonti, Ramesh; Brendel, Volker; Bogdanove, Adam J

2008-01-01

Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. 
Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale exchange of clusters, but subtle changes in a small number of genes or in non-coding sequences, and/or differences outside the clusters, potentially among regulatory targets or secretory substrates.
[Chromosomal large fragment deletion induced by CRISPR/Cas9 gene editing system].

PubMed

Cheng, L H; Liu, Y; Niu, T

2017-05-14

Objective: To use CRISPR-Cas9 gene editing technology to co-delete several genes located on the same chromosome. Methods: A CRISPR-Cas9 lentiviral plasmid that could induce deletion of the Aloxe3-Alox12b-Alox8 cluster genes located on mouse chromosome 11B3 was constructed by molecular cloning. HEK293T cells were transfected to package lentivirus of CRISPR or Cas9 cDNA, then mouse NIH3T3 cells were infected with the lentivirus and genomic DNA of these cells was extracted. The deleted fragment was amplified by PCR; TA cloning, Sanger sequencing, and other techniques were used to confirm the deletion of the Aloxe3-Alox12b-Alox8 cluster genes. Results: The CRISPR-Cas9 lentiviral plasmid, which could induce deletion of the Aloxe3-Alox12b-Alox8 cluster genes, was successfully constructed. Deletion of the target chromosome fragment (Aloxe3-Alox12b-Alox8 cluster genes) was verified by PCR. The deletion was confirmed by TA cloning and Sanger sequencing; the breakpoint junctions of the CRISPR-Cas9-mediated cutting events were accurately rejoined, and no insertion mutations occurred between the two cleavage sites. Conclusion: Large fragment deletion of the Aloxe3-Alox12b-Alox8 cluster genes located on mouse chromosome 11B3 was successfully induced by the CRISPR-Cas9 gene editing system.
Genomics-driven discovery of the pneumocandin biosynthetic gene cluster in the fungus Glarea lozoyensis

PubMed Central

2013-01-01

Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom. PMID:23688303
Gene Expression Profiles of Sporadic Canine Hemangiosarcoma Are Uniquely Associated with Breed

PubMed Central

Tamburini, Beth A.; Trapp, Susan; Phang, Tzu Lip; Schappa, Jill T.; Hunter, Lawrence E.; Modiano, Jaime F.

2009-01-01

The role an individual's genetic background plays on phenotype and biological behavior of sporadic tumors remains incompletely understood. We showed previously that lymphomas from Golden Retrievers harbor defined, recurrent chromosomal aberrations that occur less frequently in lymphomas from other dog breeds, suggesting spontaneous canine tumors provide suitable models to define how heritable traits influence cancer genotypes. Here, we report a complementary approach using gene expression profiling in a naturally occurring endothelial sarcoma of dogs (hemangiosarcoma). Naturally occurring hemangiosarcomas of Golden Retrievers clustered separately from those of non-Golden Retrievers, with contributions from transcription factors, survival factors, and from pro-inflammatory and angiogenic genes, and which were exclusively present in hemangiosarcoma and not in other tumors or normal cells (i.e., they were not due simply to variation in these genes among breeds). Vascular Endothelial Growth Factor Receptor 1 (VEGFR1) was among genes preferentially enriched within known pathways derived from gene set enrichment analysis when characterizing tumors from Golden Retrievers versus other breeds. Heightened VEGFR1 expression in these tumors also was apparent at the protein level and targeted inhibition of VEGFR1 increased proliferation of hemangiosarcoma cells derived from tumors of Golden Retrievers, but not from other breeds. Our results suggest heritable factors mold gene expression phenotypes, and consequently biological behavior in sporadic, naturally occurring tumors. PMID:19461996
Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum

PubMed Central

Hardigan, Michael A.; Crisovan, Emily; Hamilton, John P.; Laimbeer, Parker; Leisner, Courtney P.; Manrique-Carpintero, Norma C.; Newton, Linsey; Pham, Gina M.; Vaillancourt, Brieanne; Zeng, Zixian; Jiang, Jiming

2016-01-01

Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996
Whole genome analyses of marine fish pathogenic isolate, Mycobacterium sp. 012931.

PubMed

Kurokawa, Satoru; Kabayama, Jun; Hwang, Seong Don; Nho, Seong Won; Hikima, Jun-ichi; Jung, Tae Sung; Kondo, Hidehiro; Hirono, Ikuo; Takeyama, Haruko; Mori, Tetsushi; Aoki, Takashi

2014-10-01

Mycobacterium is a genus within the order Actinomycetales that comprises a large number of well-characterized species, several of which are pathogens known to cause serious disease in humans and animals. Here, we report the whole genome sequence of Mycobacterium sp. strain 012931 isolated from the marine fish, yellowtail (Seriola quinqueradiata). Mycobacterium sp. 012931 is a fish pathogen causing serious damage to aquaculture farms in Japan. DNA dot plot analysis showed that, among the several Mycobacterium species compared, Mycobacterium sp. 012931 was most closely related to Mycobacterium marinum. However, little conservation of gene order was observed between the Mycobacterium sp. 012931 and M. marinum genomes. The 5,464 annotated genes of Mycobacterium sp. 012931 were classified into 26 subsystems. The insertion/deletion gene analysis showed that Mycobacterium sp. 012931 had 643 unique genes that were not found in the M. marinum strains. In the virulence, disease, and defense subsystem, both insertion and deletion genes of Mycobacterium sp. 012931 were associated with the PPE gene cluster of Mycobacteria. Of seven plcB genes in Mycobacterium sp. 012931, plcB_2 and plcB_3 showed low identities with those of M. marinum strains. Therefore, Mycobacterium sp. 012931 differs from M. marinum genetically and in virulence, and may induce different interaction mechanisms between host and pathogen.
The impact of polyploidy on the evolution of a complex NB-LRR resistance gene cluster in soybean

USDA-ARS's Scientific Manuscript database

A comparative genomics approach was used to investigate the evolution of a complex NB-LRR gene cluster found in soybean (Glycine max), common bean (Phaseolus vulgaris), and other legumes. In soybean, the cluster is associated with several disease resistance (R) genes of known function including Rpg1...
Function and Regulation of the Formate Dehydrogenase Genes of the Methanogenic Archaeon Methanococcus maripaludis

PubMed Central

Wood, Gwendolyn E.; Haydock, Andrew K.; Leigh, John A.

2003-01-01

Methanococcus maripaludis is a mesophilic species of Archaea capable of producing methane from two substrates: hydrogen plus carbon dioxide and formate. To study the latter, we identified the formate dehydrogenase genes of M. maripaludis and found that the genome contains two gene clusters important for formate utilization. Phylogenetic analysis suggested that the two formate dehydrogenase gene sets arose from duplication events within the methanococcal lineage. The first gene cluster encodes homologs of formate dehydrogenase α (FdhA) and β (FdhB) subunits and a putative formate transporter (FdhC) as well as a carbonic anhydrase analog. The second gene cluster encodes only FdhA and FdhB homologs. Mutants lacking either fdhA gene exhibited a partial growth defect on formate, whereas a double mutant was completely unable to grow on formate as a sole methanogenic substrate. Investigation of fdh gene expression revealed that transcription of both gene clusters is controlled by the presence of H2 and not by the presence of formate. PMID:12670979
Innate responses to gene knockouts impact overlapping gene networks and vary with respect to resistance to viral infection.

PubMed

Liu, Yonghong; Liu, Yuanyuan; Wu, Jiaming; Roizman, Bernard; Zhou, Grace Guoying

2018-04-03

Analyses of the levels of mRNAs encoding IFIT1, IFI16, RIG-1, MDA5, CXCL10, LGP2, PUM1, LSD1, STING, and IFNβ in cell lines from which the gene encoding LGP2, LSD1, PML, HDAC4, IFI16, PUM1, STING, MDA5, IRF3, or HDAC 1 had been knocked out, as well as the ability of these cell lines to support the replication of HSV-1, revealed the following: (i) Cell lines lacking the gene encoding LGP2, PML, or HDAC4 (cluster 1) exhibited increased levels of expression of partially overlapping gene networks. Concurrently, these cell lines produced 5-fold to 12-fold lower yields of HSV-1 than the parental cells. (ii) Cell lines lacking the genes encoding STING, LSD1, MDA5, IRF3, or HDAC 1 (cluster 2) exhibited decreased levels of mRNAs of partially overlapping gene networks. Concurrently, these cell lines produced virus yields that did not differ from those produced by the parental cell line. The genes up-regulated in cell lines forming cluster 1 overlapped in part with genes down-regulated in cluster 2. The key conclusions are that gene knockouts and subsequent selection for growth cause changes in expression of multiple genes, and hence the phenotype of the cell lines cannot be ascribed to a single gene; the patterns of gene expression may be shared by multiple knockouts; and the enhanced immunity to viral replication by cluster 1 knockout cell lines but not by cluster 2 cell lines suggests that in parental cells, the expression of innate resistance to infection is specifically repressed.
CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

PubMed

Chou, A; Burke, J

1999-05-01

DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
Post-genome research on the biosynthesis of ergot alkaloids.

PubMed

Li, Shu-Ming; Unsöld, Inge A

2006-10-01

Genome sequencing provides new opportunities and challenges for identifying genes for the biosynthesis of secondary metabolites. A putative biosynthetic gene cluster of fumigaclavine C, an ergot alkaloid of the clavine type, was identified in the genome sequence of Aspergillus fumigatus by a bioinformatic approach. This cluster spans 22 kb of genomic DNA and comprises at least 11 open reading frames (ORFs). Seven of them are orthologous to genes from the biosynthetic gene cluster of ergot alkaloids in Claviceps purpurea. Experimental evidence of the identified cluster was provided by heterologous expression and biochemical characterization of two ORFs, FgaPT1 and FgaPT2, in the cluster of A. fumigatus, which show remarkable similarities to dimethylallyltryptophan synthase from C. purpurea and function as prenyltransferases. FgaPT2 converts L-tryptophan to dimethylallyltryptophan and thereby catalyzes the first step of ergot alkaloid biosynthesis, whilst FgaPT1 catalyzes the last step of the fumigaclavine C biosynthesis, i.e., the prenylation of fumigaclavine A at C-2 position of the indole nucleus. In addition to information obtained from the gene cluster of ergot alkaloids from C. purpurea, the identification of the biosynthetic gene cluster of fumigaclavine C in A. fumigatus opens an alternative way to study the biosynthesis of ergot alkaloids in fungi.
Statistical indicators of collective behavior and functional clusters in gene networks of yeast

NASA Astrophysics Data System (ADS)

Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

2006-03-01

We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
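As a rough illustration of the network construction described in the abstract above, the following Python sketch links genes whose expression series are strongly correlated and extracts the largest connected component. The gene names, expression values, the threshold, and the use of the absolute correlation are all assumptions made for this example; they are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def giant_component(profiles, threshold):
    """Link genes with |correlation| >= threshold; return the largest component."""
    parent = {g: g for g in profiles}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            parent[find(a)] = find(b)

    components = {}
    for g in profiles:
        components.setdefault(find(g), set()).add(g)
    return max(components.values(), key=len)


# Hypothetical time series for four genes (illustrative values only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [4.0, 3.0, 2.1, 1.0],
    "g4": [1.0, 3.0, 1.0, 3.0],
}
giant = giant_component(profiles, threshold=0.9)
```

Lowering the threshold step by step and recording the size of the returned component is one simple way to watch a giant component form, which is the kind of percolation behaviour the abstract refers to.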
Genome Engineering and Modification Toward Synthetic Biology for the Production of Antibiotics.

PubMed

Zou, Xuan; Wang, Lianrong; Li, Zhiqiang; Luo, Jie; Wang, Yunfu; Deng, Zixin; Du, Shiming; Chen, Shi

2018-01-01

Antibiotic production is often governed by large gene clusters composed of genes related to antibiotic scaffold synthesis, tailoring, regulation, and resistance. With the expansion of genome sequencing, a considerable number of antibiotic gene clusters have been isolated and characterized. Emerging genome engineering techniques make more efficient engineering of antibiotics possible. In addition to genomic editing, multiple synthetic biology approaches have been developed for the exploration and improvement of antibiotic natural products. Here, we review the progress in the development of these genome editing techniques used to engineer new antibiotics, focusing on three aspects of genome engineering: direct cloning of large genomic fragments, genome engineering of gene clusters, and regulation of gene cluster expression. This review will not only summarize the current uses of genomic engineering techniques for cloning and assembly of antibiotic gene clusters or for altering antibiotic synthetic pathways but will also provide perspectives on the future directions of rebuilding biological systems for the design of novel antibiotics. © 2017 Wiley Periodicals, Inc.
Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

PubMed

Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V

2016-01-01

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.
Novel genomic island modifies DNA with 7-deazaguanine derivatives

PubMed Central

Thiaville, Jennifer J.; Kellner, Stefanie M.; Yuan, Yifeng; Hutinet, Geoffrey; Thiaville, Patrick C.; Jumpathong, Watthanachai; Mohapatra, Susovan; Brochier-Armanet, Celine; Letarov, Andrey V.; Hillebrand, Roman; Malik, Chanchal K.; Rizzo, Carmelo J.; Dedon, Peter C.; de Crécy-Lagard, Valérie

2016-01-01

The discovery of ∼20-kb gene clusters containing a family of paralogs of tRNA guanosine transglycosylase genes, called tgtA5, alongside 7-cyano-7-deazaguanine (preQ0) synthesis and DNA metabolism genes, led to the hypothesis that 7-deazaguanine derivatives are inserted in DNA. This was established by detecting 2’-deoxy-preQ0 and 2’-deoxy-7-amido-7-deazaguanosine in enzymatic hydrolysates of DNA extracted from the pathogenic, Gram-negative bacteria Salmonella enterica serovar Montevideo. These modifications were absent in the closely related S. enterica serovar Typhimurium LT2 and from a mutant of S. Montevideo, each lacking the gene cluster. This led us to rename the genes of the S. Montevideo cluster as dpdA-K for 7-deazapurine in DNA. Similar gene clusters were analyzed in ∼150 phylogenetically diverse bacteria, and the modifications were detected in DNA from other organisms containing these clusters, including Kineococcus radiotolerans, Comamonas testosteroni, and Sphingopyxis alaskensis. Comparative genomic analysis shows that, in Enterobacteriaceae, the cluster is a genomic island integrated at the leuX locus, and the phylogenetic analysis of the TgtA5 family is consistent with widespread horizontal gene transfer. Comparison of transformation efficiencies of modified or unmodified plasmids into isogenic S. Montevideo strains containing or lacking the cluster strongly suggests a restriction–modification role for the cluster in Enterobacteriaceae. Another preQ0 derivative, 2’-deoxy-7-formamidino-7-deazaguanosine, was found in the Escherichia coli bacteriophage 9g, as predicted from the presence of homologs of genes involved in the synthesis of the archaeosine tRNA modification. These results illustrate a deep and unexpected evolutionary connection between DNA and tRNA metabolism. PMID:26929322
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

PubMed

Bhattacharya, Anindya; De, Rajat K

2010-08-01

Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail in certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. As in DCCA, we use the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two Word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
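The property stated in the abstract above, that each gene ends up in the cluster with which it has the highest average correlation, can be sketched in Python as a simple reassignment loop. This is not the published ACCA implementation (which is distributed as C and Visual Basic software); the function names, the starting labels, and the correlation values are hypothetical and serve only to illustrate the stated criterion.

```python
def average_correlation(gene, members, corr):
    """Mean correlation between `gene` and the other members of a cluster."""
    others = [m for m in members if m != gene]
    if not others:
        return float("-inf")  # a gene alone in a cluster has nothing to average
    return sum(corr[gene][m] for m in others) / len(others)


def reassign(labels, corr, max_iter=100):
    """Move each gene to the cluster with the highest average correlation."""
    labels = dict(labels)
    for _ in range(max_iter):
        changed = False
        for gene in labels:
            # Rebuild cluster membership from the current labels.
            clusters = {}
            for g, c in labels.items():
                clusters.setdefault(c, []).append(g)
            best = max(
                clusters,
                key=lambda c: average_correlation(gene, clusters[c], corr),
            )
            if best != labels[gene]:
                labels[gene] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical pairwise correlations: a and b co-vary, as do c and d.
pairs = {
    ("a", "b"): 0.9, ("c", "d"): 0.8,
    ("a", "c"): -0.5, ("a", "d"): -0.5,
    ("b", "c"): -0.5, ("b", "d"): -0.5,
}
corr = {g: {} for g in "abcd"}
for (x, y), r in pairs.items():
    corr[x][y] = corr[y][x] = r

# Start with gene c placed in the wrong cluster.
result = reassign({"a": 0, "b": 0, "c": 0, "d": 1}, corr)
```

In this toy run gene c moves from cluster 0 to cluster 1, after which no gene has a higher average correlation with any other cluster and the loop stops.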

CRISPR/Cas9-generated p47phox-deficient cell line for Chronic Granulomatous Disease gene therapy vector development.

PubMed

Wrona, Dominik; Siler, Ulrich; Reichenbach, Janine

2017-03-13

Development of gene therapy vectors requires cellular models reflecting the genetic background of a disease thus allowing for robust preclinical vector testing. For human p47phox-deficient chronic granulomatous disease (CGD) vector testing we generated a cellular model using clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 to introduce a GT-dinucleotide deletion (ΔGT) mutation in the p47phox-encoding NCF1 gene in the human acute myeloid leukemia PLB-985 cell line. CGD is a group of hereditary immunodeficiencies characterized by impaired respiratory burst activity in phagocytes due to a defective phagocytic nicotinamide adenine dinucleotide phosphate (NADPH) oxidase. In Western countries autosomal-recessive p47phox-subunit deficiency represents the second largest CGD patient cohort with unique genetics, as the vast majority of p47phox CGD patients carry the ΔGT deletion in exon two of the NCF1 gene. The established PLB-985 NCF1 ΔGT cell line reflects the most frequent form of p47phox-deficient CGD genetically and functionally. It can be differentiated to granulocytes efficiently, which makes it an attractive alternative to currently used iPSC models for rapid testing of novel gene therapy approaches.
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.

PubMed

Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D

2017-01-23

The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
Mitochondrial DNA markers reveal high genetic diversity but low genetic differentiation in the black fly Simulium tani Takaoka & Davies along an elevational gradient in Malaysia.

PubMed

Low, Van Lun; Adler, Peter H; Takaoka, Hiroyuki; Ya'cob, Zubaidah; Lim, Phaik Eem; Tan, Tiong Kai; Lim, Yvonne A L; Chen, Chee Dhang; Norma-Rashid, Yusoff; Sofian-Azirun, Mohd

2014-01-01

The population genetic structure of Simulium tani was inferred from mitochondria-encoded sequences of cytochrome c oxidase subunits I (COI) and II (COII) along an elevational gradient in Cameron Highlands, Malaysia. A statistical parsimony network of 71 individuals revealed 71 haplotypes in the COI gene and 43 haplotypes in the COII gene; the concatenated sequences of the COI and COII genes revealed 71 haplotypes. High levels of genetic diversity but low levels of genetic differentiation were observed among populations of S. tani at five elevations. The degree of genetic diversity, however, was not in accordance with an altitudinal gradient, and a Mantel test indicated that elevation did not have a limiting effect on gene flow. No ancestral haplotype of S. tani was found among the populations. Pupae with unique structural characters at the highest elevation showed a tendency to form their own haplotype cluster, as revealed by the COII gene. Tajima's D, Fu's Fs, and mismatch distribution tests revealed population expansion of S. tani in Cameron Highlands. A strong correlation was found between nucleotide diversity and the levels of dissolved oxygen in the streams where S. tani was collected.
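The two summary quantities used in the abstract above, the number of distinct haplotypes and the nucleotide diversity, can be illustrated with a short Python sketch. The aligned sequences below are invented for the example and are far shorter than real COI or COII fragments; nucleotide diversity is computed here as the average proportion of differing sites over all pairs of sequences.

```python
from collections import Counter
from itertools import combinations


def haplotype_counts(sequences):
    """Count how many individuals carry each distinct sequence (haplotype)."""
    return Counter(sequences)


def nucleotide_diversity(sequences):
    """Average per-site proportion of differences over all sequence pairs."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return differences / (len(pairs) * length)


# Hypothetical aligned sequences from four individuals.
sequences = ["ACGT", "ACGT", "ACGA", "TCGA"]
haplotypes = haplotype_counts(sequences)
pi = nucleotide_diversity(sequences)
```

With these four toy sequences there are three haplotypes, and the six pairwise comparisons differ at 7 of 24 compared sites in total.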
A Data Analytics Approach to Discovering Unique Microstructural Configurations Susceptible to Fatigue

NASA Astrophysics Data System (ADS)

Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.

2018-05-01

Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated to the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.
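For readers unfamiliar with fuzzy c-means, the Python sketch below shows the standard alternating updates of membership degrees and cluster centers. It is a generic textbook form, not the authors' code; the two-dimensional points, the initial centers, the fuzzifier m = 2, and the fixed iteration count are assumptions chosen for illustration.

```python
from math import dist


def memberships(points, centers, m=2.0):
    """u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center is handled.
        d = [max(dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return u


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[d] for wi, p in zip(w, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Hypothetical points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
centers, u = fuzzy_c_means(points, centers=[(0, 0), (9, 9)])
```

Unlike hard clustering, every point receives a degree of membership in every cluster, and the degrees for one point sum to one.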
Analysis of genetic association in Listeria and Diabetes using Hierarchical Clustering and Silhouette Index

NASA Astrophysics Data System (ADS)

Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.

2016-04-01

It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, is clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it doesn’t find all possible subsets, we compared its results against a full search, to determine the amount of good co-regulated sets not detected.
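The Silhouette index mentioned in the abstract above can be sketched in Python as follows. For each item, a is the mean distance to the other members of its own group and b is the smallest mean distance to any other group; the silhouette value is (b - a) / max(a, b). Scoring a gene group by the mean silhouette of its members, the Euclidean distance, and the toy expression vectors are assumptions for this example rather than details from the study.

```python
from math import dist


def silhouette_values(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


def group_quality(scores, labels, group):
    """Mean silhouette value over the members of one group."""
    members = [s for s, lab in zip(scores, labels) if lab == group]
    return sum(members) / len(members)


# Hypothetical two-condition expression vectors for four genes.
expression = [(0, 0), (0, 1), (5, 5), (5, 6)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]
good_scores = silhouette_values(expression, good_labels)
bad_scores = silhouette_values(expression, bad_labels)
```

Values near 1 indicate a compact, well-separated group, while negative values indicate members that sit closer to another group, so candidate groups can be ranked by this score.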
The Fdb3 transcription factor of the Fusarium Detoxification of Benzoxazolinone gene cluster is required for MBOA but not BOA degradation in Fusarium pseudograminearum.

PubMed

Kettle, Andrew J; Carere, Jason; Batley, Jacqueline; Manners, John M; Kazan, Kemal; Gardiner, Donald M

2016-03-01

A number of cereals produce the benzoxazolinone class of phytoalexins. Fusarium species pathogenic towards these hosts can typically degrade these compounds via an aminophenol intermediate, and the ability to do so is encoded by a group of genes found in the Fusarium Detoxification of Benzoxazolinone (FDB) cluster. A zinc finger transcription factor encoded by one of the FDB cluster genes (FDB3) has been proposed to regulate the expression of other genes in the cluster and hence is potentially involved in benzoxazolinone degradation. Herein we show that Fdb3 is essential for the ability of Fusarium pseudograminearum to efficiently detoxify the predominant wheat benzoxazolinone, 6-methoxy-benzoxazolin-2-one (MBOA), but not benzoxazoline-2-one (BOA). Furthermore, additional genes thought to be part of the FDB gene cluster, based upon transcriptional response to benzoxazolinones, are regulated by Fdb3. However, deletion mutants for these latter genes remain capable of benzoxazolinone degradation, suggesting that they are not essential for this process. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
Genome Neighborhood Network Reveals Insights into Enediyne Biosynthesis and Facilitates Prediction and Prioritization for Discovery

PubMed Central

Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben

2015-01-01

The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027
Hierarchical Dirichlet process model for gene expression clustering

PubMed Central

2013-01-01

Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
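The Chinese restaurant metaphor referred to in the abstract above can be illustrated with a small Python simulation of the seating rule for a single Dirichlet process. This sketch covers only the prior seating scheme, in which a new item joins an existing cluster with probability proportional to its size or opens a new cluster with probability proportional to a concentration parameter; it is not the HDP Gibbs sampler from the article, and the function name, parameter values, and seed are assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one at a time; return table assignments and table sizes."""
    rng = random.Random(seed)
    table_sizes = []
    assignment = []
    for _ in range(n_customers):
        # Existing table k has weight size_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table (new cluster)
        else:
            table_sizes[k] += 1     # join an existing table
        assignment.append(k)
    return assignment, table_sizes


assignment, table_sizes = chinese_restaurant_process(n_customers=50, alpha=1.0)
```

Because the number of occupied tables is not fixed in advance, models built on this scheme can let the data determine the number of clusters.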
From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation Among Gene Classes from Large-Scale Expression Data

NASA Technical Reports Server (NTRS)

Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara

2000-01-01

We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
Identification of the Coumermycin A1 Biosynthetic Gene Cluster of Streptomyces rishiriensis DSM 40489

PubMed Central

Wang, Zhao-Xin; Li, Shu-Ming; Heide, Lutz

2000-01-01

The biosynthetic gene cluster of the aminocoumarin antibiotic coumermycin A1 was cloned by screening of a cosmid library of Streptomyces rishiriensis DSM 40489 with heterologous probes from a dTDP-glucose 4,6-dehydratase gene, involved in deoxysugar biosynthesis, and from the aminocoumarin resistance gyrase gene gyrBr. Sequence analysis of a 30.8-kb region upstream of gyrBr revealed the presence of 28 complete open reading frames (ORFs). Fifteen of the identified ORFs showed, on average, 84% identity to corresponding ORFs in the biosynthetic gene cluster of novobiocin, another aminocoumarin antibiotic. Possible functions of 17 ORFs in the biosynthesis of coumermycin A1 could be assigned by comparison with sequences in GenBank. Experimental proof for the function of the identified gene cluster was provided by an insertional gene inactivation experiment, which resulted in an abolishment of coumermycin A1 production. PMID:11036020
Whole Blood Gene Expression Profiling Predicts Severe Morbidity and Mortality in Cystic Fibrosis: A 5-Year Follow-Up Study.

PubMed

Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A

2018-05-01

Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.
Biomarker discovery for colon cancer using a 761 gene RT-PCR assay.

PubMed

Clark-Langone, Kim M; Wu, Jenny Y; Sangli, Chithra; Chen, Angela; Snable, James L; Nguyen, Anhthu; Hackett, James R; Baker, Joffre; Yothers, Greg; Kim, Chungyeul; Cronin, Maureen T

2007-08-15

Reverse transcription PCR (RT-PCR) is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE) clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Oncotype DX assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence in colon cancer (prognosis) and the likelihood of tumor response to standard chemotherapy regimens (prediction). We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. RNA was extracted from formalin-fixed paraffin-embedded (FPE) tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene-specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan reactions for each candidate gene. Hierarchical clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery.
We have developed a high throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of genes analyzed while maintaining the high specificity, sensitivity and reproducibility that are characteristics of RT-PCR. Biomarkers discovered using this approach can be transferred to a clinical reference laboratory setting without having to re-validate the assay on a second technology platform.
An OmpA Family Protein, a Target of the GinI/GinR Quorum-Sensing System in Gluconacetobacter intermedius, Controls Acetic Acid Fermentation

PubMed Central

Iida, Aya; Ohnishi, Yasuo; Horinouchi, Sueharu

2008-01-01

Via N-acylhomoserine lactones, the GinI/GinR quorum-sensing system in Gluconacetobacter intermedius NCI1051, a gram-negative acetic acid bacterium, represses acetic acid and gluconic acid fermentation. Two-dimensional polyacrylamide gel electrophoretic analysis of protein profiles of strain NCI1051 and ginI and ginR mutants identified a protein that was produced in response to the GinI/GinR regulatory system. Cloning and nucleotide sequencing of the gene encoding this protein revealed that it encoded an OmpA family protein, named GmpA. gmpA was a member of the gene cluster containing three adjacent homologous genes, gmpA to gmpC, the organization of which appeared to be unique to vinegar producers, including “Gluconacetobacter polyoxogenes.” In addition, GmpA was unique among the OmpA family proteins in that its N-terminal membrane domain forming eight antiparallel transmembrane β-strands contained an extra sequence in one of the surface-exposed loops. Transcriptional analysis showed that only gmpA of the three adjacent gmp genes was activated by the GinI/GinR quorum-sensing system. However, gmpA was not controlled directly by GinR but was controlled by an 89-amino-acid protein, GinA, a target of this quorum-sensing system. A gmpA mutant grew more rapidly in the presence of 2% (vol/vol) ethanol and accumulated acetic acid and gluconic acid in greater final yields than strain NCI1051. Thus, GmpA plays a role in repressing oxidative fermentation, including acetic acid fermentation, which is unique to acetic acid bacteria and allows ATP synthesis via ethanol oxidation. Consistent with the involvement of gmpA in oxidative fermentation, its transcription was also enhanced by ethanol and acetic acid. PMID:18487322
A pancreatic exocrine-like cell regulatory circuit operating in the upper stomach of the sea urchin Strongylocentrotus purpuratus larva.

PubMed

Perillo, Margherita; Wang, Yue Julia; Leach, Steven D; Arnone, Maria Ina

2016-05-26

Digestive cells are present in all metazoans and provide the energy necessary for the whole organism. Pancreatic exocrine cells are a unique vertebrate cell type involved in extracellular digestion of a wide range of nutrients. Although the organization and regulation of this cell type is intensively studied in vertebrates, its evolutionary history is still unknown. In order to understand which elements define the pancreatic exocrine phenotype, we have analyzed the expression of genes that contribute to specification and function of this cell type in an early branching deuterostome, the sea urchin Strongylocentrotus purpuratus. We defined the spatial and temporal expression of sea urchin orthologs of pancreatic exocrine genes and described a unique population of cells clustered in the upper stomach of the sea urchin embryo where exocrine markers are co-expressed. Using a combination of perturbation analysis, drug and feeding experiments, we found that in these cells of the sea urchin embryo gene expression and gene regulatory interactions resemble those of bona fide pancreatic exocrine cells. We show that the sea urchin Ptf1a, a key transcriptional activator of digestive enzymes in pancreatic exocrine cells, can substitute for its vertebrate ortholog in activating downstream genes. Collectively, our study is the first to show with molecular tools that defining features of a vertebrate cell type, the pancreatic exocrine cell, are shared by a non-vertebrate deuterostome. Our results indicate that the functional cell-type unit of the vertebrate pancreas may evolutionarily predate the emergence of the pancreas as a discrete organ. From an evolutionary perspective, these results encourage further exploration of the homologs of other vertebrate cell types in traditional or newly emerging deuterostome systems.
Transcription factor clusters regulate genes in eukaryotic cells

PubMed Central

Hedlund, Erik G; Friemann, Rosmarie; Hohmann, Stefan

2017-01-01

Transcription is regulated through the binding of factors to gene promoters to activate or repress expression; however, the mechanisms by which factors find targets remain unclear. Using single-molecule fluorescence microscopy, we determined in vivo stoichiometry and spatiotemporal dynamics of a GFP tagged repressor, Mig1, from a paradigm signaling pathway of Saccharomyces cerevisiae. We find the repressor operates in clusters, which upon extracellular signal detection, translocate from the cytoplasm, bind to nuclear targets and turn over. Simulations of Mig1 configuration within a 3D yeast genome model combined with a promoter-specific, fluorescent translation reporter confirmed clusters are the functional unit of gene regulation. In vitro and structural analysis on reconstituted Mig1 suggests that clusters are stabilized by depletion forces between intrinsically disordered sequences. We observed similar clusters of a co-regulatory activator from a different pathway, supporting a generalized cluster model for transcription factors that reduces promoter search times through intersegment transfer while stabilizing gene expression. PMID:28841133
The Genetic and Molecular Organization of the Dopa Decarboxylase Gene Cluster of Drosophila Melanogaster

PubMed Central

Stathakis, D. G.; Pentz, E. S.; Freeman, M. E.; Kullman, J.; Hankins, G. R.; Pearlson, N. J.; Wright, TRF.

1995-01-01

We report the complete molecular organization of the Dopa decarboxylase gene cluster. Mutagenesis screens recovered 77 new Df(2L)TW130 recessive lethal mutations. These new alleles combined with 263 previously isolated mutations in the cluster to define 18 essential genes. In addition, seven new deficiencies were isolated and characterized. Deficiency mapping, restriction fragment length polymorphism (RFLP) analysis and P-element-mediated germline transformation experiments determined the gene order for all 18 loci. Genomic and cDNA restriction endonuclease mapping, Northern blot analysis and DNA sequencing provided information on exact gene location, mRNA size and transcriptional direction for most of these loci. In addition, this analysis identified two transcription units that had not previously been identified by extensive mutagenesis screening. Most of the loci are contained within two dense subclusters. We discuss the effectiveness of mutagens and strategies used in our screens, the variable mutability of loci within the genome of Drosophila melanogaster, the cytological and molecular organization of the Ddc gene cluster, the validity of the one band-one gene hypothesis and a possible purpose for the clustering of genes in the Ddc region. PMID:8647399
Concerted Changes in Gene Expression and Cell Physiology of the Cyanobacterium Synechocystis sp. Strain PCC 6803 during Transitions between Nitrogen and Light-Limited Growth

PubMed Central

Aguirre von Wobeser, Eneas; Ibelings, Bas W.; Bok, Jasper; Krasikov, Vladimir; Huisman, Jef; Matthijs, Hans C.P.

2011-01-01

Physiological adaptation and genome-wide expression profiles of the cyanobacterium Synechocystis sp. strain PCC 6803 in response to gradual transitions between nitrogen-limited and light-limited growth conditions were measured in continuous cultures. Transitions induced changes in pigment composition, light absorption coefficient, photosynthetic electron transport, and specific growth rate. Physiological changes were accompanied by reproducible changes in the expression of several hundred open reading frames, genes with functions in photosynthesis and respiration, carbon and nitrogen assimilation, protein synthesis, phosphorus metabolism, and overall regulation of cell function and proliferation. Cluster analysis of the nearly 1,600 regulated open reading frames identified eight clusters, each showing a different temporal response during the transitions. Two large clusters mirrored each other. One cluster included genes involved in photosynthesis, which were up-regulated during light-limited growth but down-regulated during nitrogen-limited growth. Conversely, genes in the other cluster were down-regulated during light-limited growth but up-regulated during nitrogen-limited growth; this cluster included several genes involved in nitrogen uptake and assimilation. These results demonstrate complementary regulation of gene expression for two major metabolic activities of cyanobacteria. Comparison with batch-culture experiments revealed interesting differences in gene expression between batch and continuous culture and illustrates that continuous-culture experiments can pick up subtle changes in cell physiology and gene expression. PMID:21205618
Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species

PubMed Central

Glenn, Anthony E.; Davis, C. Britton; Gao, Minglu; Gold, Scott E.; Mitchell, Trevor R.; Proctor, Robert H.; Stewart, Jane E.; Snook, Maurice E.

2016-01-01

Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence. PMID:26808652
antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

PubMed Central

Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko

2015-01-01

Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579
β-globin gene cluster haplotypes in ethnic minority populations of southwest China

PubMed Central

Sun, Hao; Liu, Hongxian; Huang, Kai; Lin, Keqin; Huang, Xiaoqin; Chu, Jiayou; Ma, Shaohui; Yang, Zhaoqing

2017-01-01

The genetic diversity and relationships among ethnic minority populations of southwest China were investigated using seven polymorphic restriction enzyme sites in the β-globin gene cluster. The haplotypes of 1392 chromosomes from ten ethnic populations living in southwest China were determined. Linkage equilibrium and a recombination hotspot were found between the 5′ sites and 3′ sites of the β-globin gene cluster. 5′ haplotypes 2 (+−−−), 6 (−++−+), 9 (−++++) and 3′ haplotype FW3 (−+) were the predominant haplotypes. Notably, haplotype 9 frequency was significantly high in the southwest populations, indicating their difference from other Chinese populations. The interpopulation differentiation of southwest Chinese minority populations is less than that in populations of northern China and other continents. Phylogenetic analysis shows that populations sharing the same ethnic origin or language clustered together, indicating that current β-globin cluster diversity in the Chinese populations reflects their ethnic origin and linguistic affiliations to a great extent. This study characterizes β-globin gene cluster haplotypes in southwest Chinese minorities for the first time, and reveals the genetic variability and affinity of these populations using β-globin cluster haplotype frequencies. The results suggest that ethnic origin plays an important role in shaping variations of the β-globin gene cluster in the southwestern ethnic populations of China. PMID:28205625

Molecular characterization of the PR-toxin gene cluster in Penicillium roqueforti and Penicillium chrysogenum: cross talk of secondary metabolite pathways.

PubMed

Hidalgo, Pedro I; Ullán, Ricardo V; Albillos, Silvia M; Montero, Olimpio; Fernández-Bodega, María Ángeles; García-Estrada, Carlos; Fernández-Aguado, Marta; Martín, Juan-Francisco

2014-01-01

The PR-toxin is a potent mycotoxin produced by Penicillium roqueforti in moulded grains and grass silages and may contaminate blue-veined cheese. The PR-toxin derives from the 15-carbon sesquiterpene aristolochene formed by the aristolochene synthase (encoded by ari1). We have cloned and sequenced a four-gene cluster that includes the ari1 gene from P. roqueforti. Gene silencing of each of the four genes (named prx1 to prx4) resulted in a reduction of 65-75% in the production of PR-toxin, indicating that the four genes encode enzymes involved in PR-toxin biosynthesis. Interestingly, the four silenced mutants overproduce large amounts of mycophenolic acid, an antitumor compound formed by an unrelated pathway, suggesting cross-talk between PR-toxin and mycophenolic acid production. An eleven-gene cluster that includes the above-mentioned four prx genes and a 14-TMS drug/H(+) antiporter was found in the genome of Penicillium chrysogenum. This eleven-gene cluster has been reported to be very poorly expressed in a transcriptomic study of P. chrysogenum genes under conditions of penicillin production (strongly aerated cultures). We found that this apparently silent gene cluster is able to produce PR-toxin in P. chrysogenum under static culture conditions on hydrated rice medium. Notably, the production of PR-toxin was 2.6-fold higher in P. chrysogenum npe10, a strain deleted in the 56.8kb amplifiable region containing the pen gene cluster, than in the parental strain Wisconsin 54-1255, providing another example of cross-talk between secondary metabolite pathways in this fungus. A detailed PR-toxin biosynthesis pathway is proposed based on all available evidence.
The nif Gene Operon of the Methanogenic Archaeon Methanococcus maripaludis

PubMed Central

Kessler, Peter S.; Blank, Carrine; Leigh, John A.

1998-01-01

Nitrogen fixation occurs in two domains, Archaea and Bacteria. We have characterized a nif (nitrogen fixation) gene cluster in the methanogenic archaeon Methanococcus maripaludis. Sequence analysis revealed eight genes, six with sequence similarity to known nif genes and two with sequence similarity to glnB. The gene order, nifH, ORF105 (similar to glnB), ORF121 (similar to glnB), nifD, nifK, nifE, nifN, and nifX, was the same as that found in part in other diazotrophic methanogens and except for the presence of the glnB-like genes, also resembled the order found in many members of the Bacteria. Using transposon insertion mutagenesis, we determined that an 8-kb region required for nitrogen fixation corresponded to the nif gene cluster. Northern analysis revealed the presence of either a single 7.6-kb nif mRNA transcript or 10 smaller mRNA species containing portions of the large transcript. Polar effects of transposon insertions demonstrated that all of these mRNAs arose from a single promoter region, where transcription initiated 80 bp 5′ to nifH. Distinctive features of the nif gene cluster include the presence of the six primary nif genes in a single operon, the placement of the two glnB-like genes within the cluster, the apparent physical separation of the cluster from any other nif genes that might be in the genome, the fragmentation pattern of the mRNA, and the regulation of expression by a repression mechanism described previously. Our study and others with methanogenic archaea reporting multiple mRNAs arising from gene clusters with only a single putative promoter sequence suggest that mRNA processing following transcription may be a common occurrence in methanogens. PMID:9515920
A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

PubMed Central

Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

2015-01-01

Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180
Hybrid Assembly of Different-Sized Supertetrahedral Clusters into a Unique Non-Interpenetrated Mn-In-S Open Framework with Large Cavity.

PubMed

Wang, Hongxiang; Wang, Wei; Hu, Dandan; Luo, Min; Xue, Chaozhuang; Li, Dongsheng; Wu, Tao

2018-06-04

Reported here is a unique crystalline semiconductor open-framework material built from the large-sized supertetrahedral T4 and T5 clusters with the Mn-In-S compositions. The hybrid assembly between T4 and T5 clusters by sharing terminal μ2-S2− is for the first time observed among the cluster-based chalcogenide open frameworks. Such three-dimensional structure displays non-interpenetrated diamond-type topology with extra-large nonframework volume of 82%. Moreover, ion exchange, CO2 adsorption, as well as photoluminescence properties of the title compound are also investigated.
Use of keyword hierarchies to interpret gene expression patterns.

PubMed

Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J

2001-04-01

High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
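The hierarchical roll-up described in the record above can be illustrated with a minimal sketch. It assumes keywords are stored as dotted, MeSH-style tree numbers in which each prefix is a broader term; the gene names, tree numbers, and function names below are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical mapping of genes to MeSH-style tree numbers (illustrative only).
GENE_KEYWORDS = {
    "geneA": ["D08.811.277.040", "G02.111.087"],
    "geneB": ["D08.811.277.352"],
    "geneC": ["D08.811.682", "G02.111.087"],
}


def ancestors(tree_number):
    """Return the tree number and all of its ancestors, most general first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def keyword_profile(genes, gene_keywords):
    """Count, for each hierarchy node, how many genes in the group map to it."""
    counts = Counter()
    for gene in genes:
        nodes = set()
        for tree_number in gene_keywords.get(gene, []):
            nodes.update(ancestors(tree_number))
        counts.update(nodes)  # each gene is counted at most once per node
    return counts


profile = keyword_profile(["geneA", "geneB", "geneC"], GENE_KEYWORDS)
```

Rolling each keyword up to its ancestors means that genes indexed with different specific terms can still be seen to share a broader concept, which is the kind of conceptual similarity the abstract describes.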
Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture

PubMed Central

Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; Cole, James R.; Hashsham, Syed A.; Looft, Torey; Zhu, Yong-Guan

2016-01-01

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. PMID:27073098
Use of lambdagt11 to isolate genes for two pseudorabies virus glycoproteins with homology to herpes simplex virus and varicella-zoster virus glycoproteins

DOE Office of Scientific and Technical Information (OSTI.GOV)

Petrovskis, E.A.; Timmins, J.G.; Post, L.E.

1986-10-01

A library of pseudorabies virus (PRV) DNA fragments was constructed in the expression cloning vector lambdagt11. The library was screened with antisera which reacted with mixtures of PRV proteins to isolate recombinant bacteriophages expressing PRV proteins. By the nature of the lambdagt11 vector, the cloned proteins were expressed in Escherichia coli as β-galactosidase fusion proteins. The fusion proteins from 35 of these phages were purified and injected into mice to raise antisera. The antisera were screened by several different assays, including immunoprecipitation of [14C]glucosamine-labeled PRV proteins. This method identified phages expressing three different PRV glycoproteins: the secreted glycoprotein, gX; gI; and a glycoprotein that had not been previously identified, which we designate gp63. The gp63 and gI genes map adjacent to each other in the small unique region of the PRV genome. The DNA sequence was determined for the region of the genome encoding gp63 and gI. It was found that gp63 has a region of homology with a herpes simplex virus type 1 (HSV-1) protein, encoded by US7, and also with varicella-zoster virus (VZV) gpIV. The gI protein sequence has a region of homology with HSV-1 gE and VZV gpI. It is concluded that PRV, HSV, and VZV all have a cluster of homologous glycoprotein genes in the small unique components of their genomes and that the organization of these genes is conserved.
Molecular classification of gastric cancer: a new paradigm.

PubMed

Shah, Manish A; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y; Klimstra, David S; Gerdes, Hans; Kelsen, David P

2011-05-01

Gastric cancer may be subdivided into 3 distinct subtypes--proximal, diffuse, and distal gastric cancer--based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (National Cancer Institute, NCI #5917) underwent endoscopic biopsy for fresh tumor procurement. Four to 6 targeted biopsies of the primary tumor were obtained. Macrodissection was carried out to ensure more than 80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the 3 gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross-validation error was 0.14, suggesting that more than 85% of samples were classified correctly. Gene set analysis with the false discovery rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Subtypes of gastric cancer that have epidemiologic and histologic distinctions are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype.
Molecular Classification of Gastric Cancer: A new paradigm

PubMed Central

Shah, Manish A.; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y.; Klimstra, David S.; Gerdes, Hans; Kelsen, David P.

2011-01-01

Purpose: Gastric cancer may be subdivided into three distinct subtypes (proximal, diffuse, and distal gastric cancer) based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Experimental Design: Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (NCI 5917) underwent endoscopic biopsy for fresh tumor procurement. 4–6 targeted biopsies of the primary tumor were obtained. Macrodissection was performed to ensure >80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Results: Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the three gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross validation error was 0.14, suggesting that >85% of samples were classified correctly. Gene set analysis with the False Discovery Rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Conclusions: Subtypes of gastric cancer that have epidemiologic and histologic distinction are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype. PMID:21430069
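To make the leave-one-out figure in the record above concrete: an error of 0.14 on 36 samples corresponds to roughly 5 misclassified samples (0.14 × 36 ≈ 5). The sketch below shows how such an error is computed in general. The abstract does not say which classifier was used, so the nearest-centroid rule, the toy two-feature data, and all names here are assumptions for illustration only.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error(xs, ys):
    """Hold out each sample once, train on the rest, and return the error rate."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Toy data: three well-separated groups plus one sample labeled against its neighbors.
XS = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1], [9, 1]]
YS = ["proximal", "proximal", "diffuse", "diffuse", "distal", "distal", "proximal"]
error = loocv_error(XS, YS)
```

Because every sample is scored by a model that never saw it, the resulting error is a less optimistic estimate than accuracy measured on the training set, which matters for small cohorts such as the 36 patients reported here.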
CrossLink: a novel method for cross-condition classification of cancer subtypes.

PubMed

Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

2016-08-22

We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on its gene expression profile. A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
Lactobacillus buchneri Genotyping on the Basis of Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) Locus Diversity

PubMed Central

Briner, Alexandra E.

2014-01-01

Clustered regularly interspaced short palindromic repeats (CRISPR) in combination with associated sequences (cas) constitute the CRISPR-Cas immune system, which uptakes DNA from invasive genetic elements as novel “spacers” that provide a genetic record of immunization events. We investigated the potential of CRISPR-based genotyping of Lactobacillus buchneri, a species relevant for commercial silage, bioethanol, and vegetable fermentations. Upon investigating the occurrence and diversity of CRISPR-Cas systems in Lactobacillus buchneri genomes, we observed a ubiquitous occurrence of CRISPR arrays containing a 36-nucleotide (nt) type II-A CRISPR locus adjacent to four cas genes, including the universal cas1 and cas2 genes and the type II signature gene cas9. Comparative analysis of CRISPR spacer content in 26 L. buchneri pickle fermentation isolates associated with spoilage revealed 10 unique locus genotypes that contained between 9 and 29 variable spacers. We observed a set of conserved spacers at the ancestral end, reflecting a common origin, as well as leader-end polymorphisms, reflecting recent divergence. Some of these spacers showed perfect identity with phage sequences, and many spacers showed homology to Lactobacillus plasmid sequences. Following a comparative analysis of sequences immediately flanking protospacers that matched CRISPR spacers, we identified a novel putative protospacer-adjacent motif (PAM), 5′-AAAA-3′. Overall, these findings suggest that type II-A CRISPR-Cas systems are valuable for genotyping of L. buchneri. PMID:24271175
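The spacer-based comparison described in the record above can be sketched as follows. This is an illustration only: it assumes each isolate's CRISPR array has already been reduced to an ordered list of spacer identifiers, written leader end first and ancestral end last, and the isolate names, spacer labels, and function names are hypothetical.

```python
def unique_genotypes(arrays):
    """Group isolates that carry exactly the same ordered spacer array."""
    groups = {}
    for isolate, spacers in arrays.items():
        groups.setdefault(tuple(spacers), []).append(isolate)
    return groups


def shared_ancestral_spacers(arrays):
    """Count spacers shared by every array, reading inward from the ancestral end."""
    reversed_arrays = [list(reversed(spacers)) for spacers in arrays.values()]
    shared = 0
    for column in zip(*reversed_arrays):
        if len(set(column)) != 1:
            break
        shared += 1
    return shared


# Hypothetical spacer arrays (leader end first, ancestral end last).
ISOLATES = {
    "iso1": ["s7", "s5", "s3", "s2", "s1"],
    "iso2": ["s8", "s3", "s2", "s1"],
    "iso3": ["s7", "s5", "s3", "s2", "s1"],
}

genotypes = unique_genotypes(ISOLATES)
n_shared = shared_ancestral_spacers(ISOLATES)
```

In this toy input, two distinct genotypes share three ancestral-end spacers and differ only toward the leader end, mirroring the pattern of common origin and recent divergence that the abstract reports.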
Bacillus cereus-type polyhydroxyalkanoate biosynthetic gene cluster contains R-specific enoyl-CoA hydratase gene.

PubMed

Kihara, Takahiro; Hiroe, Ayaka; Ishii-Hyakutake, Manami; Mizuno, Kouhei; Tsuge, Takeharu

2017-08-01

Bacillus cereus and Bacillus megaterium both accumulate polyhydroxyalkanoate (PHA) but their PHA biosynthetic gene (pha) clusters that code for proteins involved in PHA biosynthesis are different. Namely, a gene encoding MaoC-like protein exists in the B. cereus-type pha cluster but not in the B. megaterium-type pha cluster. MaoC-like protein has an R-specific enoyl-CoA hydratase (R-hydratase) activity and is referred to as PhaJ when involved in PHA metabolism. In this study, the pha cluster of B. cereus YB-4 was characterized in terms of PhaJ's function. In an in vitro assay, PhaJ from B. cereus YB-4 (PhaJ YB4) exhibited hydration activity toward crotonyl-CoA. In an in vivo assay using Escherichia coli as a host for PHA accumulation, the recombinant strain expressing PhaJ YB4 and PHA synthase led to increased PHA accumulation, suggesting that PhaJ YB4 functioned as a monomer supplier. The monomer composition of the accumulated PHA reflected the substrate specificity of PhaJ YB4, which appeared to prefer short chain-length substrates. The pha cluster from B. cereus YB-4 functioned to accumulate PHA in E. coli; however, it did not function when the phaJ YB4 gene was deleted. The B. cereus-type pha cluster represents a new example of a pha cluster that contains the gene encoding PhaJ.
Analysis of lamprey clustered Fox genes: insight into Fox gene evolution and expression in vertebrates.

PubMed

Wotton, Karl R; Shimeld, Sebastian M

2011-12-01

In the human genome, members of the FoxC, FoxF, FoxL1, and FoxQ1 gene families are found in two paralogous clusters. One cluster contains the genes FOXQ1, FOXF2, and FOXC1, and the second consists of FOXF1, FOXC2, and FOXL1. In jawed vertebrates these genes are known to be expressed in different pharyngeal tissues and all, except FoxQ1, are involved in patterning the early embryonic mesoderm. We have previously traced the evolution of this cluster in the bony vertebrates, and the gene content is identical in the dogfish, a member of the most basally branching lineage of the jawed vertebrates. Here we extend these analyses to jawless vertebrates. Using genomic searches and molecular approaches we have identified homologues of these genes from lampreys. We identify two FoxC genes, two FoxF genes, two FoxQ1 genes, and a single FoxL1 gene. We examine the embryonic expression of one predominantly mesodermally expressed gene family, FoxC, and the endodermally expressed member of the cluster, FoxQ1. We identified FoxQ1 transcripts in the pharyngeal endoderm, while the two FoxC genes are differentially expressed in the pharyngeal mesenchyme and ectoderm. Furthermore, we identify conserved expression of lamprey FoxC genes in the paraxial and intermediate mesoderms. We interpret our results through a chordate-wide comparison of expression patterns and discuss gene content in the context of theories on the evolution of the vertebrate genome. 2011 Elsevier B.V. All rights reserved.
Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes.

PubMed

Wang, Hao; Fewer, David P; Holm, Liisa; Rouhiainen, Leo; Sivonen, Kaarina

2014-06-24

Nonribosomal peptides and polyketides are a diverse group of natural products with complex chemical structures and enormous pharmaceutical potential. They are synthesized on modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) enzyme complexes by a conserved thiotemplate mechanism. Here, we report the widespread occurrence of NRPS and PKS genetic machinery across the three domains of life with the discovery of 3,339 gene clusters from 991 organisms, by examining a total of 2,699 genomes. These gene clusters display extraordinarily diverse organizations, and a total of 1,147 hybrid NRPS/PKS clusters were found. Surprisingly, 10% of bacterial gene clusters lacked modular organization, and instead catalytic domains were mostly encoded as separate proteins. The finding of common occurrence of nonmodular NRPS differs substantially from the current classification. Sequence analysis indicates that the evolution of NRPS machineries was driven by a combination of common descent and horizontal gene transfer. We identified related siderophore NRPS gene clusters that encoded modular and nonmodular NRPS enzymes organized in a gradient. A higher frequency of the NRPS and PKS gene clusters was detected from bacteria compared with archaea or eukarya. They commonly occurred in the phyla of Proteobacteria, Actinobacteria, Firmicutes, and Cyanobacteria in bacteria and the phylum of Ascomycota in fungi. The majority of these NRPS and PKS gene clusters have unknown end products highlighting the power of genome mining in identifying novel genetic machinery for the biosynthesis of secondary metabolites.
Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome.

PubMed

González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

2014-12-17

Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1 Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher than average errors. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70% while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs into common transcription units. A phylogenetic analysis of the R-genes and their comparison with syntenic sequences in other cucurbits points to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. A sequence refinement strategy allowed substantial improvement of a 1 Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using previously unavailable sequence, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.
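As a rough illustration of how a decrease in unknown sequence such as the 70% reported above can be measured, the following Python sketch counts unknown bases before and after refinement. It assumes unknown bases are written as N, which the abstract does not state; the function name and both toy sequences are hypothetical and are not data from the study.

```python
def percent_reduction(before, after):
    """Percent decrease in unknown (N) bases between two versions of a sequence."""
    n_before = before.upper().count("N")
    n_after = after.upper().count("N")
    if n_before == 0:
        raise ValueError("the 'before' sequence contains no unknown bases")
    return 100.0 * (n_before - n_after) / n_before


# Hypothetical toy sequences: 10 unknown bases in the draft, 3 after refinement.
draft = "ACGT" + "N" * 10 + "ACGT"
refined = "ACGT" + "N" * 3 + "ACGTACG" + "ACGT"

print(percent_reduction(draft, refined))  # 70.0
```

A real assembly would be read from a FASTA file and summed over all gaps, but the calculation is the same.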
Wide distribution of O157-antigen biosynthesis gene clusters in Escherichia coli.

PubMed

Iguchi, Atsushi; Shirai, Hiroki; Seto, Kazuko; Ooka, Tadasuke; Ogura, Yoshitoshi; Hayashi, Tetsuya; Osawa, Kayo; Osawa, Ro

2011-01-01

Most Escherichia coli O157-serogroup strains are classified as enterohemorrhagic E. coli (EHEC), which is known as an important food-borne pathogen for humans. They usually produce Shiga toxin (Stx) 1 and/or Stx2, and express the H7 flagellar antigen (or are nonmotile). However, O157 strains that do not produce Stxs and express H antigens different from H7 are sometimes isolated from clinical and other sources. Multilocus sequence analysis revealed that the 21 O157:non-H7 strains tested in this study belong to multiple evolutionary lineages different from that of EHEC O157:H7 strains, suggesting a wide distribution of the gene set encoding the O157-antigen biosynthesis in multiple lineages. To gain insight into the gene organization and the sequence similarity of the O157-antigen biosynthesis gene clusters, we conducted genomic comparisons of the chromosomal regions (about 59 kb in each strain) covering the O-antigen gene cluster and its flanking regions between six O157:H7/non-H7 strains. Gene organization of the O157-antigen gene cluster was identical among O157:H7/non-H7 strains, but was divided into two distinct types at the nucleotide sequence level. Interestingly, distribution of the two types did not clearly follow the evolutionary lineages of the strains, suggesting that horizontal gene transfer of both types of O157-antigen gene clusters has occurred independently among E. coli strains. Additionally, detailed sequence comparison revealed that some positions of the repetitive extragenic palindromic (REP) sequences in the regions flanking the O-antigen gene clusters were coincident with possible recombination points. From these results, we conclude that the horizontal transfer of the O157-antigen gene clusters induced the emergence of multiple O157 lineages within E. coli and speculate that REP sequences may be one of the driving forces for exchange and evolution of O-antigen loci.
Penicillin production in industrial strain Penicillium chrysogenum P2niaD18 is not dependent on the copy number of biosynthesis genes.

PubMed

Ziemons, Sandra; Koutsantas, Katerina; Becker, Kordula; Dahlmann, Tim; Kück, Ulrich

2017-02-16

Multi-copy gene integration into microbial genomes is a conventional tool for obtaining improved gene expression. For Penicillium chrysogenum, the fungal producer of the beta-lactam antibiotic penicillin, many production strains carry multiple copies of the penicillin biosynthesis gene cluster. This discovery led to the generally accepted view that high penicillin titers are the result of multiple copies of penicillin genes. Here we investigated strain P2niaD18, a production line that carries only two copies of the penicillin gene cluster. We performed pulsed-field gel electrophoresis (PFGE), quantitative RT-PCR (qRT-PCR), and penicillin bioassays to investigate production, deletion, and overexpression strains generated in the P. chrysogenum P2niaD18 background, in order to determine the copy number of the penicillin biosynthesis gene cluster, the expression of one penicillin biosynthesis gene, and the penicillin titer. Analysis of production and recombinant strains showed that the enhanced penicillin titer did not depend on the copy number of the penicillin gene cluster. Our assumption was strengthened by results with a penicillin null strain lacking pcbC, which encodes isopenicillin N synthase. Reintroduction of one or two copies of the cluster into the pcbC deletion strain restored high transcriptional expression of the pcbC gene, but recombinant strains showed no significantly different penicillin titer compared to parental strains. Here we present a molecular genetic analysis of production and recombinant strains in the P2niaD18 background carrying different copy numbers of the penicillin biosynthesis gene cluster. Our analysis shows that the enhanced penicillin titer does not strictly depend on the copy number of the cluster. Based on these overall findings, we hypothesize that instead, complex regulatory mechanisms are prominently implicated in increased penicillin biosynthesis in production strains.
A Sinorhizobium meliloti RpoH-Regulated Gene Is Involved in Iron-Sulfur Protein Metabolism and Effective Plant Symbiosis under Intrinsic Iron Limitation.

PubMed

Sasaki, Shohei; Minamisawa, Kiwamu; Mitsui, Hisayuki

2016-09-01

In Sinorhizobium meliloti, RpoH-type sigma factors have a global impact on gene expression during heat shock and play an essential role in symbiosis with leguminous plants. Using mutational analysis of a set of genes showing highly RpoH-dependent expression during heat shock, we identified a gene indispensable for effective symbiosis. This gene, designated sufT, was located downstream of the sufBCDS homologs that specify the iron-sulfur (Fe/S) cluster assembly pathway. The identified transcription start site was preceded by an RpoH-dependent promoter consensus sequence. SufT was related to a conserved protein family of unknown molecular function, of which some members are involved in Fe/S cluster metabolism in diverse organisms. A sufT mutation decreased bacterial growth in both rich and minimal media, tolerance to stresses such as iron starvation, and activities of some Fe/S cluster-dependent enzymes. These results support the involvement of SufT in SUF (sulfur mobilization) system-mediated Fe/S protein metabolism. Furthermore, we isolated spontaneous pseudorevertants of the sufT mutant with partially recovered growth; each of them had a mutation in rirA. This gene encodes a global iron regulator whose loss increases the intracellular iron content. Deletion of rirA in the original sufT mutant improved growth and restored Fe/S enzyme activities and effective symbiosis. These results suggest that enhanced iron availability compensates for the lack of SufT in the maintenance of Fe/S proteins. Although RpoH-type sigma factors of the RNA polymerase are present in diverse proteobacteria, their role as global regulators of protein homeostasis has been studied mainly in the enteric gammaproteobacterium Escherichia coli. In the soil alphaproteobacterium Sinorhizobium meliloti, rpoH mutations have a strong impact on symbiosis with leguminous plants. We found that sufT is a unique member of the S. meliloti RpoH regulon; sufT contributes to Fe/S protein metabolism and effective symbiosis under intrinsic iron limitation exerted by RirA, a global iron regulator. Our study provides insights into the RpoH regulon function in diverse proteobacteria adapted to particular ecological niches and into the mechanism of conserved Fe/S protein biogenesis. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Identification of a lineage specific zinc responsive genomic island in Mycobacterium avium ssp. paratuberculosis.

PubMed

Eckelt, Elke; Jarek, Michael; Frömke, Cornelia; Meens, Jochen; Goethe, Ralph

2014-12-06

Maintenance of metal homeostasis is crucial in bacterial pathogenicity as metal starvation is the most important mechanism in the nutritional immunity strategy of host cells. Thus, pathogenic bacteria have evolved sensitive metal scavenging systems to overcome this particular host defence mechanism. The ruminant pathogen Mycobacterium avium ssp. paratuberculosis (MAP) displays a unique gut tropism and causes a chronic progressive intestinal inflammation. MAP possesses eight conserved lineage-specific large sequence polymorphisms (LSPs), which distinguish MAP from its ancestral M. avium ssp. hominissuis or other M. avium subspecies. LSP14 and LSP15 harbour many genes proposed to be involved in metal homeostasis and have been suggested to substitute for a MAP-specific, impaired mycobactin synthesis. In the present study, we found that an LSP14-located putative IrtAB-like iron transporter encoded by mptABC was induced by zinc but not by iron starvation. Heterologous reporter gene assays with the lacZ gene under control of the mptABC promoter in M. smegmatis (MSMEG) and in a MSMEG∆furB deletion mutant revealed a zinc-dependent expression of mptABC, mediated by the metalloregulator FurB via a conserved mycobacterial FurB recognition site. Deep sequencing of RNA from MAP cultures treated with the zinc chelator TPEN revealed that 70 genes responded to zinc limitation. Remarkably, 45 of these genes were located on a large genomic island of approximately 90 kb which harboured LSP14 and LSP15. Thirty-five of these genes were predicted to be controlled by FurB, due to the presence of putative binding sites. This clustering of zinc-responsive genes was exclusively found in MAP and not in other mycobacteria. Our data revealed a particular genomic signature for MAP given by a unique zinc-specific locus, thereby suggesting an exceptional relevance of zinc for the metabolism of MAP. MAP seems to be well adapted to maintain zinc homeostasis, which might contribute to the peculiarity of MAP pathogenicity.
