Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko
2012-07-15
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
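The comparison of neighboring genes with randomly selected gene pairs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy input layout (a list of expression profiles ordered by chromosomal position), and the empirical p-value are assumptions made for illustration, and the FDR step of the study is not reproduced.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def neighbor_vs_random(expr, n_random=1000, seed=0):
    """expr: expression profiles ordered by chromosomal position (hypothetical layout).
    Returns (correlations of adjacent pairs, correlations of random pairs)."""
    rng = random.Random(seed)
    neighbors = [pearson(expr[i], expr[i + 1]) for i in range(len(expr) - 1)]
    randoms = []
    for _ in range(n_random):
        i, j = rng.sample(range(len(expr)), 2)  # two distinct genes
        randoms.append(pearson(expr[i], expr[j]))
    return neighbors, randoms

def empirical_p(r, background):
    """Fraction of background correlations that are at least as large as r."""
    return sum(b >= r for b in background) / len(background)
```

A neighboring pair whose correlation is rarely matched by random pairs would be a candidate for a co-expressed cluster; in practice the resulting p-values would still need a multiple-testing correction.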
Finding gene clusters for a replicated time course study
2014-01-01
Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
Multiconstrained gene clustering based on generalized projections
2010-01-01
Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence belonging to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
A cluster merging method for time series microarray with production values.
Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio
2014-09-01
A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal, obtaining groups of highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured at the same time points) and merging them by taking into account the frequency with which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim of finding co-expressed genes related to the production and growth of a certain bacterium. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.
Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin
2005-01-01
DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful in pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on shape-based assumptions or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method to cluster genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than to the points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. The assessment shows that this method is superior to other traditional methods in identifying functionally related gene groups.
A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression
Nguyen, Nha; Vo, An; Choi, Inchan
2015-01-01
Abstract Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches to maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation. PMID:25383910
Functional clustering of time series gene expression data by Granger causality
2012-01-01
Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425
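The idea that a cause must precede its consequence can be made concrete with a pairwise, single-lag Granger test. The sketch below is an assumption-laden simplification: the study works with sets of time series, whereas this example compares only two series with one lag, and the helper names are hypothetical. It fits y(t) from y(t-1) alone and from y(t-1) plus x(t-1), then reports the F statistic for the improvement.

```python
def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, obtained by solving the
    normal equations with Gauss-Jordan elimination and partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using a single lag (assumption)."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]        # past of y only
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]    # plus past of x
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(full, target)
    dof = len(target) - 3  # observations minus parameters of the full model
    return (rss_r - rss_u) / (rss_u / dof)
```

A large F value for x to y together with a small one for y to x suggests a directed influence, which could then be used as an edge weight when grouping genes by topological proximity.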
Analysis of genetic association using hierarchical clustering and cluster validation indices.
Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L
2017-10-01
It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes, based on some criterion of similarity, is an important task. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated data and real genomics data, where performance is measured by its ability to detect co-regulated sets compared against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
Cao, Huojun; Amendt, Brad A
2016-11-01
Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A Python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Whole-genome pairwise similarity was calculated from expression patterns across 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This Python package will be useful for many applications where datasets are too large for a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.
2017-06-30
Clustered Regularly Interspaced Short Palindromic Repeat/CRISPR-associated protein 9 (CRISPR/Cas9)-based Gene Drives Environmental Laborat... Management on Military Lands Clustered Regularly Interspaced Short Palindromic Repeat/CRISPR-associated protein 9 (CRISPR/Cas9)-based Gene Drives Ping... CRISPR/Cas9-based Gene Drives for Invasive Species Management on Military Lands” ERDC/EL SR-17-2 ii Abstract Applications of genetic engineering
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples, thus various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to combine these two kinds of techniques synergistically to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Lukashin, A V; Fuchs, R
2001-05-01
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
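A simulated annealing search over cluster assignments can be sketched as follows. This is a generic illustration, not the published algorithm: the cost function (within-cluster sum of squared distances), the geometric cooling schedule, the single-gene reassignment move, and all names and default parameters are assumptions, and the number of clusters k is fixed rather than estimated. With a finite number of steps the sketch offers no optimality guarantee.

```python
import math
import random

def within_cluster_cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster centroid."""
    dim = len(profiles[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(profiles, labels) if l == c]
        if not members:
            continue  # empty clusters contribute nothing
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        cost += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return cost

def anneal_clusters(profiles, k, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best labels found, their cost) after a fixed number of moves."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    cost = within_cluster_cost(profiles, labels, k)
    best_labels, best_cost = labels[:], cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(profiles))
        old = labels[i]
        labels[i] = rng.randrange(k)          # propose moving one gene
        new_cost = within_cluster_cost(profiles, labels, k)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old                   # reject the move
    return best_labels, best_cost
```

Recomputing the full cost at every step keeps the sketch short; an incremental update of the two affected clusters would be the natural optimization for genome-scale data.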
Ortholog-based screening and identification of genes related to intracellular survival.
Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin
2018-04-20
Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. A total of 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As the MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from the k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.
Wang, Dapeng; Zhang, Yubin; Fan, Zhonghua; Liu, Guiming; Yu, Jun
2012-01-01
Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, and these clusters are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what is the minimal number of genes in a cluster and what is the driving force leading to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on evolutionary dynamics of gene clustering and ordering within animal kingdoms in two different lineages: vertebrates and arthropods. The database is constructed on a web-based Linux-Apache-MySQL-PHP framework with an effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three clear advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto the human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, and divergent pairs) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance of transcription start sites between two adjacent genes, which is extendable to genes flanking the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in a context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray
2004-01-01
One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
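To show how TFIDF weights turn per-gene keyword lists into feature vectors, here is a minimal sketch. It uses the textbook form of the weight (relative term frequency times the logarithm of inverse document frequency); the exact variant, background set, stop list and stemming used in the study are not reproduced, and the gene names and keywords in the usage example are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: dict mapping a gene to its list of keyword tokens.
    Returns a dict mapping each gene to {keyword: TFIDF weight}."""
    n = len(docs)
    df = Counter()                      # number of genes whose list contains the keyword
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for gene, tokens in docs.items():
        tf = Counter(tokens)            # raw keyword counts for this gene
        vectors[gene] = {
            word: (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        }
    return vectors

# Hypothetical keyword lists for three genes.
example = {
    "geneA": ["kinase", "cell", "cycle"],
    "geneB": ["kinase", "ribosome"],
    "geneC": ["kinase", "cell"],
}
weights = tfidf_vectors(example)
```

A keyword shared by every gene, such as "kinase" here, receives weight zero, so only discriminating keywords influence the subsequent clustering.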
Clustering change patterns using Fourier transformation with time-course gene expression data.
Kim, Jaehee
2011-01-01
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
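The relation between the Fourier coefficients of an expression profile and those of its derivative can be written out explicitly. The notation below (period T, truncation order K, coefficients a_k and b_k) is assumed for illustration and is not taken from the paper; it only shows the standard term-by-term differentiation of a truncated Fourier series.

```latex
% Truncated Fourier series of an expression profile f(t) with period T
f(t) \approx \frac{a_0}{2} + \sum_{k=1}^{K} \left[ a_k \cos(\omega_k t) + b_k \sin(\omega_k t) \right],
\qquad \omega_k = \frac{2\pi k}{T}

% Term-by-term derivative
f'(t) \approx \sum_{k=1}^{K} \omega_k \left[ b_k \cos(\omega_k t) - a_k \sin(\omega_k t) \right]

% Hence the derivative Fourier coefficients are
a'_k = \omega_k \, b_k, \qquad b'_k = -\,\omega_k \, a_k, \qquad k = 1, \dots, K
```

Because the constant term drops out, two genes whose profiles differ only by a baseline shift have identical derivative coefficients, which is one reason clustering on these coefficients targets change patterns rather than expression levels.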
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong
2013-07-01
A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...
2015-04-09
Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.
2015-01-01
Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
Wan, B; Yarbrough, J W; Schultz, T W
2008-01-01
This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours, and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to those outside the cluster. 1-Methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes, such as cell cycle, metabolism, and protein binding, and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
NASA Astrophysics Data System (ADS)
Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.
2016-04-01
It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, is clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups obtained by the algorithm using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it does not find all possible subsets, we compared its results against a full search to determine the number of good co-regulated sets not detected.
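As a rough illustration of the quality measure named in this abstract, the sketch below computes the Silhouette index of a single gene group. It is not the authors' implementation: the function names, the use of Euclidean distance, and the convention that a singleton group scores 0 are assumptions made for this example, and at least two groups must be present.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_values(profiles, labels):
    """Return s(i) = (b - a) / max(a, b) for every expression profile.

    a = mean distance to the other members of the profile's own group,
    b = smallest mean distance to the members of any other group.
    Assumes at least two groups and Euclidean distance (illustrative choice).
    """
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:  # singleton group: silhouette taken as 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist(profiles[i], profiles[j]) for j in own) / len(own)
        b = min(
            sum(dist(profiles[i], profiles[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


def group_silhouette(profiles, labels, group):
    """Mean silhouette value over the members of one gene group."""
    values = [s for s, lab in zip(silhouette_values(profiles, labels), labels)
              if lab == group]
    return sum(values) / len(values)


# Hypothetical toy data: two tight, well-separated groups of two genes each.
toy_profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels = [0, 0, 1, 1]
toy_score = group_silhouette(toy_profiles, toy_labels, group=0)
```

Values close to 1 indicate a compact group that is well separated from the others, which is the sense in which candidate groups could be ranked.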
Clustering of change patterns using Fourier coefficients.
Kim, Jaehee; Kim, Haseong
2008-01-15
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. 
It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.
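To make the idea of clustering on change patterns concrete, here is a minimal sketch of how Fourier coefficients of a derivative can be obtained from a sampled profile. It is an assumption-laden simplification, not the authors' model: it treats each profile as one period sampled at equally spaced points on [0, 1), and the function names are hypothetical. For a series with cosine coefficients a_k and sine coefficients b_k, the derivative has cosine coefficients 2πk·b_k and sine coefficients −2πk·a_k, so a constant offset between two profiles disappears.

```python
import math


def fourier_coefficients(y, n_harmonics):
    """Sample Fourier coefficients (a_k, b_k) for k = 1..n_harmonics.

    Assumes y holds equally spaced samples of one period on [0, 1).
    """
    n = len(y)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(y))
        coeffs.append((a, b))
    return coeffs


def derivative_features(y, n_harmonics):
    """Fourier coefficients of the derivative, flattened into one feature vector.

    d/dt [a cos(2*pi*k*t) + b sin(2*pi*k*t)]
        = (2*pi*k*b) cos(2*pi*k*t) + (-2*pi*k*a) sin(2*pi*k*t)
    """
    features = []
    for k, (a, b) in enumerate(fourier_coefficients(y, n_harmonics), start=1):
        features.extend([2 * math.pi * k * b, -2 * math.pi * k * a])
    return features


# Hypothetical profiles: same change pattern, different baseline level.
base = [math.sin(2 * math.pi * i / 8) for i in range(8)]
shifted = [v + 5.0 for v in base]
f_base = derivative_features(base, n_harmonics=2)
f_shifted = derivative_features(shifted, n_harmonics=2)
```

The two feature vectors agree up to rounding, so any clustering applied to these features would group the profiles by how they change rather than by their absolute level.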
Diametrical clustering for identifying anti-correlated gene clusters.
Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman
2003-09-01
Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
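The two alternating steps described in this abstract can be sketched as follows. This is an illustrative reading rather than the published algorithm: the initialisation rule, the power-iteration routine used to approximate the dominant singular vector, and all function names are assumptions. Profiles are mean-centred and scaled to unit length, so a dot product equals a Pearson correlation and its square treats correlated and anti-correlated genes alike; constant profiles are not handled.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def standardize(profile):
    """Mean-centre and scale to unit length (dot product = Pearson correlation)."""
    m = sum(profile) / len(profile)
    return unit([x - m for x in profile])


def dominant_direction(rows, iters=100):
    """Approximate the dominant right singular vector of `rows` by power iteration."""
    width = len(rows[0])
    v = unit([1.0 + k for k in range(width)])  # deterministic start (assumption)
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]                      # X v
        w = [sum(p * r[k] for p, r in zip(proj, rows)) for k in range(width)]  # X^T X v
        if dot(w, w) == 0:
            break
        v = unit(w)
    return v


def diametrical_clustering(profiles, k, max_iter=50):
    genes = [standardize(p) for p in profiles]
    # Initial prototypes: first gene, then genes least aligned with those chosen so far.
    protos = [genes[0]]
    while len(protos) < k:
        protos.append(min(genes, key=lambda g: max(dot(g, p) ** 2 for p in protos)))
    labels = None
    for _ in range(max_iter):
        # (i) re-partition: assign each gene to the prototype with largest squared correlation
        new_labels = [max(range(k), key=lambda c: dot(g, protos[c]) ** 2) for g in genes]
        if new_labels == labels:
            break
        labels = new_labels
        # (ii) recompute each prototype as the dominant singular vector of its cluster
        for c in range(k):
            members = [g for g, lab in zip(genes, labels) if lab == c]
            if members:  # keep the old prototype if a cluster empties
                protos[c] = dominant_direction(members)
    return labels


# Hypothetical profiles: genes 0 and 1 are anti-correlated, as are genes 2 and 3.
toy = [[1, -1, 1, -1], [-2, 2, -2, 2], [1, 1, -1, -1], [-1, -1, 1, 1]]
toy_labels = diametrical_clustering(toy, k=2)
```

In this toy case each anti-correlated pair lands in the same cluster, which is the behaviour that distinguishes this scheme from clustering on positive correlation alone.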
Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru
2015-01-01
Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequences and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin, while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq, which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will help improve the productivity of nocardithiocin and the production of its derivatives. PMID:26588225
Fast gene ontology based clustering for microarray experiments.
Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa
2008-11-21
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
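The Chinese restaurant metaphor mentioned here can be illustrated with a small sampler. The sketch below draws a partition from a single-level Chinese restaurant process only; it is not the hierarchical model or the Gibbs sampler of the article, and the function name, the `alpha` concentration parameter, and the fixed seed are assumptions for illustration. Each new item joins an existing table with probability proportional to the table's size, or opens a new table with probability proportional to `alpha`.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw table assignments for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter: larger values favour more tables.
    Returns (labels, table_sizes).
    """
    rng = random.Random(seed)
    table_sizes = []
    labels = []
    for _ in range(n_items):
        # Existing tables are weighted by size, a new table by alpha.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)      # open a new table (new cluster)
        else:
            table_sizes[choice] += 1   # join an existing table
        labels.append(choice)
    return labels, table_sizes


crp_labels, crp_sizes = crp_assignments(n_items=20, alpha=1.0)
```

The number of clusters is not fixed in advance but grows with the data, which is the property that makes this family of priors attractive for expression clustering.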
Identification of lethal cluster of genes in the yeast transcription network
NASA Astrophysics Data System (ADS)
Rho, K.; Jeong, H.; Kahng, B.
2006-05-01
Identification of essential or lethal genes would be one of the ultimate goals in drug design. Here we introduce an in silico method to select the cluster with a high population of lethal genes, called the lethal cluster, through microarray assay. We construct a gene transcription network based on the microarray expression level. Links are added one by one in descending order of the Pearson correlation coefficients between two genes. As the link density p increases, two meaningful link densities pm and ps are observed. At pm, which is smaller than the percolation threshold, the number of disconnected clusters is maximum, and the lethal genes are highly concentrated in a certain cluster that needs to be identified. Thus the deletion of all genes in that cluster could efficiently lead to an inviable mutant. This lethal cluster can be identified by an in silico method. As p increases further beyond the percolation threshold, the power law behavior in the degree distribution of a giant cluster appears at ps. We measure the degree of each gene at ps. With the information pertaining to the degrees of each gene at ps, we return to the point pm and calculate the mean degree of genes of each cluster. We find that the lethal cluster has the largest mean degree.
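The link-addition procedure described above can be sketched with a union-find structure. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and a "cluster" is assumed here to mean a connected component with at least two genes. The sketch records the number of such clusters after each added link, so the link count at which that number peaks plays the role of pm.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_counts(expr):
    """Add links in descending order of correlation; return the cluster count after each link."""
    n = len(expr)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda ij: pearson(expr[ij[0]], expr[ij[1]]),
                   reverse=True)
    parent = list(range(n))
    size = [1] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    counts, clusters = [], 0
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            if size[ri] == 1 and size[rj] == 1:
                clusters += 1          # two isolated genes form a new cluster
            elif size[ri] > 1 and size[rj] > 1:
                clusters -= 1          # two clusters merge into one
            parent[ri] = rj
            size[rj] += size[ri]
        counts.append(clusters)
    return counts


# Hypothetical expression matrix: genes 0 and 1 co-vary, genes 2 and 3 co-vary.
toy_expr = [[1, 2, 3, 4], [1, 2, 3, 5], [1, -1, 1, -1], [2, -2, 2, -1]]
toy_counts = cluster_counts(toy_expr)
links_at_peak = toy_counts.index(max(toy_counts)) + 1
```

Dividing `links_at_peak` by the total number of gene pairs gives the corresponding link density for this toy network.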
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri
2007-01-01
Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
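The variance ratio at the heart of this scoring idea can be illustrated in one dimension. The sketch below does not reproduce the authors' method, which also searches for the best projection and applies a non-parametric test; it only computes the ratio of between-group to within-group variance estimators for values already projected onto the real line. The function name and the toy attribute values are assumptions.

```python
def variance_ratio(values, labels):
    """Ratio of between-group to within-group variance estimators for 1-D values.

    Assumes at least two groups and more values than groups.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)

    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}

    # Between-group estimator: spread of group means around the grand mean.
    between = sum(len(g) * (means[lab] - grand_mean) ** 2
                  for lab, g in groups.items()) / (k - 1)
    # Within-group estimator: spread of values around their own group mean.
    within = sum((v - means[lab]) ** 2
                 for lab, g in groups.items() for v in g) / (n - k)
    return between / within


# Hypothetical projected attribute values and two candidate clustering solutions.
attribute = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
coherent = [0, 0, 0, 1, 1, 1]   # groups agree with the attribute
scrambled = [0, 1, 0, 1, 0, 1]  # groups ignore the attribute
ratio_coherent = variance_ratio(attribute, coherent)
ratio_scrambled = variance_ratio(attribute, scrambled)
```

A clustering solution that agrees with prior biological attributes yields a much larger ratio than one that ignores them, which is the basis on which competing solutions could be compared.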
Finding approximate gene clusters with Gecko 3.
Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian
2016-11-16
Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A tripartite clustering analysis on microRNA, gene and disease model.
Shen, Chengcheng; Liu, Ying
2012-02-01
Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expressions of these genes, valuable knowledge about the pathogenicity of miRNAs, involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members and pass the hidden relational information along the relation chain by considering multi-type members. Here we report a spectral co-clustering algorithm for k-partite graph to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that genes and diseases co-clustered with the miRNAs are in accordance with current research findings.
GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.
Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee
2010-02-01
Concurrent with progress in biomedical sciences, an overwhelming amount of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we proposed an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and gene ontology (GO) to identify key gene-related concepts and their relationships as well as allocate PubMed abstracts based on these key gene-related concepts. Based on two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant informative conceptual structures.
A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus
Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong
2015-01-01
Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180
A mixture model-based approach to the clustering of microarray expression data.
McLachlan, G J; Bean, R W; Peel, D
2002-03-01
This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John
2015-01-01
Objective We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean
2006-01-01
Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810
Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.
Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C
2015-10-01
Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Mining subspace clusters from DNA microarray data using large itemset techniques.
Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi
2009-05-01
Mining subspace clusters from DNA microarrays could help researchers identify genes that commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since the number of genes in a DNA microarray is far larger than the number of conditions, previously proposed algorithms that compute the maximum dimension sets (MDSs) for every pair of genes take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for every pair of genes, we construct MDSs only for every pair of conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in subspace clusters with gene sets as large as possible, it is desirable to focus on gene sets that have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm needs less processing time than the previously proposed algorithms that construct gene-pair MDSs.
Guthke, Reinhard; Möller, Ulrich; Hoffmann, Martin; Thies, Frank; Töpfer, Susanne
2005-04-15
The immune response to bacterial infection represents a complex network of dynamic gene and protein interactions. We present an optimized reverse engineering strategy aimed at reconstructing this kind of interaction network. The proposed approach is based on both microarray data and available biological knowledge. The main kinetics of the immune response were identified by fuzzy clustering of gene expression profiles (time series). The number of clusters was optimized using various evaluation criteria. For each cluster, a representative gene with a high fuzzy membership was chosen in accordance with available physiological knowledge. Hypothetical network structures were then identified by seeking systems of ordinary differential equations whose simulated kinetics could fit the gene expression profiles of the cluster-representative genes. For the construction of hypothetical network structures, singular value decomposition (SVD) based methods and a newly introduced heuristic Network Generation Method were compared. The proposed novel method found sparser networks and gave better fits to the experimental data. Contact: Reinhard.Guthke@hki-jena.de.
Statistical indicators of collective behavior and functional clusters in gene networks of yeast
NASA Astrophysics Data System (ADS)
Živković, J.; Tadić, B.; Wick, N.; Thurner, S.
2006-03-01
We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
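The network construction described in this abstract (link genes whose expression profiles are strongly correlated, then look at the largest connected component) can be illustrated with a small sketch. This is not the authors' code: the function names, the toy profiles, and the use of a single fixed correlation threshold are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def giant_component_size(profiles, threshold):
    """Size of the largest connected component when two genes are linked
    whenever their correlation is at least `threshold`."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    sizes = {}
    for g in genes:
        root = find(g)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


# Hypothetical toy time series, not data from the study.
toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 9],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
```

Lowering the threshold and tracking how the largest component grows is one simple way to look for the percolation-like transition the abstract mentions.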
Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John
2015-09-01
We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
Butyrate production in phylogenetically diverse Firmicutes isolated from the chicken caecum
Eeckhaut, Venessa; Van Immerseel, Filip; Croubels, Siska; De Baere, Siegrid; Haesebrouck, Freddy; Ducatelle, Richard; Louis, Petra; Vandamme, Peter
2011-01-01
Summary Sixteen butyrate‐producing bacteria were isolated from the caecal content of chickens and analysed phylogenetically. They did not represent a coherent phylogenetic group, but were allied to four different lineages in the Firmicutes phylum. Fourteen strains appeared to represent novel species, based on a level of ≤ 98.5% 16S rRNA gene sequence similarity towards their nearest validly named neighbours. The highest butyrate concentrations were produced by the strains belonging to clostridial clusters IV and XIVa, clusters which are predominant in the chicken caecal microbiota. In only one of the 16 strains tested, the butyrate kinase operon could be amplified, while the butyryl‐CoA : acetate CoA‐transferase gene was detected in eight strains belonging to clostridial clusters IV, XIVa and XIVb. None of the clostridial cluster XVI isolates carried this gene based on degenerate PCR analyses. However, another CoA‐transferase gene more similar to propionate CoA‐transferase was detected in the majority of the clostridial cluster XVI isolates. Since this gene is located directly downstream of the remaining butyrate pathway genes in several human cluster XVI bacteria, it may be involved in butyrate formation in these bacteria. The present study indicates that butyrate producers related to cluster XVI may play a more important role in the chicken gut than in the human gut. PMID:21375722
A ground truth based comparative study on clustering of gene expression data.
Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue
2008-05-01
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
Form gene clustering method about pan-ethnic-group products based on emotional semantic
NASA Astrophysics Data System (ADS)
Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui
2016-09-01
The use of form knowledge about pan-ethnic-group products depends primarily on a designer's subjective experience, without user participation. Most studies focus on detecting the perceptual demands of consumers for the target product category. We construct a form gene clustering method for pan-ethnic-group products based on emotional semantics. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer-aided product form clustering technology. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, which improves the agility of the product design process in the era of Industry 4.0.
ERIC Educational Resources Information Center
Scharfenberg, Franz-Josef; Bogner, Franz X.
2013-01-01
This study classified students into different cognitive load (CL) groups by means of cluster analysis based on their experienced CL in a gene technology outreach lab which has instructionally been designed with regard to CL theory. The relationships of the identified student CL clusters to learner characteristics, laboratory variables, and…
Analyzing gene expression time-courses based on multi-resolution shape mixture model.
Li, Ying; He, Ye; Zhang, Yu
2016-11-01
Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. Multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of change over time in gene expression at different resolutions. The proposed algorithm is a probabilistic framework that offers a more natural and robust way of clustering time-course gene expression. We assessed its performance on yeast time-course gene expression profiles in comparison with several popular clustering methods for gene expression profiles. The gene groups identified by the different methods were evaluated by enrichment analysis of biological pathways and of known protein-protein interactions from experimental evidence. The gene groups identified by the proposed algorithm have stronger biological significance. The proposed model provides a novel perspective and an alternative tool for the visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
WordCluster: detecting clusters of DNA words and genomic elements
2011-01-01
Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method as a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
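The core idea of grouping copies by the distance between consecutive occurrences can be sketched in a few lines. This is a minimal illustration, not the WordCluster implementation: the function name, the fixed `max_gap` threshold, and the `min_copies` filter are assumptions, and the statistical significance that WordCluster assigns to each cluster is not reproduced here.

```python
def cluster_by_gap(positions, max_gap, min_copies=2):
    """Group genomic positions into clusters of consecutive copies.

    Two consecutive copies belong to the same cluster when the distance
    between them is at most `max_gap`. Returns (start, end, n_copies)
    tuples for clusters with at least `min_copies` members.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than the threshold closes the running cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_copies:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the last running cluster.
    if len(current) >= min_copies:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Hypothetical start coordinates of a DNA word on one chromosome.
word_positions = [100, 110, 125, 400, 900, 905, 912, 920]
word_clusters = cluster_by_gap(word_positions, max_gap=20, min_copies=3)
```

A fixed threshold is exactly the kind of arbitrary choice the abstract criticizes; in a fuller version the threshold would presumably be derived from the observed distance distribution and each cluster would carry a p-value.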
Yang, Yung-Hun; Kim, Ji-Nu; Song, Eunjung; Kim, Eunjung; Oh, Min-Kyu; Kim, Byung-Gee
2008-09-01
In order to identify the regulators involved in antibiotic production or time-specific cellular events, the messenger ribonucleic acid (mRNA) expression data of the two gene clusters, actinorhodin (ACT) and undecylprodigiosin (RED) biosynthetic genes, were clustered with known mRNA expression data of regulators from S. coelicolor using a filtering method based on standard deviation and clustering analysis. The result identified five regulators including two well-known regulators namely, SCO3579 (WlbA) and SCO6722 (SsgD). Using overexpression and deletion of the regulator genes, we were able to identify two regulators, i.e., SCO0608 and SCO6808, playing roles as repressors in antibiotics production and sporulation. This approach can be easily applied to mapping out new regulators related to any interesting target gene clusters showing characteristic expression patterns. The result can also be used to provide insightful information on the selection rules among a large number of regulators.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebhaber, S.A.; Weiss, I.; Cash, F.E.
Synthesis of normal human hemoglobin A, α₂β₂, is based upon balanced expression of genes in the α-globin gene cluster on chromosome 16 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5′ to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located approximately 30 kilobases 5′ from the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes.
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed the divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. However, this algorithm may also fail in certain cases. To overcome these situations, we propose a new clustering algorithm, the average correlation clustering algorithm (ACCA), which is able to produce better clustering solutions than some other methods. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to execution time. As in DCCA, we use the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We applied ACCA and some well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some other methods in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using the C and Visual Basic languages, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat and then needs to be installed. Two Word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
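The criterion stated in this abstract, that each gene should have its highest average correlation with the genes of its own cluster, can be illustrated with a single reassignment pass. This is only a sketch of that criterion under assumed names and toy data; it is not the published ACCA procedure, whose initialization, iteration, and stopping rules are not described in the abstract.

```python
def average_correlation(gene, members, corr):
    """Mean correlation between `gene` and the other genes in `members`."""
    others = [m for m in members if m != gene]
    if not others:
        return float("-inf")  # a gene alone in a cluster has no evidence
    return sum(corr[gene][m] for m in others) / len(others)


def reassign_once(clusters, corr):
    """Move every gene to the cluster with which it has the highest
    average correlation, judged against the current membership."""
    new_clusters = [[] for _ in clusters]
    for cluster in clusters:
        for gene in cluster:
            best = max(
                range(len(clusters)),
                key=lambda k: average_correlation(gene, clusters[k], corr),
            )
            new_clusters[best].append(gene)
    return new_clusters


def symmetric(pairs):
    """Build a symmetric correlation lookup from {(gene, gene): r}."""
    corr = {}
    for (a, b), r in pairs.items():
        corr.setdefault(a, {})[b] = r
        corr.setdefault(b, {})[a] = r
    return corr


# Hypothetical correlations: gene "e" starts in the wrong cluster.
toy_corr = symmetric({
    ("a", "b"): 0.9, ("c", "d"): 0.8, ("e", "c"): 0.7, ("e", "d"): 0.6,
    ("e", "a"): 0.1, ("e", "b"): -0.1, ("a", "c"): -0.2, ("a", "d"): -0.1,
    ("b", "c"): 0.0, ("b", "d"): -0.3,
})
toy_result = reassign_once([["a", "b", "e"], ["c", "d"]], toy_corr)
```

A single pass like this can oscillate on some inputs, so a practical version would repeat it until the assignment stops changing or a pass limit is reached.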
Clustering gene expression data based on predicted differential effects of GV interaction.
Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu
2005-02-01
Microarrays have become a popular biotechnology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may introduce biased interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrate the application of our method with a gene expression dataset and demonstrate the utility of our approach using an external validation.
Wu, Mengmeng; Huang, Haidong; Li, Guoqiang; Ren, Yi; Shi, Zhong; Li, Xiaoyan; Dai, Xiaohui; Gao, Ge; Ren, Mengnan; Ma, Ting
2017-04-21
Although clustering of genes from the same metabolic pathway is a widespread phenomenon, the evolution of the polysaccharide biosynthetic gene cluster remains poorly understood. To determine the evolution of this pathway, we identified a scattered production pathway of the polysaccharide sanxan by Sphingomonas sanxanigenens NX02, and compared the distribution of genes between sphingan-producing and other Sphingomonadaceae strains. This allowed us to determine how the scattered sanxan pathway developed, and how the polysaccharide gene cluster evolved. Our findings suggested that the evolution of microbial polysaccharide biosynthesis gene clusters is a lengthy cyclic process comprising cluster 1 → scatter → cluster 2. The sanxan biosynthetic pathway proved the existence of a dispersive process. We also report the complete genome sequence of NX02, in which we identified many unstable genetic elements and powerful secretion systems. Furthermore, nine enzymes for the formation of activated precursors, four glycosyltransferases, four acyltransferases, and four polymerization and export proteins were identified. These genes were scattered in the NX02 genome, and the positive regulator SpnA of sphingans synthesis could not regulate sanxan production. Finally, we concluded that the evolution of the sanxan pathway was independent. NX02 evolved naturally as a polysaccharide producing strain over a long-time evolution involving gene acquisitions and adaptive mutations.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks
Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin
2017-01-01
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A
2018-05-01
Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
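The adjusted Rand index used in this evaluation compares a clustering against known classes while correcting for chance agreement. A minimal sketch of the standard pair-counting formula follows; the function name and the toy labels are illustrative assumptions, and `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples.

    Assumes at least two samples. 1.0 means identical partitions and
    values near 0 mean agreement no better than chance.
    """
    n = len(labels_true)
    # Contingency counts: samples sharing a (true, predicted) label pair.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # e.g. both partitions put everything together
        return 1.0
    return (sum_joint - expected) / (maximum - expected)


# Hypothetical class labels (e.g. cancer types) and cluster assignments.
known_classes = ["A", "A", "A", "B", "B", "B"]
cluster_labels = [1, 1, 1, 2, 2, 2]
```

Because the index only looks at which samples are grouped together, the cluster label names themselves do not need to match the class names.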
Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben
2015-01-01
The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027
Hutton, John J; Jegga, Anil G; Kong, Sue; Gupta, Ashima; Ebert, Catherine; Williams, Sarah; Katz, Jonathan D; Aronow, Bruce J
2004-01-01
Background In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells. Results Gene clusters, formed based on similarity of expression-pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes. Conclusion This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions. PMID:15504237
Spectral gene set enrichment (SGSE).
Frost, H Robert; Li, Zhigang; Moore, Jason H
2015-03-03
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
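The combination step described above can be illustrated with a minimal sketch of the weighted Z-method (a weighted form of Stouffer's method) for one-sided p-values. The function name `weighted_z_combine` is hypothetical, and the weights are simply passed in by the caller: how SGSE scales PC variance by Tracy-Widom p-values is not spelled out in the abstract, so that part is deliberately left out.

```python
import math
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    Assumption: every p-value lies strictly between 0 and 1, and the
    weights are non-negative with at least one positive entry.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal in length")
    norm = NormalDist()
    # A small p-value maps to a large positive Z-score.
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    combined_z = numerator / denominator
    # Convert the combined Z-score back to a one-sided p-value.
    return 1.0 - norm.cdf(combined_z)


# Illustrative values only: two PC-level p-values, with the first PC
# given ten times the weight of the second.
combined = weighted_z_combine([0.001, 0.9], [10.0, 1.0])
```

Because the weights enter both the numerator and the normalizing denominator, a PC with a large weight dominates the combined statistic, which is the behavior the abstract relies on to down-weight PCs that likely represent noise.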
Srivastava, Mousami; Khurana, Pankaj; Sugadev, Ragumani
2012-11-02
The tissue-specific Unigene Sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we have used the 'Gene Ontology semantic similarity score' to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. Hierarchical clustering of this semantic similarity score matrix is represented in the form of a dendrogram. The dendrogram cluster stability was assessed by multiple bootstrapping. Multiple bootstrapping also computes a p-value for each cluster and corrects the bias of the bootstrap probability. Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panels 1-4). Further gene enrichment analysis of the four panels revealed that each panel represents a set of lung cancer linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia regulated biomarkers (panel 3) and lung extracellular matrix biomarkers (panel 4). Expression analysis reveals that hypoxia induced lung cancer related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1/SAG, AIB1 and AZIN1) are significantly down-regulated. All down-regulated genes in this panel were highly up-regulated in most other types of cancers. These panels of proteins may represent signature biomarkers for lung cancer and will aid in lung cancer diagnosis and disease monitoring as well as in the prediction of responses to therapeutics.
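As a rough illustration of clustering a semantic similarity matrix, the sketch below performs average-linkage agglomerative clustering directly on pairwise similarity scores. It is an assumption-laden toy: the function name, the stopping threshold, and the example scores are hypothetical, and the bootstrap-based assessment of cluster stability described in the abstract is not reproduced here.

```python
def average_linkage_clusters(similarity, threshold):
    """Merge the two most similar clusters until no pair reaches `threshold`.

    `similarity` is a symmetric matrix (list of lists) of pairwise scores.
    Returns clusters as sorted lists of item indices.
    """
    clusters = [[i] for i in range(len(similarity))]

    def average_similarity(a, b):
        # Mean of all pairwise scores between members of the two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = average_similarity(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        if best[0] < threshold:
            break  # No remaining pair is similar enough to merge.
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity scores for four gene products.
example_scores = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
groups = average_linkage_clusters(example_scores, threshold=0.5)
```

Recording the order of merges instead of stopping at a threshold would yield the dendrogram itself; the threshold is used here only to keep the example short.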
Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S
2016-12-01
Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.
PMID:27902408
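One simple way to picture the grouping of gene clusters into gene cluster families is as connected components of a similarity network. The sketch below assumes that pairwise similarity scores are already available and that two clusters are linked when their score meets a cutoff; the function name, the identifiers, the scores, and the cutoff are all hypothetical, and the abstract does not state which similarity measure or networking procedure the authors used.

```python
def gene_cluster_families(cluster_ids, similarities, cutoff):
    """Group gene clusters into families via connected components.

    `similarities` maps (id_a, id_b) pairs to a similarity score; pairs
    scoring at or above `cutoff` are treated as network edges.
    """
    parent = {c: c for c in cluster_ids}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), score in similarities.items():
        if score >= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for c in cluster_ids:
        families.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in families.values())


# Hypothetical identifiers and scores, for illustration only.
ids = ["bgc1", "bgc2", "bgc3", "bgc4"]
scores = {("bgc1", "bgc2"): 0.8, ("bgc2", "bgc3"): 0.7, ("bgc3", "bgc4"): 0.2}
families = gene_cluster_families(ids, scores, cutoff=0.5)
```

Under this reading, a family that contains no previously known gene cluster would count toward the novelty contributed by a genus.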
Xiao, Yinghua; van Hijum, Sacha A F T; Abee, Tjakko; Wells-Bennik, Marjon H J
2015-01-01
The formation of bacterial spores is a highly regulated process and the ultimate properties of the spores are determined during sporulation and subsequent maturation. A wide variety of genes that are expressed during sporulation determine spore properties such as resistance to heat and other adverse environmental conditions, dormancy and germination responses. In this study we characterized the sporulation phases of C. perfringens enterotoxic strain SM101 based on morphological characteristics, biomass accumulation (OD600), the total viable counts of cells plus spores, the viable count of heat resistant spores alone, the pH of the supernatant, enterotoxin production and dipicolinic acid accumulation. Subsequently, whole-genome expression profiling during key phases of the sporulation process was performed using DNA microarrays, and genes were clustered based on their time-course expression profiles during sporulation. The majority of previously characterized C. perfringens germination genes showed upregulated expression profiles in time during sporulation and belonged to two main clusters of genes. These clusters with up-regulated genes contained a large number of C. perfringens genes which are homologs of Bacillus genes with roles in sporulation and germination; this study therefore suggests that those homologs are functional in C. perfringens. A comprehensive homology search revealed that approximately half of the upregulated genes in the two clusters are conserved within a broad range of sporeforming Firmicutes. Another 30% of upregulated genes in the two clusters were found only in Clostridium species, while the remaining 20% appeared to be specific for C. perfringens. These newly identified genes may add to the repertoire of genes with roles in sporulation and determining spore properties including germination behavior. Their exact roles remain to be elucidated in future studies.
PMID:25978838
González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere
2014-12-17
Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1 Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher than average errors. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70% while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs in common transcription units. A phylogeny analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. A sequence refinement strategy allowed substantial improvement of a 1 Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using sequence previously unavailable, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.
Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu
2017-01-10
VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, and of the genetic contexts related to the transfer of these traits, in newly sequenced pathogenic bacterial genomes. The backend database, MobilomeDB, was first built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. By integrating the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than other widely used methods. In addition, VRprofile provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or with a varied set of bacterial genomes. VRprofile may help meet the increasing demand for re-annotation of bacterial variable regions, and aid in the real-time definition of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved.
A strategy of gene overexpression based on tandem repetitive promoters in Escherichia coli.
Li, Mingji; Wang, Junshu; Geng, Yanping; Li, Yikui; Wang, Qian; Liang, Quanfeng; Qi, Qingsheng
2012-02-06
In metabolic engineering, many rate-limiting steps may exist in the pathways that accumulate the target metabolites. Increasing the copy number of the desired genes in these pathways is a general way to solve the problem, for example by employing a multi-copy plasmid-based expression system. However, this method may bring genetic instability, structural instability and metabolic burden to the host, while integrating the desired gene into the chromosome may cause inadequate transcription or expression. In this study, we developed a strategy for obtaining gene overexpression by engineering promoter clusters consisting of multiple core-tac-promoters (MCPtacs) in tandem. Through a uniquely designed in vitro assembling process, a series of promoter clusters were constructed. The transcription strength of these promoter clusters showed a stepwise enhancement as the number of tandem repeats increased, until it reached the critical value of five. Application of the MCPtacs promoter clusters in polyhydroxybutyrate (PHB) production proved that the strategy was efficient. Integration of the phaCAB genes with the 5CPtacs promoter cluster resulted in an engineered E. coli that can accumulate PHB to 23.7% of the cell dry weight in batch cultivation. The transcription strength of the MCPtacs promoter cluster can be greatly improved by increasing the number of tandem repeats of the core-tac-promoter. By integrating the desired gene together with the MCPtacs promoter cluster into the chromosome of E. coli, we can achieve high and stable overexpression with only a small size. This strategy has application potential in many fields and can be extended to other bacteria.
Kettle, Andrew J; Carere, Jason; Batley, Jacqueline; Manners, John M; Kazan, Kemal; Gardiner, Donald M
2016-03-01
A number of cereals produce the benzoxazolinone class of phytoalexins. Fusarium species pathogenic towards these hosts can typically degrade these compounds via an aminophenol intermediate, and the ability to do so is encoded by a group of genes found in the Fusarium Detoxification of Benzoxazolinone (FDB) cluster. A zinc finger transcription factor encoded by one of the FDB cluster genes (FDB3) has been proposed to regulate the expression of other genes in the cluster and hence is potentially involved in benzoxazolinone degradation. Herein we show that Fdb3 is essential for the ability of Fusarium pseudograminearum to efficiently detoxify the predominant wheat benzoxazolinone, 6-methoxy-benzoxazolin-2-one (MBOA), but not benzoxazolin-2-one (BOA). Furthermore, additional genes thought to be part of the FDB gene cluster, based upon transcriptional response to benzoxazolinones, are regulated by Fdb3. However, deletion mutants for these latter genes remain capable of benzoxazolinone degradation, suggesting that they are not essential for this process. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
A transversal approach to predict gene product networks from ontology-based similarity
Chabalier, Julie; Mosser, Jean; Burgun, Anita
2007-01-01
Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists of clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to highlight functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and expression data. Results The transversal approach presented in this paper consists of computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks that proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity with expression data. PMID:17605807
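The vector-space comparison described above can be sketched as follows. Each gene product is represented by the set of terms that annotate it, each term receives a weight, and two products are compared by the cosine of their weighted vectors. The inverse-document-frequency style weighting used here is an assumption standing in for the paper's weighting scheme, which the abstract does not specify; the function names and the term labels are hypothetical placeholders.

```python
import math


def term_weights(annotations):
    """Weight each term so that rarely used terms count more.

    Assumption: an IDF-like weight, log(N / n_t) + 1, where N is the
    number of gene products and n_t the number annotated with term t.
    """
    total = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in set(terms):
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log(total / n) + 1.0 for term, n in counts.items()}


def cosine_similarity(terms_a, terms_b, weights):
    """Cosine of the angle between two weighted annotation vectors."""
    a, b = set(terms_a), set(terms_b)
    dot = sum(weights[t] ** 2 for t in a & b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A product with no annotations is similar to nothing.
    return dot / (norm_a * norm_b)


# Placeholder gene products and term labels, for illustration only.
annotations = {
    "g1": ["term_a", "term_b"],
    "g2": ["term_a", "term_b"],
    "g3": ["term_c"],
}
weights = term_weights(annotations)
sim_12 = cosine_similarity(annotations["g1"], annotations["g2"], weights)
sim_13 = cosine_similarity(annotations["g1"], annotations["g3"], weights)
```

Computing this score for every pair of gene products gives the similarity matrix mentioned in the abstract, which can then be thresholded or combined with expression data to draw networks.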
Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.
Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L
2015-05-15
The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10^-6) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.
PMID:25763605
A cluster of culture positive gonococcal infections but with false negative cppB gene based PCR.
Lum, G; Freeman, K; Nguyen, N L; Limnios, E A; Tabrizi, S N; Carter, I; Chambers, I W; Whiley, D M; Sloots, T P; Garland, S M; Tapsall, J W
2005-10-01
To describe the prevalence and characteristics of isolates of Neisseria gonorrhoeae grown from urine samples that produced negative results with nucleic acid amplification assays (NAA) targeting the cppB gene. An initial cluster of culture positive, but cppB gene based NAA negative, gonococcal infections was recognised. Urine samples and suspensions of gonococci isolated over 9 months in the Northern Territory of Australia were examined using cppB gene based and other non-cppB gene based NAA. The gonococcal isolates were phenotyped by determining the auxotype/serovar (A/S) class and genotyped by pulsed field gel electrophoresis (PFGE). 14 (9.8%) of 143 gonococci isolated were of A/S class Pro(-/)Brpyut, indistinguishable on PFGE and negative in cppB gene based, but not other, NAA. This cluster represents a temporal and geographic expansion of a gonococcal subtype lacking the cppB gene with consequent loss of sensitivity of NAA dependent on amplification of this target. Gonococci lacking the cppB gene have in the past been more commonly associated with the PAU-/PCU- auxotype, a gonococcal subtype hitherto infrequently encountered in Australia. NAA based on the cppB gene as a target may produce false positive as well as false negative NAA. This suggests that unless there is continuing comparison with culture to show their utility, cppB gene based NAA should be regarded as suboptimal for use either as a diagnostic or supplemental assay for diagnosis of gonorrhoea, and NAA with alternative amplification targets should be substituted.
Missing link in the evolution of Hox clusters.
Ogishima, Soichi; Tanaka, Hiroshi
2007-01-31
The Hox cluster has key roles in regulating the patterning of the antero-posterior axis in a metazoan embryo. It consists of the anterior, central and posterior genes; the central genes have been identified only in bilaterians, but not in cnidarians, and are responsible for achieving morphological complexity in bilaterian development. However, their evolutionary history has not been revealed, that is, there has been a "missing link". Here we show the evolutionary history of Hox clusters of 18 bilaterians and 2 cnidarians by using a new method, "motif-based reconstruction", examining the gain/loss processes of evolutionarily conserved sequences, "motifs", outside the homeodomain. We successfully identified the missing link in the evolution of Hox clusters between the cnidarian-bilaterian ancestor and the bilaterians as the ancestor of the central genes, which we call the proto-central gene. Searching for a gene corresponding to the proto-central gene, we found that one of the acoela Hox genes has the same motif repertory as that of the proto-central gene. This interesting finding suggests that the acoela Hox cluster corresponds with the missing link in the evolution of the Hox cluster between the cnidarian-bilaterian ancestor and the bilaterians. Our findings suggest that motif gains/diversifications led to the explosive diversity of the bilaterian body plan.
Haakensen, Vilde D; Lingjaerde, Ole Christian; Lüders, Torben; Riis, Margit; Prat, Aleix; Troester, Melissa A; Holmen, Marit M; Frantzen, Jan Ole; Romundstad, Linda; Navjord, Dina; Bukholm, Ida K; Johannesen, Tom B; Perou, Charles M; Ursin, Giske; Kristensen, Vessela N; Børresen-Dale, Anne-Lise; Helland, Aslaug
2011-11-01
Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis, gene ontology analysis, and comparison with previously published gene lists and independent datasets. Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts, and it identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer.
Identification of hub subnetwork based on topological features of genes in breast cancer
ZHUANG, DA-YONG; JIANG, LI; HE, QING-QING; ZHOU, PENG; YUE, TAO
2015-01-01
The aim of this study was to provide functional insight into the identification of hub subnetworks by aggregating the behavior of genes connected in a protein-protein interaction (PPI) network. We applied a protein network-based approach to identify subnetworks which may provide new insight into the functions of pathways involved in breast cancer rather than individual genes. Five groups of breast cancer data were downloaded and analyzed from the Gene Expression Omnibus (GEO) database of high-throughput gene expression data to identify gene signatures using the genome-wide global significance (GWGS) method. A PPI network was constructed using Cytoscape and clusters that focused on highly connected nodes were obtained using the molecular complex detection (MCODE) clustering algorithm. Pathway analysis was performed to assess the functional relevance of selected gene signatures based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Topological centrality was used to characterize the biological importance of gene signatures, pathways and clusters. The results revealed that cluster 1, as well as the cell cycle and oocyte meiosis pathways, were significant subnetworks in the analysis of degree and other centralities, and most hub nodes were distributed within them. The most important hub nodes, those with top-ranked centrality, were also similar to the genes common to the intersection of these three subnetworks, which was viewed as a hub subnetwork that is more reproducible than individual critical genes selected without network information. This hub subnetwork was attributed to a single biological process that is essential to cell growth and death. This increased the accuracy of identifying gene interactions that took place within the same functional process and was potentially useful for the development of biomarkers and networks for breast cancer. PMID:25573623
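Degree centrality, the simplest of the topological measures mentioned above, can be sketched in a few lines. The example treats the PPI network as an undirected edge list and normalizes each node's degree by the maximum possible degree; the function names and the node labels are hypothetical, and the sketch does not attempt to reproduce the GWGS, MCODE, or KEGG steps of the study.

```python
def degree_centrality(edges):
    """Return normalized degree centrality for an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # Ignore self-interactions.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by n - 1, the largest degree a node can have.
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


def top_hubs(edges, k):
    """Return the k nodes with highest degree centrality (ties broken by name)."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]


# Placeholder protein labels, for illustration only.
interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(interactions)
hubs = top_hubs(interactions, 1)
```

Other centralities referred to in the abstract, such as betweenness or closeness, require shortest-path computations and are usually taken from a graph library instead of being written by hand.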
Liu, Yanhong; Yan, Xianghe; DebRoy, Chitrita; Fratamico, Pina M.; Needleman, David S.; Li, Robert W.; Wang, Wei; Losada, Liliana; Brinkac, Lauren; Radune, Diana; Toro, Magaly; Hegde, Narasimha; Meng, Jianghong
2015-01-01
The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources. PMID:25664526
Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.
Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V
2016-01-01
Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.
PlantTribes: a gene and gene family resource for comparative genomics in plants
Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.
2008-01-01
The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194
Welcome to pandoraviruses at the ‘Fourth TRUC’ club
Sharma, Vikas; Colson, Philippe; Chabrol, Olivier; Scheid, Patrick; Pontarotti, Pierre; Raoult, Didier
2015-01-01
Nucleocytoplasmic large DNA viruses, or representatives of the proposed order Megavirales, belong to families of giant viruses that infect a broad range of eukaryotic hosts. Megaviruses have been previously described to comprise a fourth monophylogenetic TRUC (things resisting uncompleted classification) together with cellular domains in the universal tree of life. Recently described pandoraviruses have large (1.9–2.5 MB) and highly divergent genomes. In the present study, we updated the classification of pandoraviruses and other reported giant viruses. Phylogenetic trees were constructed based on six informational genes. Hierarchical clustering was performed based on a set of informational genes from Megavirales members and cellular organisms. Homologous sequences were selected from cellular organisms using TimeTree software, comprising comprehensive, and representative sets of members from Bacteria, Archaea, and Eukarya. Phylogenetic analyses based on three conserved core genes clustered pandoraviruses with phycodnaviruses, exhibiting their close relatedness. Additionally, hierarchical clustering analyses based on informational genes grouped pandoraviruses with Megavirales members as a super group distinct from cellular organisms. Thus, the analyses based on core conserved genes revealed that pandoraviruses are new genuine members of the ‘Fourth TRUC’ club, encompassing distinct life forms compared with cellular organisms. PMID:26042093
A roadmap for natural product discovery based on large-scale genomics and metabolomics
USDA-ARS's Scientific Manuscript database
Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic ca...
Outcome-Driven Cluster Analysis with Application to Microarray Data.
Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A
2015-01-01
One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
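The generative side of the random effects model described above can be sketched as a small simulation. This is only an illustration under stated assumptions: unit-variance Gaussian random effects, Gaussian errors, and an outcome with no noise term of its own are choices made here, not details given in the abstract, and the names `simulate` and `pearson` are hypothetical helpers. The sketch does not implement the Markov chain Monte Carlo fit.

```python
import random


def simulate(n_obs, cluster_of_gene, betas, noise_sd=0.5, seed=0):
    """Simulate gene values and outcomes from a cluster random-effects model.

    Each gene value is the random effect of its cluster for that observation
    plus an independent error; the outcome is a linear combination of the
    per-cluster random effects (assumed noise-free here).
    """
    rng = random.Random(seed)
    expression, outcome = [], []
    for _ in range(n_obs):
        # One random effect per cluster, specific to this observation.
        u = [rng.gauss(0.0, 1.0) for _ in betas]
        expression.append([u[c] + rng.gauss(0.0, noise_sd) for c in cluster_of_gene])
        outcome.append(sum(b * uk for b, uk in zip(betas, u)))
    return expression, outcome


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Six hypothetical genes: the first three in cluster 0, the last three in cluster 1.
cluster_of_gene = [0, 0, 0, 1, 1, 1]
expr, y = simulate(500, cluster_of_gene, betas=[1.0, -0.5])

column = lambda j: [row[j] for row in expr]
within = pearson(column(0), column(1))   # same cluster: shared random effect
across = pearson(column(0), column(3))   # different clusters: independent effects
```

Genes that share a cluster share a random effect and are therefore strongly correlated, while genes from different clusters are close to uncorrelated, which is the structure the clustering is meant to recover.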
Sekigami, Yuka; Kobayashi, Takuya; Omi, Ai; Nishitsuji, Koki; Ikuta, Tetsuro; Fujiyama, Asao; Satoh, Noriyuki; Saiga, Hidetoshi
2017-01-01
Hox gene clusters with at least 13 paralog group (PG) members are common in vertebrate genomes and in that of amphioxus. Ascidians, which belong to the subphylum Tunicata (Urochordata), are phylogenetically positioned between vertebrates and amphioxus, and traditionally divided into two groups: the Pleurogona and the Enterogona. An enterogonan ascidian, Ciona intestinalis (Ci), possesses nine Hox genes localized on two chromosomes; thus, the Hox gene cluster is disintegrated. We investigated the Hox gene cluster of a pleurogonan ascidian, Halocynthia roretzi (Hr), to determine whether Hox gene cluster disintegration is common among ascidians, and if so, how such disintegration occurred during ascidian or tunicate evolution. Our phylogenetic analysis reveals that the Hr Hox gene complement comprises nine members, including one with a relatively divergent Hox homeodomain sequence. Eight of nine Hr Hox genes were orthologous to Ci-Hox1, 2, 3, 4, 5, 10, 12 and 13. Following the phylogenetic classification into 13 PGs, we designated Hr Hox genes as Hox1, 2, 3, 4, 5, 10, 11/12/13.a, 11/12/13.b and HoxX. To address the chromosomal arrangement of the nine Hox genes, we performed two-color chromosomal fluorescent in situ hybridization, which revealed that the nine Hox genes are localized on a single chromosome in Hr, distinct from their arrangement in Ci. We further examined the order of the nine Hox genes on the chromosome by chromosome/scaffold walking. This analysis suggested a gene order of Hox1, 11/12/13.b, 11/12/13.a, 10, 5, X, followed by either Hox4, 3, 2 or Hox2, 3, 4 on the chromosome. Based on the present results and those previously reported in Ci, we discuss the establishment of the Hox gene complement and disintegration of Hox gene clusters during the course of ascidian or tunicate evolution. The Hox gene cluster and the genome must have experienced extensive reorganization during the course of evolution from the ancestral tunicate to Hr and Ci. Nevertheless, some features are shared in Hox gene components and gene arrangement on the chromosomes, suggesting that Hox gene cluster disintegration in ascidians involved early events common to tunicates as well as later ascidian lineage-specific events.
Fungal secondary metabolites - strategies to activate silent gene clusters.
Brakhage, Axel A; Schroeckh, Volker
2011-01-01
Filamentous fungi produce a multitude of low molecular weight bioactive compounds. The increasing number of fungal genome sequences impressively demonstrated that their biosynthetic potential is far from being exploited. In fungi, the genes required for the biosynthesis of a secondary metabolite are clustered. Many of these bioinformatically newly discovered secondary metabolism gene clusters are silent under standard laboratory conditions. Consequently, no product can be found. This review summarizes the current strategies that have been successfully applied during the last years to activate these silent gene clusters in filamentous fungi, especially in the genus Aspergillus. The techniques take advantage of genome mining, vary from the simple search for compounds with bioinformatically predicted physicochemical properties up to methods that exploit a probable interaction of microorganisms. Until now, the majority of successful approaches have been based on molecular biology like the generation of gene "knock outs", promoter exchange, overexpression of transcription factors or other pleiotropic regulators. Moreover, strategies based on epigenetics opened a new avenue for the elucidation of the regulation of secondary metabolite formation and will certainly continue to play a significant role for the elucidation of cryptic natural products. The conditions under which a given gene cluster is naturally expressed are largely unknown. One technique is to attempt to simulate the natural habitat by co-cultivation of microorganisms from the same ecosystem. This has already led to the activation of silent gene clusters and the identification of novel compounds in Aspergillus nidulans. These simulation strategies will help discover new natural products in the future, and may also provide fundamental new insights into microbial communication. Copyright © 2010 Elsevier Inc. All rights reserved.
Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo
2012-03-01
Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay described here is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.
Conley, P B; Lemaux, P G; Lomax, T L; Grossman, A R
1986-01-01
The polypeptide composition of the phycobilisome, the major light-harvesting complex of prokaryotic cyanobacteria and certain eukaryotic algae, can be modulated by different light qualities in cyanobacteria exhibiting chromatic adaptation. We have identified genomic fragments encoding a cluster of phycobilisome polypeptides (phycobiliproteins) from the chromatically adapting cyanobacterium Fremyella diplosiphon using previously characterized DNA fragments of phycobiliprotein genes from the eukaryotic alga Cyanophora paradoxa and from F. diplosiphon. Characterization of two lambda-EMBL3 clones containing overlapping genomic fragments indicates that three sets of phycobiliprotein genes--the alpha- and beta-allophycocyanin genes plus two sets of alpha- and beta-phycocyanin genes--are clustered within 13 kilobases on the cyanobacterial genome and transcribed off the same strand. The gene order (alpha-allophycocyanin followed by beta-allophycocyanin and beta-phycocyanin followed by alpha-phycocyanin) appears to be a conserved arrangement found previously in a eukaryotic alga and another cyanobacterium. We have reported that one set of phycocyanin genes is transcribed as two abundant red light-induced mRNAs (1600 and 3800 bases). We now present data showing that the allophycocyanin genes and a second set of phycocyanin genes are transcribed into major mRNAs of 1400 and 1600 bases, respectively. These transcripts are present in RNA isolated from cultures grown in red and green light, although lower levels of the 1600-base phycocyanin transcript are present in cells grown in green light. Furthermore, a larger transcript of 1750 bases hybridizes to the allophycocyanin genes and may be a precursor to the 1400-base species. PMID:3086870
A homeotic gene cluster patterns the anteroposterior body axis of C. elegans.
Wang, B B; Müller-Immergluck, M M; Austin, J; Robinson, N T; Chisholm, A; Kenyon, C
1993-07-16
In insects and vertebrates, clusters of Antennapedia class homeobox (HOM-C) genes specify anteroposterior body pattern. The nematode C. elegans also contains a small cluster of HOM-C genes, one of which has been shown to specify positional identity. Here we show that two additional C. elegans HOM-C genes also specify positional identity and that together these three HOM-C genes function along the anteroposterior axis in the same order as their homologs in other organisms. Thus, HOM-C-based pattern formation has been conserved in nematodes despite the many differences in morphology and embryology that distinguish them from other phyla. Each C. elegans HOM-C gene is responsible for a distinct body region; however, where their domains overlap, two HOM-C genes can act together to specify the fates of individual cells.
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P
2015-04-14
Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for choline fermentation (the cut gene cluster) have been recently identified, there has been no characterization of these genes in human gut isolates and microbial communities. In this work, we use multiple approaches to demonstrate that the pathway encoded by the cut genes is present and functional in a diverse range of human gut bacteria and is also widespread in stool metagenomes. We also developed a PCR-based strategy to detect a key functional gene (cutC) involved in this pathway and applied it to characterize newly isolated choline-utilizing strains. Both our analyses of the cut gene cluster and this molecular tool will aid efforts to further understand the role of choline metabolism in the human gut microbiota and its link to disease. Copyright © 2015 Martínez-del Campo et al.
Chou, A; Burke, J
1999-05-01
DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae
Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira
2011-01-01
Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716
2011-01-01
Background Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Methods Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published genelists and independent datasets. Results Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. Conclusion This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and identified distinct subtypes of normal breast tissue. 
Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer. PMID:22044755
Van den Eynden, Jimmy; Fierro, Ana Carolina; Verbeke, Lieven P C; Marchal, Kathleen
2015-04-23
With the advances in high throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively. SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore it was found to be complementary to existing similar-purpose methods with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and it accurately classifies candidate driver genes in putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant. SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes.
SeMPI: a genome-based secondary metabolite prediction and identification web server.
Zierep, Paul F; Padilla, Natàlia; Yonchev, Dimitar G; Telukunta, Kiran K; Klementz, Dennis; Günther, Stefan
2017-07-03
The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data provides the opportunity for an efficient identification of gene clusters by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not been identified yet. Even though genome mining tools have become significantly more efficient in the identification of biosynthetic gene clusters, structural elucidation of the actual secondary metabolite is still challenging, especially due to as yet unpredictable post-modifications. Here, we introduce SeMPI, a web server providing a prediction and identification pipeline for natural products synthesized by type I modular polyketide synthases (PKS). In order to limit the possible structures of PKS products and to include putative tailoring reactions, a structural comparison with annotated natural products was introduced. Furthermore, a benchmark was designed based on 40 gene clusters with annotated PKS products. The web server of the pipeline (SeMPI) is freely available at: http://www.pharmaceutical-bioinformatics.de/sempi. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Morata, Jordi; Puigdomènech, Pere
2017-02-08
Cucurbitaceae species contain a significantly lower number of genes coding for proteins with similarity to plant resistance genes belonging to the NBS-LRR family than other plant species of similar genome size. A large proportion of these genes are organized in clusters that appear to be hotspots of variability. The genomes of the Cucurbitaceae species measured until now are intermediate in size (between 350 and 450 Mb) and they apparently have not undergone any genome duplications besides those at the origin of eudicots. The cluster containing the largest number of NBS-LRR genes has previously been analyzed in melon and related species and showed a high degree of interspecific and intraspecific variability. It was of interest to study whether similar behavior occurred in other clusters of the same family of genes. The cluster of NBS-LRR genes located in melon chromosome 9 was analyzed and compared with the syntenic regions in other cucurbit genomes. This is the second-largest cluster in this species and it contains nine sequences with an NBS-LRR annotation including two genes, Fom1 and Prv, providing resistance against Fusarium and Papaya ring-spot virus (PRSV). The variability within the melon species appears to consist essentially of single nucleotide polymorphisms. Clusters of similar genes are present in the syntenic regions of the two species of Cucurbitaceae that were sequenced, cucumber and watermelon. Most of the genes in the syntenic clusters can be aligned between species and a hypothesis of generation of the cluster is proposed. The number of genes in the watermelon cluster is similar to that in melon while a higher number of genes (12) is present in cucumber, a species with a smaller genome than melon. After comparing genome resequencing data of 115 cucumber varieties, deletion of a group of genes is observed in a group of varieties of Indian origin. 
Clusters of genes coding for NBS-LRR proteins in cucurbits appear to have specific variability in different regions of the genome and between different species. This observation is in favour of considering that the adaptation of plant species to changing environments is based upon the variability that may occur at any location in the genome and that has been produced by specific mechanisms of sequence variation acting on plant genomes. This information could be useful both to understand the evolution of species and for plant breeding.
Disentangling the multigenic and pleiotropic nature of molecular function
2015-01-01
Background Biological processes at the molecular level are usually represented by molecular interaction networks. Function is organised and modularity identified based on network topology, however, this approach often fails to account for the dynamic and multifunctional nature of molecular components. For example, a molecule engaging in spatially or temporally independent functions may be inappropriately clustered into a single functional module. To capture biologically meaningful sets of interacting molecules, we use experimentally defined pathways as spatial/temporal units of molecular activity. Results We defined functional profiles of Saccharomyces cerevisiae based on a minimal set of Gene Ontology terms sufficient to represent each pathway's genes. The Gene Ontology terms were used to annotate 271 pathways, accounting for pathway multi-functionality and gene pleiotropy. Pathways were then arranged into a network, linked by shared functionality. Of the genes in our data set, 44% appeared in multiple pathways performing a diverse set of functions. Linking pathways by overlapping functionality revealed a modular network with energy metabolism forming a sparse centre, surrounded by several denser clusters comprised of regulatory and metabolic pathways. Signalling pathways formed a relatively discrete cluster connected to the centre of the network. Genetic interactions were enriched within the clusters of pathways by a factor of 5.5, confirming the organisation of our pathway network is biologically significant. Conclusions Our representation of molecular function according to pathway relationships enables analysis of gene/protein activity in the context of specific functional roles, as an alternative to typical molecule-centric graph-based methods. The pathway network demonstrates the cooperation of multiple pathways to perform biological processes and organises pathways into functionally related clusters with interdependent outcomes. PMID:26678917
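The step of arranging pathways into a network "linked by shared functionality" can be illustrated with a minimal sketch. It assumes each pathway has already been reduced to a set of functional terms; the pathway names, term labels and gene identifiers below are made-up placeholders, not data from the study, and the edge weight (number of shared terms) is an illustrative choice rather than the authors' stated measure.

```python
from itertools import combinations


def pathway_network(pathway_terms):
    """Link every pair of pathways that share at least one functional term.

    Returns {(pathway_a, pathway_b): number_of_shared_terms}.
    """
    edges = {}
    for a, b in combinations(sorted(pathway_terms), 2):
        shared = pathway_terms[a] & pathway_terms[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def multi_pathway_fraction(pathway_genes):
    """Fraction of genes that appear in more than one pathway."""
    counts = {}
    for genes in pathway_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return sum(1 for c in counts.values() if c > 1) / len(counts)


# Hypothetical toy input (placeholder labels, not real GO identifiers).
toy_terms = {
    "glycolysis": {"term_energy", "term_carbohydrate"},
    "tca_cycle": {"term_energy"},
    "mapk_signalling": {"term_signal"},
}
toy_genes = {
    "glycolysis": {"g1", "g2"},
    "tca_cycle": {"g2", "g3"},
    "mapk_signalling": {"g4"},
}

edges = pathway_network(toy_terms)
fraction = multi_pathway_fraction(toy_genes)
print(edges)
print(fraction)
```

In this toy input only the two energy-related pathways are connected, and one gene in four occurs in more than one pathway.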
Lu, Hong; Patil, Prabhu; Van Sluys, Marie-Anne; White, Frank F; Ryan, Robert P; Dow, J Maxwell; Rabinowicz, Pablo; Salzberg, Steven L; Leach, Jan E; Sonti, Ramesh; Brendel, Volker; Bogdanove, Adam J
2008-01-01
Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. 
Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale exchange of clusters, but subtle changes in a small number of genes or in non-coding sequences, and/or differences outside the clusters, potentially among regulatory targets or secretory substrates.
Hidalgo, Pedro I; Ullán, Ricardo V; Albillos, Silvia M; Montero, Olimpio; Fernández-Bodega, María Ángeles; García-Estrada, Carlos; Fernández-Aguado, Marta; Martín, Juan-Francisco
2014-01-01
The PR-toxin is a potent mycotoxin produced by Penicillium roqueforti in moulded grains and grass silages and may contaminate blue-veined cheese. The PR-toxin derives from the 15 carbon atoms sesquiterpene aristolochene formed by the aristolochene synthase (encoded by ari1). We have cloned and sequenced a four gene cluster that includes the ari1 gene from P. roqueforti. Gene silencing of each of the four genes (named prx1 to prx4) resulted in a reduction of 65-75% in the production of PR-toxin indicating that the four genes encode enzymes involved in PR-toxin biosynthesis. Interestingly the four silenced mutants overproduce large amounts of mycophenolic acid, an antitumor compound formed by an unrelated pathway suggesting a cross-talk of PR-toxin and mycophenolic acid production. An eleven gene cluster that includes the above mentioned four prx genes and a 14-TMS drug/H(+) antiporter was found in the genome of Penicillium chrysogenum. This eleven gene cluster has been reported to be very poorly expressed in a transcriptomic study of P. chrysogenum genes under conditions of penicillin production (strongly aerated cultures). We found that this apparently silent gene cluster is able to produce PR-toxin in P. chrysogenum under static culture conditions on hydrated rice medium. Noteworthily, the production of PR-toxin was 2.6-fold higher in P. chrysogenum npe10, a strain deleted in the 56.8kb amplifiable region containing the pen gene cluster, than in the parental strain Wisconsin 54-1255 providing another example of cross-talk between secondary metabolite pathways in this fungus. A detailed PR-toxin biosynthesis pathway is proposed based on all available evidence. Copyright © 2013 Elsevier Inc. All rights reserved.
A systematic analysis of genomic changes in Tg2576 mice.
Tan, Lu; Wang, Xiong; Ni, Zhong-Fei; Zhu, Xiuming; Wu, Wei; Zhu, Ling-Qiang; Liu, Dan
2013-06-01
Alzheimer's disease (AD) is an age-related neurodegenerative disorder characterized by intelligence decline, behavioral disorders and cognitive disability. The purpose of this study was to investigate gene expression in AD, based on published microarray data on Tg2576 mice. Hierarchical Cluster Analysis and Gene Ontology were employed to group genes together on the basis of their product characteristics and annotation data. Genes with prominent alterations were clustered into apoptosis and axon guidance pathways. Based on our findings and those of previous studies, we propose that the mitochondria-mediated apoptotic pathway plays a crucial role in the neuronal loss and synaptic dysfunction associated with AD. Furthermore, based on the findings of Positional Gene Enrichment analysis and Gene Set Enrichment analysis, we propose that the regulation of transcription of AD genes may be an important pathogenic factor in this neurodegenerative disease. Our results highlight the importance of genes that could subsequently be examined for their potential as prognostic markers for AD.
Studt, Lena; Niehaus, Eva-Maria; Espino, Jose J.; Huß, Kathleen; Michielse, Caroline B.; Albermann, Sabine; Wagner, Dominik; Bergner, Sonja V.; Connolly, Lanelle R.; Fischer, Andreas; Reuter, Gunter; Kleigrewe, Karin; Bald, Till; Wingfield, Brenda D.; Ophir, Ron; Freeman, Stanley; Hippler, Michael; Smith, Kristina M.; Brown, Daren W.; Proctor, Robert H.; Münsterkötter, Martin; Freitag, Michael; Humpf, Hans-Ulrich; Güldener, Ulrich; Tudzynski, Bettina
2013-01-01
The fungus Fusarium fujikuroi causes “bakanae” disease of rice due to its ability to produce gibberellins (GAs), but it is also known for producing harmful mycotoxins. However, the genetic capacity for the whole arsenal of natural compounds and their role in the fungus' interaction with rice remained unknown. Here, we present a high-quality genome sequence of F. fujikuroi that was assembled into 12 scaffolds corresponding to the 12 chromosomes described for the fungus. We used the genome sequence along with ChIP-seq, transcriptome, proteome, and HPLC-FTMS-based metabolome analyses to identify the potential secondary metabolite biosynthetic gene clusters and to examine their regulation in response to nitrogen availability and plant signals. The results indicate that expression of most but not all gene clusters correlate with proteome and ChIP-seq data. Comparison of the F. fujikuroi genome to those of six other fusaria revealed that only a small number of gene clusters are conserved among these species, thus providing new insights into the divergence of secondary metabolism in the genus Fusarium. Noteworthy, GA biosynthetic genes are present in some related species, but GA biosynthesis is limited to F. fujikuroi, suggesting that this provides a selective advantage during infection of the preferred host plant rice. Among the genome sequences analyzed, one cluster that includes a polyketide synthase gene (PKS19) and another that includes a non-ribosomal peptide synthetase gene (NRPS31) are unique to F. fujikuroi. The metabolites derived from these clusters were identified by HPLC-FTMS-based analyses of engineered F. fujikuroi strains overexpressing cluster genes. In planta expression studies suggest a specific role for the PKS19-derived product during rice infection. Thus, our results indicate that combined comparative genomics and genome-wide experimental analyses identified novel genes and secondary metabolites that contribute to the evolutionary success of F. 
fujikuroi as a rice pathogen. PMID:23825955
Gene duplications in prokaryotes can be associated with environmental adaptation
2010-01-01
Background Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Results Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Conclusions Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment. PMID:20961426
Gene duplications in prokaryotes can be associated with environmental adaptation.
Bratlie, Marit S; Johansen, Jostein; Sherman, Brad T; Huang, Da Wei; Lempicki, Richard A; Drabløs, Finn
2010-10-20
Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment.
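The abstract does not name the statistic behind its overrepresentation analysis. As a hedged illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for asking whether an annotation term occurs among a gene subset (here, paralogs) more often than expected by chance. The function name and the toy counts are assumptions made for this example.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the genome (background)
    K: background genes annotated with the term
    n: genes in the subset of interest (e.g. paralogs)
    k: subset genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 20 genes, 5 carry the term, 4 paralogs, 3 of them annotated.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(p_value)
```

A small value suggests the term is overrepresented in the subset. In a genome-wide analysis with many terms, the resulting p-values would normally also be corrected for multiple testing.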
Zuurbier, Linda; Gutierrez, Alejandro; Mullighan, Charles G.; Canté-Barrett, Kirsten; Gevaert, A. Olivier; de Rooi, Johan; Li, Yunlei; Smits, Willem K.; Buijs-Gladdines, Jessica G.C.A.M.; Sonneveld, Edwin; Look, A. Thomas; Horstmann, Martin; Pieters, Rob; Meijerink, Jules P.P.
2014-01-01
Three distinct immature T-cell acute lymphoblastic leukemia entities have been described including cases that express an early T-cell precursor immunophenotype or expression profile, immature MEF2C-dysregulated T-cell acute lymphoblastic leukemia cluster cases based on gene expression analysis (immature cluster) and cases that retain non-rearranged TRG@ loci. Early T-cell precursor acute lymphoblastic leukemia cases exclusively overlap with immature cluster samples based on the expression of early T-cell precursor acute lymphoblastic leukemia signature genes, indicating that both are featuring a single disease entity. Patients lacking TRG@ rearrangements represent only 40% of immature cluster cases, but no further evidence was found to suggest that cases with absence of bi-allelic TRG@ deletions reflect a distinct and even more immature disease entity. Immature cluster/early T-cell precursor acute lymphoblastic leukemia cases are strongly enriched for genes expressed in hematopoietic stem cells as well as genes expressed in normal early thymocyte progenitor or double negative-2A T-cell subsets. Identification of early T-cell precursor acute lymphoblastic leukemia cases solely by defined immunophenotypic criteria strongly underestimates the number of cases that have a corresponding gene signature. However, early T-cell precursor acute lymphoblastic leukemia samples correlate best with a CD1 negative, CD4 and CD8 double negative immunophenotype with expression of CD34 and/or myeloid markers CD13 or CD33. Unlike various other studies, immature cluster/early T-cell precursor acute lymphoblastic leukemia patients treated on the COALL-97 protocol did not have an overall inferior outcome, and demonstrated equal sensitivity levels to most conventional therapeutic drugs compared to other pediatric T-cell acute lymphoblastic leukemia patients. PMID:23975177
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. 
A large amount of generated output is available over the web.
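The evaluation loop described above (model each class as a mean plus independent noise, generate points, cluster them, count misclustered points) can be sketched as follows. This is not the toolbox itself: the k-means routine is a bare-bones stand-in, and matching cluster labels to generating classes by trying every label permutation is an assumption of this sketch, since the abstract does not say how clusters are matched to processes.

```python
import random
from itertools import permutations


def simulate(means, sigma, replicates, rng):
    """Draw `replicates` points per class as class mean + Gaussian noise."""
    points, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(replicates):
            points.append([m + rng.gauss(0.0, sigma) for m in mean])
            labels.append(label)
    return points, labels


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iterations=50):
    """Plain k-means with centres initialised from randomly chosen points."""
    centres = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iterations):
        assign = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def clustering_error(true_labels, assign, k):
    """Fewest misassigned points over all cluster-to-class label matchings."""
    best = len(true_labels)
    for perm in permutations(range(k)):
        errors = sum(1 for t, a in zip(true_labels, assign) if perm[a] != t)
        best = min(best, errors)
    return best


rng = random.Random(0)
points, labels = simulate([(0, 0, 0), (10, 10, 10)], sigma=0.5, replicates=20, rng=rng)
error = clustering_error(labels, kmeans(points, 2, rng), 2)
print(error)
```

Increasing `sigma` or reducing `replicates` in such a simulation is one way to explore how process variance and replication affect the error count.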
Yu, Tonghu; Zhang, Huaping; Qi, Hong
2018-01-01
The aim of the present study was to investigate more colon cancer-related genes in different stages. Gene expression profile E-GEOD-62932 was extracted for differentially expressed gene (DEG) screening. Series test of cluster analysis was used to obtain significant trending models. Based on the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, functional and pathway enrichment analysis were processed and a pathway relation network was constructed. Gene co-expression network and gene signal network were constructed for common DEGs. The DEGs with the same trend were clustered and in total, 16 clusters with statistical significance were obtained. The screened DEGs were enriched into small molecule metabolic process and metabolic pathways. The pathway relation network was constructed with 57 nodes. A total of 328 common DEGs were obtained. Gene signal network was constructed with 71 nodes. Gene co-expression network was constructed with 161 nodes and 211 edges. ABCD3, CPT2, AGL and JAM2 are potential biomarkers for the diagnosis of colon cancer. PMID:29928385
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.
Laetsch, Dominik R; Blaxter, Mark L
2017-10-05
The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.
Structural and functional annotation of the porcine immunome
2013-01-01
Background The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. Results The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. 
A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. Conclusions This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response. PMID:23676093
Ziemons, Sandra; Koutsantas, Katerina; Becker, Kordula; Dahlmann, Tim; Kück, Ulrich
2017-02-16
Multi-copy gene integration into microbial genomes is a conventional tool for obtaining improved gene expression. For Penicillium chrysogenum, the fungal producer of the beta-lactam antibiotic penicillin, many production strains carry multiple copies of the penicillin biosynthesis gene cluster. This discovery led to the generally accepted view that high penicillin titers are the result of multiple copies of penicillin genes. Here we investigated strain P2niaD18, a production line that carries only two copies of the penicillin gene cluster. We performed pulsed-field gel electrophoresis (PFGE), quantitative RT-PCR (qRT-PCR), and penicillin bioassays to investigate production, deletion and overexpression strains generated in the P. chrysogenum P2niaD18 background, in order to determine the copy number of the penicillin biosynthesis gene cluster, and study the expression of one penicillin biosynthesis gene, and the penicillin titer. Analysis of production and recombinant strains showed that the enhanced penicillin titer did not depend on the copy number of the penicillin gene cluster. Our assumption was strengthened by results with a penicillin null strain lacking pcbC encoding isopenicillin N synthase. Reintroduction of one or two copies of the cluster into the pcbC deletion strain restored high transcriptional expression of the pcbC gene, but recombinant strains showed no significantly different penicillin titer compared to parental strains. Here we present a molecular genetic analysis of production and recombinant strains in the P2niaD18 background carrying different copy numbers of the penicillin biosynthesis gene cluster. Our analysis shows that the enhanced penicillin titer does not strictly depend on the copy number of the cluster. Based on these overall findings, we hypothesize that instead, complex regulatory mechanisms are prominently implicated in increased penicillin biosynthesis in production strains.
Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K
2015-06-04
Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. 
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, M; Craft, D
Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods. 
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.
Huang, Chen; Leung, Ross Ka-Kit; Guo, Min; Tuo, Li; Guo, Lin; Yew, Wing Wai; Lou, Inchio; Lee, Simon Ming Yuen; Sun, Chenghang
2016-01-01
Microbial secondary metabolites are valuable resources for novel drug discovery. In particular, actinomycetes express a range of antibiotics against a spectrum of bacteria. At the genus level, strain Allosalinactinospora lopnorensis CA15-2T is the first new actinomycete isolated from the Lop Nor region, China. Antimicrobial assays revealed that the strain could inhibit the growth of certain types of bacteria, including Acinetobacter baumannii and Staphylococcus aureus, highlighting its clinical significance. Here we report the 5,894,259-base-pair genome of the strain, which contains 5,662 predicted genes; 832 of them cannot be detected by sequence similarity-based methods, suggesting that the new species may carry a novel gene pool. Furthermore, our genome-mining investigation reveals that A. lopnorensis CA15-2T contains 17 gene clusters coding for known or novel secondary metabolites. Meanwhile, at least six secondary metabolites were detected in the ethyl acetate (EA) extract of the fermentation broth of the strain by high-resolution UPLC-MS. Compared with reported clusters of other species, many new genes were found in the clusters, and the physical chromosomal location and order of genes in the clusters are distinct. This study presents evidence in support of A. lopnorensis CA15-2T as a potent natural products source for drug discovery. PMID:26864220
NASA Astrophysics Data System (ADS)
Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Microarray technology has become one of the essential tools in the life sciences for observing gene expression levels, including the expression of genes in people with carcinoma. Carcinoma is a cancer that forms in epithelial tissue. Such data can be analyzed, for example, to identify the expression of hereditary genes and to build classifications that can be used to improve the diagnosis of carcinoma. Microarray data usually have high dimension, so most methods require long computing times to perform the grouping. Therefore, this study uses the spectral clustering method, which can work with any kind of object and reduces the dimension of the data. Spectral clustering is based on the spectral decomposition of a matrix that represents the data in the form of a graph. After the data dimensions are reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes an objective function by iteratively exchanging non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering and the PAM partitioning algorithm to obtain groups of 7457 genes with carcinoma based on their similarity values. The result of this study is two groups of genes with carcinoma.
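The PAM step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the spectral dimension-reduction step is omitted, the medoids are initialized as the first k points instead of using PAM's BUILD phase, and the function names (`total_cost`, `pam`) and the toy 1-D data are assumptions made for illustration.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Swap phase of PAM on a square distance matrix.

    Simplification (assumption): the first k points are the initial
    medoids. A medoid is swapped with a non-medoid whenever the swap
    lowers the total cost, until no swap improves it.
    """
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign every point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy data: six points on a line forming two obvious groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, 2)
```

Because the sketch works on a precomputed distance matrix, it could be applied to distances between rows of a spectral embedding as well as to raw data.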
Molecular evidence of Burkholderia pseudomallei genotypes based on geographical distribution.
Zulkefli, Noorfatin Jihan; Mariappan, Vanitha; Vellasamy, Kumutha Malar; Chong, Chun Wie; Thong, Kwai Lin; Ponnampalavanar, Sasheela; Vadivelu, Jamuna; Teh, Cindy Shuan Ju
2016-01-01
Background. Central intermediary metabolism (CIM) in bacteria is defined as a set of metabolic biochemical reactions within a cell, which is essential for the cell to survive in response to environmental perturbations. The genes associated with CIM are commonly found in both pathogenic and non-pathogenic strains. As these genes are involved in vital metabolic processes of bacteria, we explored the efficiency of the genes in genotypic characterization of Burkholderia pseudomallei isolates, compared with the established pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) schemes. Methods. Nine previously sequenced B. pseudomallei isolates from Malaysia were characterized by PFGE, MLST and CIM genes. The isolates were later compared to the other 39 B. pseudomallei strains, retrieved from GenBank using both MLST and sequence analysis of CIM genes. UniFrac and hierarchical clustering analyses were performed using the results generated by both MLST and sequence analysis of CIM genes. Results. Genetic relatedness of nine Malaysian B. pseudomallei isolates and the other 39 strains was investigated. The nine Malaysian isolates were subtyped into six PFGE profiles, four MLST profiles and five sequence types based on CIM genes alignment. All methods demonstrated the clonality of OB and CB as well as CMS and THE. However, PFGE showed less than 70% similarity between a pair of morphology variants, OS and OB. In contrast, OS was identical to the soil isolate, MARAN. To have a better understanding of the genetic diversity of B. pseudomallei worldwide, we further aligned the sequences of genes used in MLST and genes associated with CIM for the nine Malaysian isolates and 39 B. pseudomallei strains from NCBI database. Overall, based on the CIM genes, the strains were subtyped into 33 profiles where the majority of the strains from Asian countries were clustered together. On the other hand, MLST resolved the isolates into 31 profiles which formed three clusters. 
Hierarchical clustering using UniFrac distance suggested that the isolates from Australia were genetically distinct from the Asian isolates. Nevertheless, statistically significant differences were detected between isolates from Malaysia, Thailand and Australia. Discussion. Overall, PFGE showed higher discriminative power in clustering the nine Malaysian B. pseudomallei isolates and indicated its suitability for localized epidemiological study. Compared to MLST, CIM genes showed higher resolution in distinguishing non-related strains and better clustering of strains from different geographical regions. A closer genetic relatedness of Malaysian isolates with all Asian strains in comparison to Australian strains was observed. This finding was supported by UniFrac analysis, which resulted in geographical segregation between Australia and the Asian countries.
Gemperlein, Katja; Zipf, Gregor; Bernauer, Hubert S; Müller, Rolf; Wenzel, Silke C
2016-01-01
Long-chain polyunsaturated fatty acids (LC-PUFAs) can be produced de novo via polyketide synthase-like enzymes known as PUFA synthases, which are encoded by pfa biosynthetic gene clusters originally discovered from marine microorganisms. Recently similar gene clusters were detected and characterized in terrestrial myxobacteria revealing several striking differences. As the identified myxobacterial producers are difficult to handle genetically and grow very slowly we aimed to establish heterologous expression platforms for myxobacterial PUFA synthases. Here we report the heterologous expression of the pfa gene cluster from Aetherobacter fasciculatus (SBSr002) in the phylogenetically distant model host bacteria Escherichia coli and Pseudomonas putida. The latter host turned out to be the more promising PUFA producer revealing higher production rates of n-6 docosapentaenoic acid (DPA) and docosahexaenoic acid (DHA). After several rounds of genetic engineering of expression plasmids combined with metabolic engineering of P. putida, DHA production yields were eventually increased more than threefold. Additionally, we applied synthetic biology approaches to redesign and construct artificial versions of the A. fasciculatus pfa gene cluster, which to the best of our knowledge represents the first example of a polyketide-like biosynthetic gene cluster modulated and synthesized for P. putida. Combination with the engineering efforts described above led to a further increase in LC-PUFA production yields. The established production platform based on synthetic DNA now sets the stage for flexible engineering of the complex PUFA synthase. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Genome-wide network of regulatory genes for construction of a chordate embryo.
Shoguchi, Eiichi; Hamaguchi, Makoto; Satoh, Nori
2008-04-15
Animal development is controlled by gene regulation networks that are composed of sequence-specific transcription factors (TF) and cell signaling molecules (ST). Although housekeeping genes have been reported to show clustering in the animal genomes, whether the genes comprising a given regulatory network are physically clustered on a chromosome is uncertain. We examined this question in the present study. Ascidians are the closest living relatives of vertebrates, and their tadpole-type larva represents the basic body plan of chordates. The Ciona intestinalis genome contains 390 core TF genes and 119 major ST genes. Previous gene disruption assays led to the formulation of a basic chordate embryonic blueprint, based on over 3000 genetic interactions among 79 zygotic regulatory genes. Here, we mapped the regulatory genes, including all 79 regulatory genes, on the 14 pairs of Ciona chromosomes by fluorescent in situ hybridization (FISH). Chromosomal localization of upstream and downstream regulatory genes demonstrates that the components of coherent developmental gene networks are evenly distributed over the 14 chromosomes. Thus, this study provides the first comprehensive evidence that the physical clustering of regulatory genes, or their target genes, is not relevant for the genome-wide control of gene expression during development.
Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
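Several of the gene-level statistics listed in this abstract (standardization, coefficient of variation, Shannon entropy) are simple to compute. The following is a minimal sketch, not CLUSFAVOR's own code: the function names and toy profiles are hypothetical, and the entropy definition (profile rescaled to sum to 1, logarithm base 2) is an assumption, since the abstract does not state which definition the program uses.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(profile):
    """Sample standard deviation divided by the mean of one gene's profile."""
    m = mean(profile)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(profile) / m


def shannon_entropy(profile):
    """Entropy in bits of a non-negative profile rescaled to sum to 1."""
    total = sum(profile)
    if total <= 0 or any(x < 0 for x in profile):
        raise ValueError("profile must be non-negative with a positive sum")
    probs = [x / total for x in profile if x > 0]
    return -sum(p * math.log2(p) for p in probs)


def standardize(profile):
    """Z-score a profile: mean 0, sample standard deviation 1."""
    m, s = mean(profile), stdev(profile)
    if s == 0:
        raise ValueError("cannot standardize a constant profile")
    return [(x - m) / s for x in profile]


# Hypothetical toy data: gene name -> expression across four arrays.
profiles = {
    "geneA": [10.0, 10.0, 10.0, 10.0],  # flat profile
    "geneB": [1.0, 1.0, 1.0, 37.0],     # one dominant array
    "geneC": [5.0, 10.0, 15.0, 10.0],
}

# Sort genes from most to least variable, as a CV-based sort would.
ranked_by_cv = sorted(
    profiles, key=lambda g: coefficient_of_variation(profiles[g]), reverse=True
)
```

A flat profile has zero coefficient of variation and maximal entropy, so the two statistics rank genes in roughly opposite directions.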
Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data
Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.
2003-01-01
Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
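Of the agreement measures named in this abstract, the Adjusted Rand Index has a standard closed form based on pair counts. The sketch below computes it for two flat label lists; the function name and the example labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are required")

    # Contingency counts: items in cluster i of A and cluster j of B.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both clusterings are one big cluster or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # relabeled, identical partition
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that disagree
```

The index is invariant to how the clusters are named, which is why it suits comparisons between a recovered clustering and a known simulated one.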
Izquierdo, Javier A; Sizova, Maria V; Lynd, Lee R
2010-06-01
The enrichment from nature of novel microbial communities with high cellulolytic activity is useful in the identification of novel organisms and novel functions that enhance the fundamental understanding of microbial cellulose degradation. In this work we identify predominant organisms in three cellulolytic enrichment cultures with thermophilic compost as an inoculum. Community structure based on 16S rRNA gene clone libraries featured extensive representation of clostridia from cluster III, with minor representation of clostridial clusters I and XIV and a novel Lutispora species cluster. Our studies reveal different levels of 16S rRNA gene diversity, ranging from 3 to 18 operational taxonomic units (OTUs), as well as variability in community membership across the three enrichment cultures. By comparison, glycosyl hydrolase family 48 (GHF48) diversity analyses revealed a narrower breadth of novel clostridial genes associated with cultured and uncultured cellulose degraders. The novel GHF48 genes identified in this study were related to the novel clostridia Clostridium straminisolvens and Clostridium clariflavum, with one cluster sharing as little as 73% sequence similarity with the closest known relative. In all, 14 new GHF48 gene sequences were added to the known diversity of 35 genes from cultured species.
Kong, Liangliang; Jing, Hongmei; Kataoka, Takafumi; Buchwald, Carolyn; Liu, Hongbin
2013-01-01
Anaerobic ammonia oxidation (anammox) as an important nitrogen loss pathway has been reported in marine oxygen minimum zones (OMZs), but the community composition and spatial distribution of anammox bacteria in the eastern tropical North Pacific (ETNP) OMZ are poorly determined. In this study, anammox bacterial communities in the OMZ off Costa Rica (CRD-OMZ) were analyzed based on both hydrazine oxidoreductase (hzo) genes and their transcripts assigned to cluster 1 and 2. The anammox communities revealed by hzo genes and proteins in CRD-OMZ showed a low diversity. Gene quantification results showed that hzo gene abundances peaked in the upper OMZs, associated with the peaks of nitrite concentration. Nitrite and oxygen concentrations may therefore colimit the distribution of anammox bacteria in this area. Furthermore, transcriptional activity of anammox bacteria was confirmed by obtaining abundant hzo mRNA transcripts through qRT-PCR. A novel hzo cluster 2x clade was identified by the phylogenetic analysis and these novel sequences were abundant and widely distributed in this environment. Our study demonstrated that both cluster 1 and 2 anammox bacteria play an active role in the CRD-OMZ, and the cluster 1 abundance and transcriptional activity were higher than cluster 2 in both free-living and particle-attached fractions at both gene and transcriptional levels.
Global Identification of Genes Affecting Iron-Sulfur Cluster Biogenesis and Iron Homeostasis
Hidese, Ryota; Kurihara, Tatsuo; Esaki, Nobuyoshi
2014-01-01
Iron-sulfur (Fe-S) clusters are ubiquitous cofactors that are crucial for many physiological processes in all organisms. In Escherichia coli, assembly of Fe-S clusters depends on the activity of the iron-sulfur cluster (ISC) assembly and sulfur mobilization (SUF) apparatus. However, the underlying molecular mechanisms and the mechanisms that control Fe-S cluster biogenesis and iron homeostasis are still poorly defined. In this study, we performed a global screen to identify the factors affecting Fe-S cluster biogenesis and iron homeostasis using the Keio collection, which is a library of 3,815 single-gene E. coli knockout mutants. The approach was based on radiolabeling of the cells with [2-14C]dihydrouracil, which entirely depends on the activity of an Fe-S enzyme, dihydropyrimidine dehydrogenase. We identified 49 genes affecting Fe-S cluster biogenesis and/or iron homeostasis, including 23 genes important only under microaerobic/anaerobic conditions. This study defines key proteins associated with Fe-S cluster biogenesis and iron homeostasis, which will aid further understanding of the cellular mechanisms that coordinate the processes. In addition, we applied the [2-14C]dihydrouracil-labeling method to analyze the role of amino acid residues of an Fe-S cluster assembly scaffold (IscU) as a model of the Fe-S cluster assembly apparatus. The analysis showed that Cys37, Cys63, His105, and Cys106 are essential for the function of IscU in vivo, demonstrating the potential of the method to investigate in vivo function of proteins involved in Fe-S cluster assembly. PMID:24415728
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; Taylor, Ronald C.; Weisenhorn, Pamela; Olson, Robert D.; Stevens, Rick L.; Rocha, Miguel; Rocha, Isabel; Best, Aaron A.; DeJongh, Matthew; Tintle, Nathan L.; Parrello, Bruce; Overbeek, Ross; Henry, Christopher S.
2016-01-01
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. 
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain. PMID:27933038
A knowledge-driven approach to cluster validity assessment.
Bolshakova, Nadia; Azuaje, Francisco; Cunningham, Pádraig
2005-05-15
This paper presents an approach to assessing cluster validity based on similarity knowledge extracted from the Gene Ontology. The program is freely available for non-profit use on request from the authors.
Development of New Candidate Gene and EST-Based Molecular Markers for Gossypium Species
Buyyarapu, Ramesh; Kantety, Ramesh V.; Yu, John Z.; Saha, Sukumar; Sharma, Govind C.
2011-01-01
New sources of molecular markers accelerate the efforts in improving cotton fiber traits and aid in developing high-density integrated genetic maps. We developed new markers based on candidate genes and G. arboreum EST sequences that were used for polymorphism detection followed by genetic and physical mapping. Nineteen gene-based markers were surveyed for polymorphism detection in 26 Gossypium species. Cluster analysis generated a phylogenetic tree with four major sub-clusters for 23 species while three species branched out individually. The CAP method enhanced the rate of polymorphism of candidate gene-based markers between G. hirsutum and G. barbadense. Two hundred A-genome based SSR markers were designed after data mining of G. arboreum EST sequences (Mississippi Gossypium arboreum EST-SSR: MGAES). Over 70% of MGAES markers successfully produced amplicons while 65 of them demonstrated polymorphism between the parents of the G. hirsutum and G. barbadense RIL population and formed 14 linkage groups. Chromosomal localization of both candidate gene-based and MGAES markers was assisted by euploid and hypoaneuploid CS-B analysis. Gene-based and MGAES markers were highly informative as they were designed from candidate genes and fiber transcriptome with a potential to be integrated into the existing cotton genetic and physical maps. PMID:22315588
Krzyżanowska, Dorota M.; Ossowicki, Adam; Rajewska, Magdalena; Maciąg, Tomasz; Jabłońska, Magdalena; Obuchowski, Michał; Heeb, Stephan; Jafra, Sylwia
2016-01-01
Dickeya solani and Pectobacterium carotovorum subsp. brasiliense are recently established species of bacterial plant pathogens causing black leg and soft rot of many vegetables and ornamental plants. Pseudomonas sp. strain P482 inhibits the growth of these pathogens, a desired trait considering the limited measures to combat these diseases. In this study, we determined the genetic background of the antibacterial activity of P482, and established the phylogenetic position of this strain. Pseudomonas sp. P482 was classified as Pseudomonas donghuensis. Genome mining revealed that the P482 genome does not contain genes determining the synthesis of known antimicrobials. However, the ClusterFinder algorithm, designed to detect atypical or novel classes of secondary metabolite gene clusters, predicted 18 such clusters in the genome. Screening of a Tn5 mutant library yielded an antimicrobial negative transposon mutant. The transposon insertion was located in a gene encoding an HpcH/HpaI aldolase/citrate lyase family protein. This gene is located in a hypothetical cluster predicted by the ClusterFinder, together with the downstream homologs of four nfs genes, that confer production of a non-fluorescent siderophore by P. donghuensis HYST. Site-directed inactivation of the HpcH/HpaI aldolase gene, the adjacent short chain dehydrogenase gene, as well as a homolog of an essential nfs cluster gene, all abolished the antimicrobial activity of the P482, suggesting their involvement in a common biosynthesis pathway. However, none of the mutants showed a decreased siderophore yield, neither was the antimicrobial activity of the wild type P482 compromised by high iron bioavailability. A genomic region comprising the nfs cluster and three upstream genes is involved in the antibacterial activity of P. donghuensis P482 against D. solani and P. carotovorum subsp. brasiliense. The genes studied are unique to the two known P. donghuensis strains. 
This study illustrates that mining of microbial genomes is a powerful approach for predicting the presence of novel secondary-metabolite encoding genes, especially when coupled with transposon mutagenesis. PMID:27303376
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Leone, Michele; Sumedha; Weigt, Martin
2007-10-15
Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster.
Salem, Saeed; Ozcaglar, Cagri
2014-01-01
Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression. We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.
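The edge weight described in this abstract, a linear combination of two similarities between coexpression links, can be sketched as follows. The abstract does not define the two similarity measures, so this sketch assumes Jaccard overlap for both: shared genes for the topological term and shared datasets for the co-appearance term. The mixing parameter `alpha`, the function names, and the toy links are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_weight(link1, link2, datasets_of, alpha=0.5):
    """Weight of the edge between two coexpression links.

    Each link is a pair of gene names. Both similarity definitions
    below are assumptions made for illustration.
    """
    topological = jaccard(set(link1), set(link2))                       # shared genes
    co_appearance = jaccard(datasets_of[link1], datasets_of[link2])     # shared datasets
    return alpha * topological + (1 - alpha) * co_appearance


# Hypothetical toy input: link -> set of dataset ids in which it is observed.
datasets_of = {
    ("g1", "g2"): {1, 2, 3},
    ("g2", "g3"): {2, 3, 4},
}
w = hybrid_weight(("g1", "g2"), ("g2", "g3"), datasets_of, alpha=0.5)
```

With `alpha = 1` only graph topology matters and with `alpha = 0` only recurrence across datasets matters, so the parameter trades off the two sources of evidence.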
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Derntl, Christian; Rassinger, Alice; Srebotnik, Ewald; Mach, Robert L.
2016-01-01
ABSTRACT The industrially used ascomycete Trichoderma reesei secretes a typical yellow pigment during cultivation, while other Trichoderma species do not. A comparative genomic analysis suggested that a putative secondary metabolism cluster, containing two polyketide-synthase encoding genes, is responsible for the yellow pigment synthesis. This cluster is conserved in a set of rather distantly related fungi, including Acremonium chrysogenum and Penicillium chrysogenum. In an attempt to silence the cluster in T. reesei, two genes of the cluster encoding transcription factors were individually deleted. For a complete genetic proof-of-function, the genes were reinserted into the genomes of the respective deletion strains. The deletion of the first transcription factor (termed yellow pigment regulator 1 [Ypr1]) resulted in the full abolishment of the yellow pigment formation and the expression of most genes of this cluster. A comparative high-pressure liquid chromatography (HPLC) analysis of supernatants of the ypr1 deletion and its parent strain suggested the presence of several yellow compounds in T. reesei that are all derived from the same cluster. A subsequent gas chromatography/mass spectrometry analysis strongly indicated the presence of sorbicillin in the major HPLC peak. The presence of the second transcription factor, termed yellow pigment regulator 2 (Ypr2), reduces the yellow pigment formation and the expression of most cluster genes, including the gene encoding the activator Ypr1. IMPORTANCE Trichoderma reesei is used for industry-scale production of carbohydrate-active enzymes. During growth, it secretes a typical yellow pigment. This is not favorable for industrial enzyme production because it makes the downstream process more complicated and thus increases operating costs. In this study, we demonstrate which regulators influence the synthesis of the yellow pigment. 
Based on these data, we also provide an indication as to which genes are under the control of these regulators and are ultimately responsible for the biosynthesis of the yellow pigment. These genes are organized in a cluster that is also found in other industrially relevant fungi, such as the two antibiotic producers Penicillium chrysogenum and Acremonium chrysogenum. The targeted manipulation of a secondary metabolism cluster is an important option for any biotechnologically applied microorganism. PMID:27520818
Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki
2014-08-01
Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi.
Slot, Jason C; Rokas, Antonis
2011-01-25
Genes involved in intermediary and secondary metabolism in fungi are frequently physically linked or clustered. For example, in Aspergillus nidulans the entire pathway for the production of sterigmatocystin (ST), a highly toxic secondary metabolite and a precursor to the aflatoxins (AF), is located in a ∼54 kb, 23 gene cluster. We discovered that a complete ST gene cluster in Podospora anserina was horizontally transferred from Aspergillus. Phylogenetic analysis shows that most Podospora cluster genes are adjacent to or nested within Aspergillus cluster genes, although the two genera belong to different taxonomic classes. Furthermore, the Podospora cluster is highly conserved in content, sequence, and microsynteny with the Aspergillus ST/AF clusters and its intergenic regions contain 14 putative binding sites for AflR, the transcription factor required for activation of the ST/AF biosynthetic genes. Examination of ∼52,000 Podospora expressed sequence tags identified transcripts for 14 genes in the cluster, with several expressed at multiple life cycle stages. The presence of putative AflR-binding sites and the expression evidence for several cluster genes, coupled with the recent independent discovery of ST production in Podospora [1], suggest that this HGT event probably resulted in a functional cluster. Given the abundance of metabolic gene clusters in fungi, our finding that one of the largest known metabolic gene clusters moved intact between species suggests that such transfers might have significantly contributed to fungal metabolic diversity. Copyright © 2011 Elsevier Ltd. All rights reserved.
Reimegård, Johan; Kundu, Snehangshu; Pendle, Ali; Irish, Vivian F.; Shaw, Peter
2017-01-01
Abstract Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation. PMID:28175342
Constrained clusters of gene expression profiles with pathological features.
Sese, Jun; Kurokawa, Yukinori; Monden, Morito; Kato, Kikuya; Morishita, Shinichi
2004-11-22
Gene expression profiles should be useful in distinguishing variations in disease, since they reflect accurately the status of cells. The primary clustering of gene expression reveals the genotypes that are responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, since the first clustering process and the second classification step, in which the features are associated with clusters, are performed independently, the initial set of clusters may omit genes that are associated with pathologically meaningful features. Therefore, it is important to devise a way of identifying gene expression clusters that are associated with pathological features. We present the novel technique of 'itemset constrained clustering' (IC-Clustering), which computes the optimal cluster that maximizes the interclass variance of gene expression between groups, which are divided according to the restriction that only divisions that can be expressed using common features are allowed. This constraint automatically labels each cluster with a set of pathological features which characterize that cluster. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters, which could be annotated with various pathological features, such as 'tumor' and 'man', or 'except tumor' and 'normal liver function'. In contrast, the k-means method overlooked these clusters.
Zhang, Xin; Wan, Jin-Xiang; Ke, Zun-Ping; Wang, Feng; Chai, Hai-Xia; Liu, Jia-Qiang
2017-07-01
Hepatocellular carcinoma is one of the most lethal and prevalent cancers, with increasing incidence worldwide. Elucidating driver genes for the prognosis and palindromia of hepatocellular carcinoma helps guide clinical decisions for patients. In this study, high-throughput RNA sequencing data for hepatocellular carcinoma, generated on the Illumina HiSeq platform, were downloaded from The Cancer Genome Atlas and comprised 330 primary hepatocellular carcinoma patient samples. Stable key genes with differential expression were identified, and Kaplan-Meier survival analysis was performed on them using the Cox proportional hazards test in the R language. Driver genes influencing the prognosis of this disease were determined using clustering analysis. Functional analysis of the driver genes was performed by literature search and Gene Set Enrichment Analysis. Finally, the selected driver genes were verified using the external dataset GSE40873. A total of 5781 stable key genes were identified, including 156 genes definitely related to the prognosis of hepatocellular carcinoma. Based on the significant key genes, samples were grouped into five clusters, which were further integrated into high- and low-risk classes based on clinical features. TMEM88, CCL14, and CLEC3B were selected as driver genes that clustered high-/low-risk patients successfully (generally, p = 0.0005124445). Finally, survival analysis of the high-/low-risk samples from the external dataset showed a significant difference, with a p value of 0.0198. In conclusion, TMEM88, CCL14, and CLEC3B were stable and usable in predicting the survival and palindromia time of hepatocellular carcinoma. These genes could function as potential prognostic genes, contributing to improved patient outcomes and survival.
Fernández-Bodega, Ángeles; Álvarez-Álvarez, Rubén; Liras, Paloma; Martín, Juan F
2017-08-01
Penicillium roqueforti produces several prenylated indole alkaloids, including roquefortine C and clavine alkaloids. The first step in the biosynthesis of roquefortine C is the prenylation of tryptophan-derived dipeptides by a dimethylallyltryptophan synthase specific for roquefortine biosynthesis (roquefortine prenyltransferase). A second dimethylallyltryptophan synthase, DmaW2, different from the roquefortine prenyltransferase, has been studied in this article. Silencing the gene encoding this second dimethylallyltryptophan synthase, dmaW2, proved that inactivation of this gene does not prevent the production of roquefortine C, but suppresses the formation of other indole alkaloids. Mass spectrometry studies have identified these compounds as isofumigaclavine A, the final product of the pathway, and prenylated intermediates. The silencing does not affect the production of mycophenolic acid and andrastin A. A bioinformatic study of the genome of P. roqueforti revealed that DmaW2 (renamed IfgA) is a prenyltransferase involved in isofumigaclavine A biosynthesis, encoded by a gene located in a six-gene cluster (cluster A). A second, three-gene cluster (cluster B) encodes the so-called yellow enzyme and enzymes for the late steps of the conversion of festuclavine to isofumigaclavine A. The yellow enzyme contains a tyrosine-181 at its active center, as occurs in Neosartorya fumigata, but in contrast to the Clavicipitaceae fungi. A complete isofumigaclavines A and B biosynthetic pathway is proposed based on the findings of these studies on the biosynthesis of clavine alkaloids.
Kong, Liangliang; Jing, Hongmei; Kataoka, Takafumi; Buchwald, Carolyn; Liu, Hongbin
2013-01-01
Anaerobic ammonia oxidation (anammox) as an important nitrogen loss pathway has been reported in marine oxygen minimum zones (OMZs), but the community composition and spatial distribution of anammox bacteria in the eastern tropical North Pacific (ETNP) OMZ are poorly determined. In this study, anammox bacterial communities in the OMZ off Costa Rica (CRD-OMZ) were analyzed based on both hydrazine oxidoreductase (hzo) genes and their transcripts assigned to cluster 1 and 2. The anammox communities revealed by hzo genes and proteins in CRD-OMZ showed a low diversity. Gene quantification results showed that hzo gene abundances peaked in the upper OMZs, associated with the peaks of nitrite concentration. Nitrite and oxygen concentrations may therefore colimit the distribution of anammox bacteria in this area. Furthermore, transcriptional activity of anammox bacteria was confirmed by obtaining abundant hzo mRNA transcripts through qRT-PCR. A novel hzo cluster 2x clade was identified by the phylogenetic analysis and these novel sequences were abundant and widely distributed in this environment. Our study demonstrated that both cluster 1 and 2 anammox bacteria play an active role in the CRD-OMZ, and the cluster 1 abundance and transcriptional activity were higher than cluster 2 in both free-living and particle-attached fractions at both gene and transcriptional levels. PMID:24205176
Improving cluster-based missing value estimation of DNA microarray data.
Brás, Lígia P; Menezes, José C
2007-06-01
We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
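The iterative reuse of estimates described above can be illustrated with a minimal sketch. This is not the authors' IKNNimpute code: the function name `iterative_knn_impute`, the row-mean starting guess, the Euclidean distance, and the inverse-distance weighting are all assumptions chosen for illustration, and every row is assumed to have at least one observed value.

```python
import math

def iterative_knn_impute(rows, k=2, n_iter=3):
    """Fill None entries in a gene-by-sample matrix (hypothetical sketch).

    Each pass re-estimates every missing value from the k nearest rows,
    where distances are computed on the matrix filled in by the previous
    pass, so earlier estimates are reused.
    """
    n_cols = len(rows[0])
    missing = [(i, j) for i, r in enumerate(rows)
               for j, v in enumerate(r) if v is None]
    # Starting guess: replace each missing value by the mean of its row.
    filled = []
    for r in rows:
        observed = [v for v in r if v is not None]
        mean = sum(observed) / len(observed)
        filled.append([mean if v is None else v for v in r])
    for _ in range(n_iter):
        updated = [r[:] for r in filled]
        for i, j in missing:
            candidates = []
            for g, other in enumerate(filled):
                if g == i:
                    continue
                # Distance over all columns except the one being estimated.
                d = math.sqrt(sum((filled[i][c] - other[c]) ** 2
                                  for c in range(n_cols) if c != j))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            updated[i][j] = (sum(w * v for w, (_, v) in zip(weights, nearest))
                             / sum(weights))
        filled = updated
    return filled

# Toy matrix: the second gene matches the first on its observed samples.
toy = [[1.0, 2.0, 3.0],
       [1.0, 2.0, None],
       [10.0, 11.0, 12.0]]
imputed = iterative_knn_impute(toy, k=1)
```

With `k=1` the missing entry is copied from the single closest gene; larger `k` blends several neighbours by inverse distance.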
Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver
2013-11-07
Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.
Chapter 7. Cloning and analysis of natural product pathways.
Gust, Bertolt
2009-01-01
The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.
CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence
Nepal, Madhav P; Benson, Benjamin V
2015-01-01
Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. The recent availability of soybean genome sequences makes it possible to examine the diversity of gene families, including disease-resistance genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification, model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence was assessed. We identified 188 soybean CNL genes nested into four clades consistent with their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implications for future soybean crop improvement. PMID:25922568
Breakup of a homeobox cluster after genome duplication in teleosts
Mulley, John F.; Chiu, Chi-hua; Holland, Peter W. H.
2006-01-01
Several families of homeobox genes are arranged in genomic clusters in metazoan genomes, including the Hox, ParaHox, NK, Rhox, and Iroquois gene clusters. The selective pressures responsible for maintenance of these gene clusters are poorly understood. The ParaHox gene cluster is evolutionarily conserved between amphioxus and human but is fragmented in teleost fishes. We show that two basal ray-finned fish, Polypterus and Amia, each possess an intact ParaHox cluster; this implies that the selective pressure maintaining clustering was lost after whole-genome duplication in teleosts. Cluster breakup is because of gene loss, not transposition or inversion, and the total number of ParaHox genes is the same in teleosts, human, mouse, and frog. We propose that this homeobox gene cluster is held together in chordates by the existence of interdigitated control regions that could be separated after locus duplication in the teleost fish. PMID:16801555
Accurate prediction of secondary metabolite gene clusters in filamentous fungi.
Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H
2013-01-02
Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.
Zhang, Han; Rokas, Antonis; Slot, Jason C
2012-01-01
Dermatophyte fungi of the family Arthrodermataceae (Eurotiomycetes) colonize keratinized tissue, such as skin, frequently causing superficial mycoses in humans and other mammals, reptiles, and birds. Competition with native microflora likely underlies the propensity of these dermatophytes to produce a diversity of antibiotics and compounds for scavenging iron, which is extremely scarce, as well as the presence of an unusually large number of putative secondary metabolism gene clusters, most of which contain non-ribosomal peptide synthetases (NRPS), in their genomes. To better understand the historical origins and diversification of NRPS-containing gene clusters we examined the evolution of a variable locus (VL) that exists in one of three alternative conformations among the genomes of seven dermatophyte species. The first conformation of the VL (termed VLA) contains only 539 base pairs of sequence and lacks protein-coding genes, whereas the other two conformations (termed VLB and VLC) span 36 Kb and 27 Kb and contain 12 and 10 genes, respectively. Interestingly, both VLB and VLC appear to contain distinct secondary metabolism gene clusters; VLB contains a NRPS gene as well as four porphyrin metabolism genes never found to be physically linked in the genomes of 128 other fungal species, whereas VLC also contains a NRPS gene as well as several others typically found associated with secondary metabolism gene clusters. Phylogenetic evidence suggests that the VL locus was present in the ancestor of all seven species achieving its present distribution through subsequent differential losses or retentions of specific conformations. We propose that the existence of variable loci, similar to the one we studied, in fungal genomes could potentially explain the dramatic differences in secondary metabolic diversity between closely related species of filamentous fungi, and contribute to host adaptation and the generation of metabolic diversity.
A novel harmony search-K means hybrid algorithm for clustering gene expression data
Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
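The dependence of k-means on its initial centroids, which motivates the hybrid above, can be seen in a small sketch. The code below is plain Lloyd-style k-means only, not the HSKH algorithm and not a harmony search; the function names `kmeans` and `sse` and the four-point toy data set are assumptions made for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, n_iter=20):
    """Plain k-means from a fixed set of starting centroids (sketch)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(x) / len(members)
                                     for x in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Four corners of a wide rectangle: the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = kmeans(points, [(0, 0), (4, 0)])  # converges to left/right halves
bad = kmeans(points, [(2, 0), (2, 1)])   # stays at a top/bottom split
sse_good = sse(points, good)
sse_bad = sse(points, bad)
```

Both runs converge, but the second stops at a much worse local optimum, which is the kind of outcome a global search over starting centroids is meant to avoid.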
An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper is a feasibility study of an original model enabling novel "in-silico" clinical-genomic discovery science. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
Charles, J. P.; Chihara, C.; Nejad, S.; Riddiford, L. M.
1997-01-01
A 36-kb genomic DNA segment of the Drosophila melanogaster genome containing 12 clustered cuticle genes has been mapped and partially sequenced. The cluster maps at 65A 5-6 on the left arm of the third chromosome, in agreement with the previously determined location of a putative cluster encompassing the genes for the third instar larval cuticle proteins LCP5, LCP6 and LCP8. This cluster is the largest cuticle gene cluster discovered to date and shows a number of surprising features that explain in part the genetic complexity of the LCP5, LCP6 and LCP8 loci. The genes encoding LCP5 and LCP8 are multiple copy genes and the presence of extensive similarity in their coding regions gives the first evidence for gene conversion in cuticle genes. In addition, five genes in the cluster are intronless. Four of these five have arisen by retroposition. The other genes in the cluster have a single intron located at an unusual location for insect cuticle genes. PMID:9383064
Exploring multicollinearity using a random matrix theory approach.
Feher, Kristen; Whelan, James; Müller, Samuel
2012-01-01
Clustering of gene expression data is often done with the latent aim of dimension reduction, by finding groups of genes that have a common response to potentially unknown stimuli. However, what is poorly understood to date is the behaviour of a low dimensional signal embedded in high dimensions. This paper introduces a multicollinear model which is based on random matrix theory results, and shows potential for the characterisation of a gene cluster's correlation matrix. This model projects a one dimensional signal into many dimensions and is based on the spiked covariance model, but rather characterises the behaviour of the corresponding correlation matrix. The eigenspectrum of the correlation matrix is empirically examined by simulation, under the addition of noise to the original signal. The simulation results are then used to propose a dimension estimation procedure of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby a pair of genes with 'low' correlation may simply be due to the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
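The contrast between weak pairwise correlations and a strong collective signal can be reproduced with a small simulation. This is a rough sketch in the spirit of the abstract, not the paper's model or its dimension estimation procedure: the sample size, gene count, noise level, seed, and the helper names `simulate_cluster`, `correlation_matrix`, and `top_eigenvalue` are all assumptions.

```python
import math
import random

def simulate_cluster(n_samples=200, n_genes=30, noise_sd=2.0, seed=0):
    """Every gene is the same one-dimensional signal plus independent noise."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return [[s + rng.gauss(0.0, noise_sd) for s in signal]
            for _ in range(n_genes)]

def correlation_matrix(data):
    """Pearson correlation between all pairs of rows."""
    def standardize(x):
        mean = sum(x) / len(x)
        norm = math.sqrt(sum((v - mean) ** 2 for v in x))
        return [(v - mean) / norm for v in x]
    z = [standardize(row) for row in data]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]

def top_eigenvalue(matrix, n_iter=200):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(matrix)
    value = 0.0
    for _ in range(n_iter):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        value = math.sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value

corr = correlation_matrix(simulate_cluster())
n = len(corr)
mean_pairwise = sum(corr[i][j] for i in range(n)
                    for j in range(n) if i != j) / (n * (n - 1))
leading = top_eigenvalue(corr)
```

With these settings the expected pairwise correlation is about 1 / (1 + noise_sd**2) = 0.2, which would look weak gene by gene, while the leading eigenvalue sits well above 1, the value each eigenvalue would average for uncorrelated genes.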
Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing
2018-04-23
Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapy cannot treat ACP effectively. In this paper, we aimed to identify key genes for early diagnosis and treatment of ACP. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built with the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from the enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. The PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracies of the SVM model for training, testing, and validation data were 94, 85, and 74%, respectively. CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 were significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP. The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and benefit the improvement of therapy.
Enkh-Amgalan, Jigjiddorj; Kawasaki, Hiroko; Seki, Tatsuji
2006-01-01
A major nif cluster was detected in the strictly anaerobic, Gram-positive phototrophic bacterium Heliobacterium chlorum. The cluster consisted of 11 genes arranged within a 10 kb region in the order nifI1, nifI2, nifH, nifD, nifK, nifE, nifN, nifX, fdx, nifB and nifV. The phylogenetic position of Hbt. chlorum was the same in the NifH, NifD, NifK, NifE and NifN trees; Hbt. chlorum formed a cluster with Desulfitobacterium hafniense, the closest neighbour of heliobacteria based on the 16S rRNA phylogeny, and two species of the genus Geobacter belonging to the Deltaproteobacteria. Two nifI genes, known to occur in the nif clusters of methanogenic archaea between nifH and nifD, were found upstream of the nifH gene of Hbt. chlorum. The organization of the nif operon and the phylogeny of individual and concatenated gene products showed that the Hbt. chlorum nif operon carrying nifI genes upstream of the nifH gene was an intermediate between the nif operon with nifI downstream of nifH (group II and III of the nitrogenase classification) and the nif operon lacking nifI (group I). Thus, the phylogenetic position of Hbt. chlorum nitrogenase may reflect an evolutionary stage of a divergence of the two nitrogenase groups, with group I consisting of the aerobic diazotrophs and group II consisting of strictly anaerobic prokaryotes.
Nidheesh, N; Abdul Nazeer, K A; Ameer, P M
2017-12-01
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
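The key idea stated above (initial centroids taken from dense regions and kept well apart) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: density is taken here to be the number of points within a fixed radius, separation is enforced with a fixed minimum distance, and the names dense_separated_centroids, radius and min_sep are hypothetical. Because no step is random, repeated runs on the same data give the same result.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dense_separated_centroids(points, k, radius, min_sep):
    """Pick up to k initial centroids: densest points first, skipping any
    candidate closer than min_sep to a centroid already chosen.
    Assumed density measure: number of points within `radius`."""
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    # Ties are broken by index, so the ordering is fully deterministic.
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    chosen = []
    for i in order:
        if all(dist(points[i], c) >= min_sep for c in chosen):
            chosen.append(points[i])
            if len(chosen) == k:
                break
    return chosen  # may hold fewer than k points if min_sep is too large


def kmeans(points, centroids, iterations=50):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    init = dense_separated_centroids(data, k=2, radius=1.5, min_sep=5.0)
    print(kmeans(data, init))
```

The radius and separation thresholds are free parameters in this sketch; how they are chosen matters in practice and is not specified by the abstract.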
Ladero, Victor; Rattray, Fergal P.; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A.
2011-01-01
Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Recognized As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion sequence (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution. PMID:21803900
Zaag, Rim; Tamby, Jean Philippe; Guichard, Cécile; Tariq, Zakia; Rigaill, Guillem; Delannoy, Etienne; Renou, Jean-Pierre; Balzergue, Sandrine; Mary-Huard, Tristan; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique
2015-01-01
CATdb (http://urgv.evry.inra.fr/CATdb) is a database providing public access to a large collection of transcriptomic data, mainly for Arabidopsis but also for other plants. This resource has the rare advantage of containing several thousand microarray experiments obtained with the same technical protocol and analyzed by the same statistical pipelines. In this paper, we present GEM2Net, a new module of CATdb that takes advantage of this homogeneous dataset to mine co-expression units and decipher Arabidopsis gene functions. GEM2Net explores 387 stress conditions organized into 18 biotic and abiotic stress categories. For each one, a model-based clustering is applied on expression differences to identify clusters of co-expressed genes. To characterize functions associated with these clusters, various resources are analyzed and integrated: Gene Ontology, subcellular localization of proteins, Hormone Families, Transcription Factor Families and a refined stress-related gene list associated with publications. Exploiting protein-protein interactions and transcription factor-target interactions makes it possible to display gene networks. GEM2Net presents the analysis of the 18 stress categories, in which 17,264 genes are involved and organized within 681 co-expression clusters. The meta-data analyses were stored and organized to compose a dynamic Web resource. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich
2017-01-01
Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized into a highly similar framework as in the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without an attached additional extra arm of orthologues as in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues, including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects, seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S. antibioticus. PMID:28435299
Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.
2015-01-01
The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694
Uchoi, Ajit; Malik, Surendra Kumar; Choudhary, Ravish; Kumar, Susheel; Rohini, M R; Pal, Digvender; Ercisli, Sezai; Chaudhury, Rekha
2016-06-01
Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of the rbcL and matK gene regions of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all the 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin in distinct clusters or groups. Sequence analysis based on the rbcL and matK genes provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata in distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of the rbcL and matK genes could not find any clear-cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.
Campa, Ana; Trabanco, Noemí; Ferreira, Juan José
2017-12-01
The correct identification of the anthracnose resistance systems present in the common bean cultivars AB136 and MDRK is important because both are included in the set of 12 differential cultivars proposed for use in classifying the races of the anthracnose causal agent, Colletotrichum lindemuthianum. In this work, the responses against seven C. lindemuthianum races were analyzed in a recombinant inbred line population derived from the cross AB136 × MDRK. A genetic linkage map of 100 molecular markers distributed across the 11 bean chromosomes was developed in this population to locate the gene or genes conferring resistance against each race, based on linkage analyses and χ² tests of independence. The identified anthracnose resistance genes were organized in clusters. Two clusters were found in AB136: one located on linkage group Pv07, which corresponds to the anthracnose resistance cluster Co-5, and the other located at the end of linkage group Pv11, which corresponds to the Co-2 cluster. The presence of resistance genes at the Co-5 cluster in AB136 was validated through an allelism test conducted in the F2 population TU × AB136. The presence of resistance genes at the Co-2 cluster in AB136 was validated through genetic dissection using the F2:3 population ABM3 × MDRK, in which it was directly mapped to a genomic position between 46.01 and 47.77 Mb of chromosome Pv11. In MDRK, two independent clusters were identified: one located on linkage group Pv01, corresponding to the Co-1 cluster, and the second located on linkage group Pv04, corresponding to the Co-3 cluster. This report enhances the understanding of the race-specific Phaseolus vulgaris-C. lindemuthianum interactions and will be useful in breeding programs.
Detecting false positive sequence homology: a machine learning approach.
Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M
2016-02-24
Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Because they use only heuristic filtering based on significance score cutoffs and have no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore, sequencing data extracted from incomplete genome/transcriptome assemblies that originate from low-coverage sequencing or are produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in the context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on an experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.
The human RHOX gene cluster: target genes and functional analysis of gene variants in infertile men.
Borgmann, Jennifer; Tüttelmann, Frank; Dworniczak, Bernd; Röpke, Albrecht; Song, Hye-Won; Kliesch, Sabine; Wilkinson, Miles F; Laurentino, Sandra; Gromoll, Jörg
2016-11-15
The X-linked reproductive homeobox (RHOX) gene cluster encodes transcription factors preferentially expressed in reproductive tissues. This gene cluster has important roles in male fertility based on phenotypic defects of Rhox-mutant mice and the finding that aberrant RHOX promoter methylation is strongly associated with abnormal human sperm parameters. However, little is known about the molecular mechanism of RHOX function in humans. Using gene expression profiling, we identified genes regulated by members of the human RHOX gene cluster. Some genes were uniquely regulated by RHOXF1 or RHOXF2/2B, while others were regulated by both of these transcription factors. Several of these regulated genes encode proteins involved in processes relevant to spermatogenesis, e.g. stress protection and cell survival. One of the target genes of RHOXF2/2B is RHOXF1, suggesting cross-regulation to enhance transcriptional responses. The potential role of RHOX in human infertility was addressed by sequencing all RHOX exons in a group of 250 patients with severe oligozoospermia. This revealed two mutations in RHOXF1 (c.515G>A and c.522C>T) and four in RHOXF2/2B (-73C>G, c.202G>A, c.411C>T and c.679G>A), of which only one (c.202G>A) was found in a control group of men with normal sperm concentration. Functional analysis demonstrated that c.202G>A and c.679G>A significantly impaired the ability of RHOXF2/2B to regulate downstream genes. Molecular modelling suggested that these mutations alter RHOXF2/2B protein conformation. By combining clinical data with in vitro functional analysis, we demonstrate how the X-linked RHOX gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male infertility.
Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo
2017-05-31
Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously successfully applied for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme of natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally applicable heterologous overexpression system for a large-sized polyketide biosynthetic gene cluster in Streptomyces, another model polyketide pathway, the pikromycin biosynthetic gene cluster, was precisely cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem copies of the pikromycin biosynthetic gene cluster. The 60 kb pikromycin biosynthetic gene cluster was isolated in a single integration pSBAC vector. Introduction of the pikromycin biosynthetic gene cluster into the pikromycin non-producing strains resulted in higher pikromycin production. The utility of the pSBAC system as a precise cloning tool for large-sized biosynthetic gene clusters was verified through heterologous expression of the pikromycin biosynthetic gene cluster. Moreover, this pSBAC-driven heterologous expression strategy was confirmed to be an ideal approach for natural products that are produced at low and inconsistent levels, such as pikromycin in S. venezuelae, implying that this strategy could be employed for development of a custom overexpression scheme of natural product biosynthetic gene clusters in actinomycetes.
From hormones to secondary metabolism: the emergence of metabolic gene clusters in plants.
Chu, Hoi Yee; Wegel, Eva; Osbourn, Anne
2011-04-01
Gene clusters for the synthesis of secondary metabolites are a common feature of microbial genomes. Well-known examples include clusters for the synthesis of antibiotics in actinomycetes, and also for the synthesis of antibiotics and toxins in filamentous fungi. Until recently it was thought that genes for plant metabolic pathways were not clustered, and this is certainly true in many cases; however, five plant secondary metabolic gene clusters have now been discovered, all of them implicated in synthesis of defence compounds. An obvious assumption might be that these eukaryotic gene clusters have arisen by horizontal gene transfer from microbes, but there is compelling evidence to indicate that this is not the case. This raises intriguing questions about how widespread such clusters are, what the significance of clustering is, why genes for some metabolic pathways are clustered and those for others are not, and how these clusters form. In answering these questions we may hope to learn more about mechanisms of genome plasticity and adaptive evolution in plants. It is noteworthy that for the five plant secondary metabolic gene clusters reported so far, the enzymes for the first committed steps all appear to have been recruited directly or indirectly from primary metabolic pathways involved in hormone synthesis. This may or may not turn out to be a common feature of plant secondary metabolic gene clusters as new clusters emerge. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.
Unthan, Simon; Baumgart, Meike; Radek, Andreas; Herbst, Marius; Siebert, Daniel; Brühl, Natalie; Bartsch, Anna; Bott, Michael; Wiechert, Wolfgang; Marin, Kay; Hans, Stephan; Krämer, Reinhard; Seibold, Gerd; Frunzke, Julia; Kalinowski, Jörn; Rückert, Christian; Wendisch, Volker F; Noack, Stephan
2015-02-01
For synthetic biology applications, a robust structural basis is required, which can be constructed either from scratch or in a top-down approach starting from any existing organism. In this study, we initiated the top-down construction of a chassis organism from Corynebacterium glutamicum ATCC 13032, aiming for the relevant gene set to maintain its fast growth on defined medium. We evaluated each native gene for its essentiality considering expression levels, phylogenetic conservation, and knockout data. Based on this classification, we determined 41 gene clusters ranging from 3.7 to 49.7 kbp as target sites for deletion. 36 deletions were successful and 10 genome-reduced strains showed impaired growth rates, indicating that genes were hit, which are relevant to maintain biological fitness at wild-type level. In contrast, 26 deleted clusters were found to include exclusively irrelevant genes for growth on defined medium. A combinatory deletion of all irrelevant gene clusters would, in a prophage-free strain, decrease the size of the native genome by about 722 kbp (22%) to 2561 kbp. Finally, five combinatory deletions of irrelevant gene clusters were investigated. The study introduces the novel concept of relevant genes and demonstrates general strategies to construct a chassis suitable for biotechnological application. © 2014 The Authors. Biotechnology Journal published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the Creative Commons Attribution-Non-Commercial-NoDerivs Licence, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
Rong, Junkang; Feltus, F. Alex; Waghmare, Vijay N.; Pierce, Gary J.; Chee, Peng W.; Draye, Xavier; Saranga, Yehoshua; Wright, Robert J.; Wilkins, Thea A.; May, O. Lloyd; Smith, C. Wayne; Gannaway, John R.; Wendel, Jonathan F.; Paterson, Andrew H.
2007-01-01
QTL mapping experiments yield heterogeneous results due to the use of different genotypes, environments, and sampling variation. Compilation of QTL mapping results yields a more complete picture of the genetic control of a trait and reveals patterns in organization of trait variation. A total of 432 QTL mapped in one diploid and 10 tetraploid interspecific cotton populations were aligned using a reference map and depicted in a CMap resource. Early demonstrations that genes from the non-fiber-producing diploid ancestor contribute to tetraploid lint fiber genetics gain further support from multiple populations and environments and advanced-generation studies detecting QTL of small phenotypic effect. Both tetraploid subgenomes contribute QTL at largely non-homeologous locations, suggesting divergent selection acting on many corresponding genes before and/or after polyploid formation. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely-related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fiber variation involves a complex network of interacting genes. Members of the lint fiber development network appear clustered, with cluster members showing heterogeneous phenotypic effects. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks. PMID:17565937
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background Tremendous effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Mugford, Sam T.; Louveau, Thomas; Melton, Rachel; Qi, Xiaoquan; Bakht, Saleha; Hill, Lionel; Tsurushima, Tetsu; Honkanen, Suvi; Rosser, Susan J.; Lomonossoff, George P.; Osbourn, Anne
2013-01-01
Operon-like gene clusters are an emerging phenomenon in the field of plant natural products. The genes encoding some of the best-characterized plant secondary metabolite biosynthetic pathways are scattered across plant genomes. However, an increasing number of gene clusters encoding the synthesis of diverse natural products have recently been reported in plant genomes. These clusters have arisen through the neo-functionalization and relocation of existing genes within the genome, and not by horizontal gene transfer from microbes. The reasons for clustering are not yet clear, although this form of gene organization is likely to facilitate co-inheritance and co-regulation. Oats (Avena spp) synthesize antimicrobial triterpenoids (avenacins) that provide protection against disease. The synthesis of these compounds is encoded by a gene cluster. Here we show that a module of three adjacent genes within the wider biosynthetic gene cluster is required for avenacin acylation. Through the characterization of these genes and their encoded proteins we present a model of the subcellular organization of triterpenoid biosynthesis. PMID:23532069
Inferring time-varying network topologies from gene expression data.
Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas
2007-01-01
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.
Prokaryotic Gene Clusters: A Rich Toolbox for Synthetic Biology
Fischbach, Michael; Voigt, Christopher A.
2014-01-01
Bacteria construct elaborate nanostructures, obtain nutrients and energy from diverse sources, synthesize complex molecules, and implement signal processing to react to their environment. These complex phenotypes require the coordinated action of multiple genes, which are often encoded in a contiguous region of the genome, referred to as a gene cluster. Gene clusters sometimes contain all of the genes necessary and sufficient for a particular function. As an evolutionary mechanism, gene clusters facilitate the horizontal transfer of the complete function between species. Here, we review recent work on a number of clusters whose functions are relevant to biotechnology. Engineering these clusters has been hindered by their regulatory complexity, the need to balance the expression of many genes, and a lack of tools to design and manipulate DNA at this scale. Advances in synthetic biology will enable the large-scale bottom-up engineering of the clusters to optimize their functions, wake up cryptic clusters, or to transfer them between organisms. Understanding and manipulating gene clusters will move towards an era of genome engineering, where multiple functions can be “mixed-and-matched” to create a designer organism. PMID:21154668
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
NASA Astrophysics Data System (ADS)
Frisca; Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Clustering is a data analysis method that aims to group data with similar characteristics together. Spectral clustering is one of the most popular modern clustering algorithms. An effective clustering technique, it emerged from the concepts of spectral graph theory. Spectral clustering needs a partitioning algorithm; candidate partitioning methods include PAM, SOM, fuzzy c-means, and k-means. Based on the research done by Capital and Choudhury in 2013, the k-means algorithm provides better accuracy than the PAM algorithm when Euclidean distance is used, so in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering is that it reduces the data dimension, in this case the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands or even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancer, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to group data with high similarity together and to place data with low similarity in different groups. In this research, the carcinoma microarray data comprise 7457 genes. Partitioning with the k-means algorithm yields two clusters.
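The pipeline described in this abstract (a spectral embedding followed by a k-means partition into two clusters) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Gaussian similarity, the symmetric normalisation, the use of power iteration in place of a full eigen-solver, the function name `spectral_two_way`, and the toy points (which stand in for the carcinoma expression profiles) are all choices made here.

```python
import math

def spectral_two_way(points, sigma=1.0, iters=200):
    """Two-way spectral clustering: 1-D spectral embedding, then k-means (k = 2)."""
    n = len(points)
    # Assumed similarity: Gaussian kernel on squared Euclidean distance.
    def sim(p, q):
        return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
    w = [[sim(p, q) for q in points] for p in points]
    deg = [sum(row) for row in w]
    # Symmetrically normalised similarity S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of S is sqrt(deg), normalised to unit length.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]

    def orthonormalise(v):
        # Remove the component along v1, then rescale to unit length.
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        length = math.sqrt(sum(a * a for a in v))
        return [a / length for a in v]

    # Power iteration deflated against v1 approximates the second eigenvector.
    v = orthonormalise([float(i) for i in range(n)])
    for _ in range(iters):
        v = orthonormalise([sum(s[i][j] * v[j] for j in range(n)) for i in range(n)])
    # One-dimensional embedding of each sample.
    embed = [v[i] / math.sqrt(deg[i]) for i in range(n)]

    # Lloyd's k-means with k = 2, initialised at the extremes of the embedding.
    # Assumes the embedding is not constant, so both groups stay non-empty.
    c0, c1 = min(embed), max(embed)
    labels = []
    for _ in range(100):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in embed]
        g0 = [x for x, lab in zip(embed, labels) if lab == 0]
        g1 = [x for x, lab in zip(embed, labels) if lab == 1]
        new0, new1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new0, new1) == (c0, c1):
            break
        c0, c1 = new0, new1
    return labels

# Hypothetical toy data: two well-separated groups of three samples each.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
labels = spectral_two_way(toy_points)
```

The embedding step is what reduces the dimension: each sample is represented by a single coordinate before k-means runs, regardless of how many genes were measured. For more than two clusters, one would keep several eigenvectors and run k-means in that higher-dimensional embedding.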
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A.; Marks, Jonathan A.; Haiser, Henry J.; Turnbaugh, Peter J.
2015-01-01
ABSTRACT Elucidation of the molecular mechanisms underlying the human gut microbiota’s effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. PMID:25873372
2014-01-01
Background Advances in genomic technologies have enabled the accumulation of vast amount of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression data, which suffers from spurious coexpression. Results We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on Human gene expression datasets show that the reported modules are functionally homogeneous as evident by their enrichment with biological process GO terms and KEGG pathways. PMID:25221624
Influence of putative exopolysaccharide genes on Pseudomonas putida KT2440 biofilm stability.
Nilsson, Martin; Chiang, Wen-Chi; Fazli, Mustafa; Gjermansen, Morten; Givskov, Michael; Tolker-Nielsen, Tim
2011-05-01
We report a study of the role of putative exopolysaccharide gene clusters in the formation and stability of Pseudomonas putida KT2440 biofilm. Two novel putative exopolysaccharide gene clusters, pea and peb, were identified, and evidence is provided that they encode products that stabilize P. putida KT2440 biofilm. The gene clusters alg and bcs, which code for proteins mediating alginate and cellulose biosynthesis, were found to play minor roles in P. putida KT2440 biofilm formation and stability under the conditions tested. A P. putida KT2440 derivative devoid of any identifiable exopolysaccharide genes was found to form biofilm with a structure similar to wild-type biofilm, but with a stability lower than that of wild-type biofilm. Based on our data, we suggest that the formation of structured P. putida KT2440 biofilm can occur in the absence of exopolysaccharides; however, exopolysaccharides play a role as structural stabilizers. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
Hox cluster disintegration with persistent anteroposterior order of expression in Oikopleura dioica.
Seo, Hee-Chan; Edvardsen, Rolf Brudvik; Maeland, Anne Dorthea; Bjordal, Marianne; Jensen, Marit Flo; Hansen, Anette; Flaat, Mette; Weissenbach, Jean; Lehrach, Hans; Wincker, Patrick; Reinhardt, Richard; Chourrout, Daniel
2004-09-02
Tunicate embryos and larvae have small cell numbers and simple anatomical features in comparison with other chordates, including vertebrates. Although they branch near the base of chordate phylogenetic trees, their degree of divergence from the common chordate ancestor remains difficult to evaluate. Here we show that the tunicate Oikopleura dioica has a complement of nine Hox genes in which all central genes are lacking but a full vertebrate-like set of posterior genes is present. In contrast to all bilaterians studied so far, Hox genes are not clustered in the Oikopleura genome. Their expression occurs mostly in the tail, with some tissue preference, and a strong partition of expression domains in the nerve cord, in the notochord and in the muscle. In each tissue of the tail, the anteroposterior order of Hox gene expression evokes spatial collinearity, with several alterations. We propose a relationship between the Hox cluster breakdown, the separation of Hox expression domains, and a transition to a determinative mode of development.
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Li, Li; Stoeckert, Christian J.; Roos, David S.
2003-01-01
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885
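The Markov Cluster step named in this abstract can be illustrated with a minimal sketch of the generic MCL iteration, which alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalisation). It does not reproduce OrthoMCL's own edge weighting or normalisation; the function name `mcl`, the inflation value, the threshold `eps`, and the toy similarity matrix are assumptions made for illustration.

```python
def mcl(weights, inflation=2.0, iters=50, eps=1e-6):
    """Cluster a weighted undirected graph with the basic Markov Cluster iteration."""
    n = len(weights)

    def normalise(m):
        # Make every column sum to 1 (column-stochastic matrix).
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        return [[m[i][j] / col[j] for j in range(n)] for i in range(n)]

    # Add self-loops, then normalise.
    m = normalise([[float(weights[i][j]) + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])
    for _ in range(iters):
        # Expansion: two steps of the random walk.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        m = normalise([[x ** inflation for x in row] for row in m])

    # Read clusters as connected components of the surviving flow (entries > eps).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

# Hypothetical similarity weights: two tight triples joined by one weak edge (2-3).
toy = [
    [0, 1, 1, 0,    0, 0],
    [1, 0, 1, 0,    0, 0],
    [1, 1, 0, 0.05, 0, 0],
    [0, 0, 0.05, 0, 1, 1],
    [0, 0, 0, 1,    0, 1],
    [0, 0, 0, 1,    1, 0],
]
clusters = mcl(toy)
```

In an ortholog-grouping setting, the nodes would be proteins and the weights would come from pairwise sequence similarity; the inflation parameter controls how finely the graph is split, with larger values tending to give smaller clusters.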
Mäurer, André P; Mehlitz, Adrian; Mollenkopf, Hans J; Meyer, Thomas F
2007-01-01
The obligate intracellular, gram-negative bacterium Chlamydophila pneumoniae (Cpn) has impact as a human pathogen. Little is known about changes in the Cpn transcriptome during its biphasic developmental cycle (the acute infection) and persistence. The latter stage has been linked to chronic diseases. To analyze Cpn CWL029 gene expression, we designed a pathogen-specific oligo microarray and optimized the extraction method for pathogen RNA. Throughout the acute infection, ratio expression profiles for each gene were generated using 48 h post infection as a reference. Based on these profiles, significantly expressed genes were separated into 12 expression clusters using self-organizing map clustering and manual sorting into the “early”, “mid”, “late”, and “tardy” cluster classes. The latter two were differentiated because the “tardy” class showed steadily increasing expression at the end of the cycle. The transcriptome of the Cpn elementary body (EB) and published EB proteomics data were compared to the cluster profile of the acute infection. We found an intriguing association between “late” genes and genes coding for EB proteins, whereas “tardy” genes were mainly associated with genes coding for EB mRNA. It has been published that iron depletion leads to Cpn persistence. We compared the gene expression profiles during iron depletion–mediated persistence with the expression clusters of the acute infection. This led to the finding that establishment of iron depletion–mediated persistence is more likely a mid-cycle arrest in development rather than a completely distinct gene expression pattern. Here, we describe the Cpn transcriptome during the acute infection, differentiating “late” genes, which correlate to EB proteins, and “tardy” genes, which lead to EB mRNA. Expression profiles during iron mediated–persistence led us to propose the hypothesis that the transcriptomic “clock” is arrested during acute mid-cycle. PMID:17590080
Identification of Common Differentially Expressed Genes in Urinary Bladder Cancer
Zaravinos, Apostolos; Lambrou, George I.; Boulalas, Ioannis; Delakas, Dimitris; Spandidos, Demetrios A.
2011-01-01
Background Current diagnosis and treatment of urinary bladder cancer (BC) has shown great progress with the utilization of microarrays. Purpose Our goal was to identify common differentially expressed (DE) genes among clinically relevant subclasses of BC using microarrays. Methodology/Principal Findings BC samples and controls, both experimental and publicly available datasets, were analyzed by whole genome microarrays. We grouped the samples according to their histology and defined the DE genes in each sample individually, as well as in each tumor group. A dual analysis strategy was followed. First, experimental samples were analyzed and conclusions were formulated; and second, experimental sets were combined with publicly available microarray datasets and were further analyzed in search of common DE genes. The experimental dataset identified 831 genes that were DE in all tumor samples, simultaneously. Moreover, 33 genes were up-regulated and 85 genes were down-regulated in all 10 BC samples compared to the 5 normal tissues, simultaneously. Hierarchical clustering partitioned tumor groups in accordance to their histology. K-means clustering of all genes and all samples, as well as clustering of tumor groups, presented 49 clusters. K-means clustering of common DE genes in all samples revealed 24 clusters. Genes manifested various differential patterns of expression, based on PCA. YY1 and NFκB were among the most common transcription factors that regulated the expression of the identified DE genes. Chromosome 1 contained 32 DE genes, followed by chromosomes 2 and 11, which contained 25 and 23 DE genes, respectively. Chromosome 21 had the least number of DE genes. 
GO analysis revealed the prevalence of transport and binding genes in the common down-regulated DE genes; the prevalence of RNA metabolism and processing genes in the up-regulated DE genes; as well as the prevalence of genes responsible for cell communication and signal transduction in the DE genes that were down-regulated in T1-Grade III tumors and up-regulated in T2/T3-Grade III tumors. Combination of samples from all microarray platforms revealed 17 common DE genes (BMP4, CRYGD, DBH, GJB1, KRT83, MPZ, NHLH1, TACR3, ACTC1, MFAP4, SPARCL1, TAGLN, TPM2, CDC20, LHCGR, TM9SF1 and HCCS), 4 of which participate in numerous pathways. Conclusions/Significance The identification of the common DE genes among BC samples of different histology can provide further insight into the discovery of new putative markers. PMID:21483740
A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data.
Ray, Shubhra Sankar; Ganivada, Avatharam; Pal, Sankar K
2016-09-01
A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than the related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.
Kudo, Fumitaka; Matsuura, Yasunori; Hayashi, Takaaki; Fukushima, Masayuki; Eguchi, Tadashi
2016-07-01
Sordarin is a glycoside antibiotic with a unique tetracyclic diterpene aglycone structure called sordaricin. To understand its intriguing biosynthetic pathway that may include a Diels-Alder-type [4+2]cycloaddition, genome mining of the gene cluster from the draft genome sequence of the producer strain, Sordaria araneosa Cain ATCC 36386, was carried out. A contiguous 67 kb gene cluster consisting of 20 open reading frames encoding a putative diterpene cyclase, a glycosyltransferase, a type I polyketide synthase, and six cytochrome P450 monooxygenases were identified. In vitro enzymatic analysis of the putative diterpene cyclase SdnA showed that it catalyzes the transformation of geranylgeranyl diphosphate to cycloaraneosene, a known biosynthetic intermediate of sordarin. Furthermore, a putative glycosyltransferase SdnJ was found to catalyze the glycosylation of sordaricin in the presence of GDP-6-deoxy-d-altrose to give 4'-O-demethylsordarin. These results suggest that the identified sdn gene cluster is responsible for the biosynthesis of sordarin. Based on the isolated potential biosynthetic intermediates and bioinformatics analysis, a plausible biosynthetic pathway for sordarin is proposed.
Stevens, David Cole; Conway, Kyle R.; Pearce, Nelson; Villegas-Peñaranda, Luis Roberto; Garza, Anthony G.; Boddy, Christopher N.
2013-01-01
Background Heterologous expression of bacterial biosynthetic gene clusters is currently an indispensable tool for characterizing biosynthetic pathways. Development of an effective, general heterologous expression system that can be applied to bioprospecting from metagenomic DNA will enable the discovery of a wealth of new natural products. Methodology We have developed a new Escherichia coli-based heterologous expression system for polyketide biosynthetic gene clusters. We have demonstrated the over-expression of the alternative sigma factor σ54 directly and positively regulates heterologous expression of the oxytetracycline biosynthetic gene cluster in E. coli. Bioinformatics analysis indicates that σ54 promoters are present in nearly 70% of polyketide and non-ribosomal peptide biosynthetic pathways. Conclusions We have demonstrated a new mechanism for heterologous expression of the oxytetracycline polyketide biosynthetic pathway, where high-level pleiotropic sigma factors from the heterologous host directly and positively regulate transcription of the non-native biosynthetic gene cluster. Our bioinformatics analysis is consistent with the hypothesis that heterologous expression mediated by the alternative sigma factor σ54 may be a viable method for the production of additional polyketide products. PMID:23724102
Mallik, Saurav; Zhao, Zhongming
2017-12-28
For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages for finding cause-effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and Cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
Swaminathan, Sivakumar; Morrone, Dana; Wang, Qiang; Fulton, D. Bruce; Peters, Reuben J.
2009-01-01
Biosynthetic gene clusters are common in microbial organisms, but rare in plants, raising questions regarding the evolutionary forces that drive their assembly in multicellular eukaryotes. Here, we characterize the biochemical function of a rice (Oryza sativa) cytochrome P450 monooxygenase, CYP76M7, which seems to act in the production of antifungal phytocassanes and defines a second diterpenoid biosynthetic gene cluster in rice. This cluster is uniquely multifunctional, containing enzymatic genes involved in the production of two distinct sets of phytoalexins, the antifungal phytocassanes and antibacterial oryzalides/oryzadiones, with the corresponding genes being subject to distinct transcriptional regulation. The lack of uniform coregulation of the genes within this multifunctional cluster suggests that this was not a primary driving force in its assembly. However, the cluster is dedicated to specialized metabolism, as all genes in the cluster are involved in phytoalexin metabolism. We hypothesize that this dedication to specialized metabolism led to the assembly of the corresponding biosynthetic gene cluster. Consistent with this hypothesis, molecular phylogenetic comparison demonstrates that the two rice diterpenoid biosynthetic gene clusters have undergone independent elaboration to their present-day forms, indicating continued evolutionary pressure for coclustering of enzymatic genes encoding components of related biosynthetic pathways. PMID:19825834
Scheps, Karen G; Varela, Viviana
Different hemoglobin isoforms are expressed during the embryonic, fetal and postnatal stages. They are formed by combination of polypeptide chains synthesized from the α- and β-globin gene clusters. Based on the fact that the presence of high hemoglobin F levels is beneficial in both sickle cell disease and severe thalassemic syndromes, a review of the regulation of the β-globin cluster expression is proposed, especially regarding the genes encoding the γ-globin chains (HBG1 and HBG2). In this review we describe the current knowledge about transcription factors and epigenetic regulators involved in the switches of the β-globin cluster. It is expected that the consolidation of knowledge in this field will allow finding new therapeutic targets for the treatment of hemoglobinopathies.
Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species
Lind, Abigail L.; Wisecaver, Jennifer H.; Lameiras, Catarina; Wiemann, Philipp; Palmer, Jonathan M.; Keller, Nancy P.; Rodrigues, Fernando; Goldman, Gustavo H.
2017-01-01
Filamentous fungi produce a diverse array of secondary metabolites (SMs) critical for defense, virulence, and communication. The metabolic pathways that produce SMs are found in contiguous gene clusters in fungal genomes, an atypical arrangement for metabolic pathways in other eukaryotes. Comparative studies of filamentous fungal species have shown that SM gene clusters are often either highly divergent or uniquely present in one or a handful of species, hampering efforts to determine the genetic basis and evolutionary drivers of SM gene cluster divergence. Here, we examined SM variation in 66 cosmopolitan strains of a single species, the opportunistic human pathogen Aspergillus fumigatus. Investigation of genome-wide within-species variation revealed 5 general types of variation in SM gene clusters: nonfunctional gene polymorphisms; gene gain and loss polymorphisms; whole cluster gain and loss polymorphisms; allelic polymorphisms, in which different alleles corresponded to distinct, nonhomologous clusters; and location polymorphisms, in which a cluster was found to differ in its genomic location across strains. These polymorphisms affect the function of representative A. fumigatus SM gene clusters, such as those involved in the production of gliotoxin, fumigaclavine, and helvolic acid as well as the function of clusters with undefined products. In addition to enabling the identification of polymorphisms, the detection of which requires extensive genome-wide synteny conservation (e.g., mobile gene clusters and nonhomologous cluster alleles), our approach also implicated multiple underlying genetic drivers, including point mutations, recombination, and genomic deletion and insertion events as well as horizontal gene transfer from distant fungi. Finally, most of the variants that we uncover within A. fumigatus have been previously hypothesized to contribute to SM gene cluster diversity across entire fungal classes and phyla. 
We suggest that the drivers of genetic diversity operating within a fungal species shown here are sufficient to explain SM cluster macroevolutionary patterns. PMID:29149178
DNA methylation and differentiation: HOX genes in muscle cells
2013-01-01
Background Tight regulation of homeobox genes is essential for vertebrate development. In a study of genome-wide differential methylation, we recently found that homeobox genes, including those in the HOX gene clusters, were highly overrepresented among the genes with hypermethylation in the skeletal muscle lineage. Methylation was analyzed by reduced representation bisulfite sequencing (RRBS) of postnatal myoblasts, myotubes and adult skeletal muscle tissue and 30 types of non-muscle-cell cultures or tissues. Results In this study, we found that myogenic hypermethylation was present in specific subregions of all four HOX gene clusters and was associated with various chromatin epigenetic features. Although the 3′ half of the HOXD cluster was silenced and enriched in polycomb repression-associated H3 lysine 27 trimethylation in most examined cell types, including myoblasts and myotubes, myogenic samples were unusual in also displaying much DNA methylation in this region. In contrast, both HOXA and HOXC clusters displayed myogenic hypermethylation bordering a central region containing many genes preferentially expressed in myogenic progenitor cells and consisting largely of chromatin with modifications typical of promoters and enhancers in these cells. A particularly interesting example of myogenic hypermethylation was HOTAIR, a HOXC noncoding RNA gene, which can silence HOXD genes in trans via recruitment of polycomb proteins. In myogenic progenitor cells, the preferential expression of HOTAIR was associated with hypermethylation immediately downstream of the gene. Other HOX gene regions also displayed myogenic DNA hypermethylation despite being moderately expressed in myogenic cells. 
Analysis of representative myogenic hypermethylated sites for 5-hydroxymethylcytosine revealed little or none of this base, except for an intragenic site in HOXB5 which was specifically enriched in this base in skeletal muscle tissue, whereas myoblasts had predominantly 5-methylcytosine at the same CpG site. Conclusions Our results suggest that myogenic hypermethylation of HOX genes helps fine-tune HOX sense and antisense gene expression through effects on 5′ promoters, intragenic and intergenic enhancers and internal promoters. Myogenic hypermethylation might also affect the relative abundance of different RNA isoforms, facilitate transcription termination, help stop the spread of activation-associated chromatin domains and stabilize repressive chromatin structures. PMID:23916067
Grindberg, Rashel V.; Ishoey, Thomas; Brinza, Dumitru; Esquenazi, Eduardo; Coates, R. Cameron; Liu, Wei-ting; Gerwick, Lena; Dorrestein, Pieter C.; Pevzner, Pavel; Lasken, Roger; Gerwick, William H.
2011-01-01
Filamentous marine cyanobacteria are extraordinarily rich sources of structurally novel, biomedically relevant natural products. To understand their biosynthetic origins as well as produce increased supplies and analog molecules, access to the clustered biosynthetic genes that encode the assembly enzymes is necessary. Complicating these efforts is the universal presence of heterotrophic bacteria in the cell wall and sheath material of cyanobacteria obtained from the environment and those grown in uni-cyanobacterial culture. Moreover, the high similarity in genetic elements across disparate secondary metabolite biosynthetic pathways renders imprecise current gene cluster targeting strategies and contributes sequence complexity resulting in partial genome coverage. Thus, it was necessary to use a dual-method approach of single-cell genomic sequencing based on multiple displacement amplification (MDA) and metagenomic library screening. Here, we report the identification of the putative biosynthetic gene cluster for apratoxin A, a potent cancer cell cytotoxin with promise for medicinal applications. The roughly 58 kb biosynthetic gene cluster is composed of 12 open reading frames and has a type I modular mixed polyketide synthase/nonribosomal peptide synthetase (PKS/NRPS) organization and features loading and off-loading domain architecture never previously described. Moreover, this work represents the first successful isolation of a complete biosynthetic gene cluster from Lyngbya bouillonii, a tropical marine cyanobacterium renowned for its production of diverse bioactive secondary metabolites. PMID:21533272
Hu, Valerie W.; Steinberg, Mara E.
2009-01-01
Heterogeneity in phenotypic presentation of ASD has been cited as one explanation for the difficulty in pinpointing specific genes involved in autism. Recent studies have attempted to reduce the “noise” in genetic and other biological data by reducing the phenotypic heterogeneity of the sample population. The current study employs multiple clustering algorithms on 123 item scores from the Autism Diagnostic Interview-Revised (ADI-R) diagnostic instrument of nearly 2000 autistic individuals to identify subgroups of autistic probands with clinically relevant behavioral phenotypes in order to isolate more homogeneous groups of subjects for gene expression analyses. Our combined cluster analyses suggest optimal division of the autistic probands into 4 phenotypic clusters based on similarity of symptom severity across the 123 selected item scores. One cluster is characterized by severe language deficits, while another exhibits milder symptoms across the domains. A third group possesses a higher frequency of savant skills while the fourth group exhibited intermediate severity across all domains. Grouping autistic individuals by multivariate cluster analysis of ADI-R scores reveals meaningful phenotypes of subgroups within the autistic spectrum which we show, in a related (accompanying) study, to be associated with distinct gene expression profiles. PMID:19455643
NASA Astrophysics Data System (ADS)
Guo, Jingyu; Tian, Dehua; McKinney, Brett A.; Hartman, John L.
2010-06-01
Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome and organism level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-cell eukaryotic organism, S. cerevisiae. We have developed a methodology, quantitative high throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. 
Such modules are reflective of buffering mechanisms, i.e., genes that play a related role in the maintenance of physiological homeostasis. To develop the method, 297 gene deletion strains were selected based on gene-drug interactions with hydroxyurea, an inhibitor of ribonucleotide reductase enzyme activity, which is critical for DNA synthesis. To partition the gene functions, these 297 deletion strains were challenged with growth inhibitory drugs known to target different genes and cellular pathways. Q-HTCP-derived growth curves were used to quantify all gene interactions, and the data were used to test the performance of REMc. Fundamental advantages of REMc include objective assessment of total number of clusters and assignment to each cluster a log-likelihood value, which can be considered an indicator of statistical quality of clusters. To assess the biological quality of clusters, we developed a method called gene ontology information divergence z-score (GOid_z). GOid_z summarizes total enrichment of GO attributes within individual clusters. Using these and other criteria, we compared the performance of REMc to hierarchical and K-means clustering. The main conclusion is that REMc provides distinct efficiencies for mining Q-HTCP data. It facilitates identification of phenomic modules, which contribute to buffering mechanisms that underlie cellular homeostasis and the regulation of phenotypic expression.
Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew
2008-08-01
Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.
An efficient method to identify differentially expressed genes in microarray experiments
Qin, Huaizhen; Feng, Tao; Harding, Scott A.; Tsai, Chung-Jui; Zhang, Shuanglin
2013-01-01
Motivation Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. The fact that genes are normally expressed in functionally relevant patterns suggests that gene-expression data can be stratified and clustered into relatively homogenous groups. Cluster-wise dimensionality reduction should make it feasible to improve screening power while minimizing information loss. Results We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method incorporates a novel stratification-based tight clustering algorithm, principal component analysis and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates; and two from public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives for identification of differentially expressed genes. Availability The C++ code to implement the proposed method is available upon request for academic use. PMID:18453554
Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong
2016-06-01
Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) in the PPI network with the smallest P‑values were detected by the molecular complex detection (MCODE) algorithm. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were found to be enriched for biological processes in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule achieving the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underlying the anti‑osteopenic effect of RDE on alveolar bone.
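The ≥3-fold selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names and expression values below are invented, and the sketch assumes strictly positive mean expression values and counts a change in either direction.

```python
# Hypothetical mean expression per gene: (RDE group, OVX group).
expression = {
    "gene1": (30.0, 10.0),  # 3.0-fold higher in RDE
    "gene2": (12.0, 10.0),  # 1.2-fold, below the cutoff
    "gene3": (5.0, 20.0),   # 4.0-fold lower in RDE
}

def differentially_expressed(expression, min_fold=3.0):
    """Return genes whose fold change, in either direction, is >= min_fold."""
    selected = []
    for gene, (treated, control) in expression.items():
        # Assumes both values are strictly positive.
        fold = max(treated, control) / min(treated, control)
        if fold >= min_fold:
            selected.append(gene)
    return selected

print(differentially_expressed(expression))
```

The selected genes would then be the input to a network analysis such as the PPI construction the abstract describes.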
Nowrousian, Minou
2009-04-01
During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome comprises another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.
Clustered Xenopus keratin genes: A genomic, transcriptomic, and proteomic analysis.
Suzuki, Ken-Ichi T; Suzuki, Miyuki; Shigeta, Mitsuki; Fortriede, Joshua D; Takahashi, Shuji; Mawaribuchi, Shuuji; Yamamoto, Takashi; Taira, Masanori; Fukui, Akimasa
2017-06-15
Keratin genes belong to the intermediate filament superfamily and their expression is altered following morphological and physiological changes in vertebrate epithelial cells. Keratin genes are divided into two groups, type I and II, and are clustered on vertebrate genomes, including those of Xenopus species. Various keratin genes have been identified and characterized by their unique expression patterns throughout ontogeny in Xenopus laevis; however, compilation of previously reported and newly identified keratin genes in two Xenopus species is required for our further understanding of keratin gene evolution, not only in amphibians but also in all terrestrial vertebrates. In this study, 120 putative type I and II keratin genes in total were identified based on the genome data from two Xenopus species. We revealed that most of these genes are highly clustered on two homeologous chromosomes, XLA9_10 and XLA2 in X. laevis, and XTR10 and XTR2 in X. tropicalis, which are orthologous to those of human, showing conserved synteny among tetrapods. RNA-Seq data from various embryonic stages and adult tissues highlighted the unique expression profiles of orthologous and homeologous keratin genes in developmental stage- and tissue-specific manners. Moreover, we identified dozens of epidermal keratin proteins from the whole embryo, larval skin, tail, and adult skin using shotgun proteomics. In light of our results, we discuss the radiation, diversification, and unique expression of the clustered keratin genes, which are closely related to epidermal development and terrestrial adaptation during amphibian evolution, including Xenopus speciation. Copyright © 2016 Elsevier Inc. All rights reserved.
Grimplet, Jérôme; Tello, Javier; Laguna, Natalia; Ibáñez, Javier
2017-01-01
Grapevine cluster compactness has a clear impact on fruit quality and health status, as clusters with greater compactness are more susceptible to pests and diseases and ripen more asynchronously. Different parameters related to inflorescence and cluster architecture (length, width, branching, etc.), fruitfulness (number of berries, number of seeds) and berry size (length, width) contribute to the final level of compactness. From a collection of 501 clones of cultivar Garnacha Tinta, two compact and two loose clones with stable differences for cluster compactness-related traits were selected and phenotyped. Key organs and developmental stages were selected for sampling and transcriptomic analyses. Comparison of global gene expression patterns in flowers at the end of bloom allowed identification of potential gene networks with a role in determining the final berry number, berry size and ultimately cluster compactness. A large portion of the differentially expressed genes were found in networks related to cell division (carbohydrates uptake, cell wall metabolism, cell cycle, nucleic acids metabolism, cell division, DNA repair). Their greater expression level in flowers of compact clones indicated that the number of berries and the berry size at ripening appear related to the rate of cell replication in flowers during the early growth stages after pollination. In addition, fluctuations in auxin and gibberellin signaling and transport related gene expression support that they play a central role in fruit set and impact berry number and size. Other hormones, such as ethylene and jasmonate may differentially regulate indirect effects, such as defense mechanisms activation or polyphenols production. This is the first transcriptomic based analysis focused on the discovery of the underlying gene networks involved in grapevine traits of grapevine cluster compactness, berry number and berry size. PMID:28496449
Poole, William; Leinonen, Kalle; Shmulevich, Ilya
2017-01-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady
2017-02-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
Analysis of multiplex gene expression maps obtained by voxelation.
An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios
2009-04-29
Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. 
The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.
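The query-based similarity search described in this abstract (feature vectors compared by Euclidean distance) can be sketched as follows. This is only an illustration under stated assumptions: the gene names and three-element vectors are invented stand-ins, and the wavelet feature extraction and hierarchical clustering steps are omitted.

```python
import math

# Hypothetical feature vectors (stand-ins for wavelet features of expression maps).
features = {
    "geneA": [0.9, 0.1, 0.3],
    "geneB": [0.8, 0.2, 0.3],
    "geneC": [0.1, 0.9, 0.7],
    "geneD": [0.2, 0.8, 0.6],
}

def most_similar(query, features, k=2):
    """Return the k genes closest to `query` by Euclidean distance."""
    q = features[query]
    others = [
        (math.dist(q, vector), name)
        for name, vector in features.items()
        if name != query
    ]
    # Sort by distance (ties broken by name) and keep the k nearest names.
    return [name for _, name in sorted(others)[:k]]

print(most_similar("geneA", features, k=1))
```

In a fuller analysis, the genes returned for each query would then be compared by their gene ontology annotations, as the abstract outlines.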
Reddy, G B Manjunatha; Singh, R; Singh, R P; Singh, K P; Gupta, P K; Mahadevan, Anita; Desai, Anita; Shankar, S K; Ramakrishnan, M A; Verma, Rishendra
2011-08-01
Rabies is endemic in India, where it is an important zoonosis. Very few reports are available on the molecular epidemiology of rabies virus of Indian origin. In this study, to investigate the dynamics of rabies virus, a total of 41 rabies-positive brain samples from dogs, cats, domestic animals, wildlife, and humans from 11 states were subjected to RT-PCR amplification of the N gene between nucleotides N521 and N1262 (742 bp) and the P gene between nucleotides P239 and P750 (512 bp). Using specific sets of primers, the N gene could be amplified from 30 samples and the P gene from all 41. The N gene-based phylogenetic analysis indicated that all Indian virus isolates are genetically closely related, forming a single cluster within the arctic/arctic-like viruses. However, two distinct clusters were resolved in the P gene-based phylogeny, viz., rabies virus isolates from Punjab and rabies virus isolates from the remaining parts of India. Among the Indian rabies virus isolates, close relatedness (>95% homology) tracked geography rather than host species.
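The amplicon sizes quoted in this abstract follow from the nucleotide coordinates when both end positions are counted. A small check, assuming inclusive 1-based coordinates (the helper name is hypothetical):

```python
def amplicon_length(start, end):
    """Length in bp of a region spanning positions start..end, both inclusive."""
    return end - start + 1

n_gene_bp = amplicon_length(521, 1262)  # N gene region
p_gene_bp = amplicon_length(239, 750)   # P gene region
print(n_gene_bp, p_gene_bp)
```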
Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W
2018-02-14
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information, but it needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.
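The counts and percentages reported in this abstract are mutually consistent, as a short arithmetic check shows. The counts are taken from the abstract; the helper function is only illustrative.

```python
total = 663  # complete genomes compared

def pct(count, total=total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

print(pct(640))  # genomes in the main species cluster
print(pct(535))  # genomes with at least one copy of ToxA or ToxB
print(pct(310))  # genomes with ToxA/B but without CDT
print(total - 640, total - 535)  # remaining genomes; genomes lacking ToxA/B
```

The third value is 46.8, which the abstract reports rounded to 47%.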
Haarmann, Thomas; Machado, Caroline; Lübbe, Yvonne; Correia, Telmo; Schardl, Christopher L; Panaccione, Daniel G; Tudzynski, Paul
2005-06-01
The genomic region of Claviceps purpurea strain P1 containing the ergot alkaloid gene cluster [Tudzynski, P., Hölter, K., Correia, T., Arntz, C., Grammel, N., Keller, U., 1999. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea. Mol. Gen. Genet. 261, 133-141] was explored by chromosome walking, and additional genes probably involved in the ergot alkaloid biosynthesis have been identified. The putative cluster sequence (extending over 68.5kb) contains 4 different nonribosomal peptide synthetase (NRPS) genes and several putative oxidases. Northern analysis showed that most of the genes were co-regulated (repressed by high phosphate), and identified probable flanking genes by lack of co-regulation. Comparison of the cluster sequences of strain P1, an ergotamine producer, with that of strain ECC93, an ergocristine producer, showed high conservation of most of the cluster genes, but significant variation in the NRPS modules, strongly suggesting that evolution of these chemical races of C. purpurea is determined by evolution of NRPS module specificity.
Patel, Vidushi S; Cooper, Steven J B; Deakin, Janine E; Fulton, Bob; Graves, Tina; Warren, Wesley C; Wilson, Richard K; Graves, Jennifer A M
2008-07-25
Vertebrate alpha (alpha)- and beta (beta)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the alpha- and beta-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil beta-globin gene (omega) in the marsupial alpha-cluster, however, suggested that duplication of the alpha-beta cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous alpha- and beta-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation. The platypus alpha-globin cluster (chromosome 21) contains embryonic and adult alpha- globin genes, a beta-like omega-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-zeta-zeta'-alphaD-alpha3-alpha2-alpha1-omega-GBY-3'. The platypus beta-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-epsilon-beta-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate alpha-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal beta-globin clusters are embedded in olfactory genes. Thus, the mammalian alpha- and beta-globin clusters are orthologous to the bird alpha- and beta-globin clusters respectively. We propose that alpha- and beta-globin clusters evolved from an ancient MPG-C16orf35-alpha-beta-GBY-LUC7L arrangement 410 million years ago. 
A copy of the original beta (represented by omega in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of beta-globin genes with different expression profiles in different lineages.
Lu, Hongsheng; Fujimura, Reiko; Sato, Yoshinori; Nanba, Kenji; Kamijo, Takashi; Ohta, Hiroyuki
2008-01-01
The role of microbes in the early development of ecosystems on new volcanic materials seems to be crucial to primary plant succession but is not well characterized. Here we analyzed the bacterial community colonizing 22-year-old volcanic deposits of the Miyake-jima Island (Japan) using culture-based and 16S rRNA gene clone library methods. The majority of 91 bacterial isolates were placed phylogenetically in two clusters (A and B) of the Betaproteobacteria. Cluster A (82% of isolates) was related to the genus Limnobacter and Cluster B (9%) was affiliated with the Herbaspirillum clade. The clone library analysis supported the predominance of Cluster B rather than Cluster A. Strain KP1-50 of Cluster B was able to grow on a mineral medium under an atmosphere of H(2), O(2), and CO(2) (85:5:10), and characterized by its large-subunit gene of ribulose 1,5-bisphosphate carboxylase/oxygenase (rbcL) and nitrogenase reductase gene (nifH). In contrast, strains of Cluster A did not grow chemolithoautotrophically with H(2), O(2), and CO(2) but increased their cell biomass with the addition of thiosulfate to the succinate medium, suggesting the use of thiosulfate as an energy source. From phenotypic characterization, it was suggested that the Cluster A and B strains were novel species in the genus Limnobacter and Herbaspirillum, respectively.
Chu, Jinyu; Zhang, Jinping; Zhou, Xiaohong; Liu, Biao; Li, Yimin
2015-09-01
Quantitative polymerase chain reaction (qPCR) assays and 16S rRNA gene clone libraries were used to document the abundance, diversity and community structure of anaerobic ammonia-oxidising (anammox) bacteria in the rhizosphere and non-rhizosphere sediments of three emergent macrophyte species (Iris pseudacorus, Thalia dealbata and Typha orientalis). The qPCR results confirmed the existence of anammox bacteria (AMX) with observed log number of gene copies per dry gram sediment ranging from 5.00 to 6.78. AMX was more abundant in T. orientalis-associated sediments than in the other two plant species. The I. pseudacorus- and T. orientalis-associated sediments had higher Shannon diversity values, indicating higher AMX diversity in these sediments. Based on the 16S rRNA gene, Candidatus 'Brocadia', Candidatus 'Kuenenia', Candidatus 'Jettenia' and new clusters were observed with the predominant Candidatus 'Kuenenia' cluster. The I. pseudacorus-associated sediments contained all the sequences of the C. 'Jettenia' cluster. Sequences obtained from T. orientalis-associated sediments contributed more than 90 % sequences in the new cluster, whereas none was found from I. pseudacorus. The new cluster was distantly related to known sequences; thus, this cluster was grouped outside the known clusters, indicating that the new cluster may be a new Planctomycetales genus. Further studies should be undertaken to confirm this finding.
Choque, Elodie; Klopp, Christophe; Valiere, Sophie; Raynal, José; Mathieu, Florence
2018-03-15
Black Aspergilli represent one of the most important fungal resources of primary and secondary metabolites for biotechnological industry. Having several black Aspergilli sequenced genomes should allow targeting the production of certain metabolites with bioactive properties. In this study, we report the draft genome of a black Aspergillus, A. tubingensis G131, isolated from a French Mediterranean vineyard. This 35 Mb genome includes 10,994 predicted genes. A genome-based discovery approach identified 80 secondary metabolite biosynthetic gene clusters. Genomic sequences of these clusters were compared by BLAST against 3 chosen black Aspergilli genomes: A. tubingensis CBS 134.48, A. niger CBS 513.88 and A. kawachii IFO 4308. This comparison highlights different levels of cluster conservation between the four strains. It also identifies seven unique clusters in A. tubingensis G131. Moreover, the putative secondary metabolite clusters for asperazine and naphtho-gamma-pyrone production were proposed based on this genomic analysis. Key biosynthetic genes required for the production of 2 mycotoxins, ochratoxin A and fumonisin, are absent from this draft genome. Even though intergenic sequences of these mycotoxin biosynthetic pathways are present, they could not lead to the production of those mycotoxins by A. tubingensis G131. Functional and bioinformatics analyses of the A. tubingensis G131 genome highlight its potential for metabolite production, in particular for TAN-1612, asperazine and naphtho-gamma-pyrones, which present antioxidant, anticancer or antibiotic properties.
Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila
2010-07-16
Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. 
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering
Shi, Jiejun; Qin, Li-Xuan
2014-01-01
We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org. PMID:25452684
Clustering approaches to identifying gene expression patterns from DNA microarray data.
Do, Jin Hwan; Choi, Dong-Kug
2008-04-30
The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and dissimilarity metrics used, as well as on user-selectable parameters such as the desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
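The dependence of clustering results on the dissimilarity metric can be illustrated with a minimal sketch. It is not taken from the review: the gene names and expression values are invented for illustration, and only two common metrics (Euclidean distance and a Pearson-correlation distance) are compared.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance: sensitive to absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson r: sensitive to profile shape, not to level."""
    return 1.0 - pearson(x, y)

def nearest(target, profiles, dist):
    """Name of the profile closest to `target` under the metric `dist`."""
    others = (name for name in profiles if name != target)
    return min(others, key=lambda name: dist(profiles[target], profiles[name]))

# Hypothetical expression profiles over four conditions (illustrative values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [10.0, 20.0, 30.0, 40.0],  # same shape as geneA, higher level
    "geneC": [2.0, 1.5, 3.5, 3.0],      # similar level to geneA, other shape
}

print(nearest("geneA", profiles, euclidean))             # geneC
print(nearest("geneA", profiles, correlation_distance))  # geneB
```

Under these assumed values the same gene is paired with different neighbours depending on the metric, so any clustering built on these distances would group the genes differently.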
Bacillus sp. CDB3 isolated from cattle dip-sites possesses two ars gene clusters.
Bhat, Somanath; Luo, Xi; Xu, Zhiqiang; Liu, Lixia; Zhang, Ren
2011-01-01
Contamination of soil and water by arsenic is a global problem. In Australia, the dipping of cattle in arsenic-containing solution to control cattle ticks in the last century has left many sites heavily contaminated with arsenic and other toxicants. We had previously isolated five soil bacterial strains (CDB1-5) highly resistant to arsenic. To understand the resistance mechanism, molecular studies have been carried out. Two chromosome-encoded arsenic resistance (ars) gene clusters have been cloned from CDB3 (Bacillus sp.). They both function in Escherichia coli, and cluster 1 confers a much higher resistance to the toxic metalloid. Cluster 2 is smaller, possessing four open reading frames (ORFs), arsRorf2BC, similar to that identified in the Bacillus subtilis Skin element. Among the eight ORFs in cluster 1, five are analogs of common ars genes found in other bacteria but are organized in a unique order, arsRBCDA instead of arsRDABC. Three other putative genes are located directly downstream and designated as arsTIP based on the homologies of their theoretical translation sequences to thioredoxin reductases, iron-sulphur cluster proteins and protein phosphatases, respectively. The latter two are novel to any known ars operons. The arsD gene from a Bacillus species was cloned for the first time, and the predicted protein differs from the well-studied E. coli ArsD by lacking two pairs of C-terminal cysteine residues. Its functional involvement in arsenic resistance has been confirmed by a deletion experiment. There is also an inverted repeat in the intergenic region between arsC and arsD, implying some unknown transcriptional regulation.
Conditions for the Evolution of Gene Clusters in Bacterial Genomes
Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.
2010-01-01
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
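For readers unfamiliar with the Moran process mentioned above, the following is a minimal sketch of the generic birth-death step with a selective advantage. It is not the authors' model: it omits genome inversions and gene positions entirely, and treating the favoured type as a "clustered arrangement" is an assumption made purely for illustration. The function names and parameter values are hypothetical.

```python
import random

def moran_fixation(pop_size, fitness_mutant, rng, initial_mutants=1):
    """Run one Moran process until the mutant type is fixed or lost.

    Each step, one individual reproduces (chosen proportionally to
    fitness) and one individual dies (chosen uniformly), so the
    population size stays constant. Returns True if the mutant fixes.
    """
    mutants = initial_mutants
    while 0 < mutants < pop_size:
        residents = pop_size - mutants
        weighted_mutants = fitness_mutant * mutants
        p_mutant_reproduces = weighted_mutants / (weighted_mutants + residents)
        reproducer_is_mutant = rng.random() < p_mutant_reproduces
        dying_is_mutant = rng.random() < mutants / pop_size
        mutants += int(reproducer_is_mutant) - int(dying_is_mutant)
    return mutants == pop_size

def estimate_fixation_probability(pop_size, fitness_mutant, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of one mutant."""
    rng = random.Random(seed)
    fixed = sum(moran_fixation(pop_size, fitness_mutant, rng) for _ in range(trials))
    return fixed / trials

# Hypothetical parameters: resident fitness is 1, the favoured type has 2.
neutral = estimate_fixation_probability(pop_size=10, fitness_mutant=1.0, trials=2000)
favoured = estimate_fixation_probability(pop_size=10, fitness_mutant=2.0, trials=2000)
print(neutral, favoured)
```

In the neutral case a single mutant is expected to fix with probability 1/N, so the first estimate should be close to 0.1 for N = 10, while the favoured type should fix noticeably more often.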
Wang, Jiang; Yu, Yi; Tang, Kexuan; Liu, Wen; He, Xinyi; Huang, Xi; Deng, Zixin
2010-01-01
Thiopeptide antibiotics are an important class of natural products resulting from posttranslational modifications of ribosomally synthesized peptides. Cyclothiazomycin is a typical thiopeptide antibiotic that has a unique bridged macrocyclic structure derived from an 18-amino-acid structural peptide. Here we report the cloning, sequencing, and heterologous expression of the cyclothiazomycin biosynthetic gene cluster from Streptomyces hygroscopicus 10-22. Remarkably, successful heterologous expression of a 22.7-kb gene cluster in Streptomyces lividans 1326 suggested that there is a minimum set of 15 open reading frames that includes all of the functional genes required for cyclothiazomycin production. Six of these genes, cltBCDEFG, flanking the structural gene cltA, were predicted to encode the enzymes required for the main framework of cyclothiazomycin, and two enzymes encoded by a putative operon, cltMN, were hypothesized to participate in the tailoring step to generate the tertiary thioether, leading to the final cyclization of the bridged macrocyclic structure. This rigorous bioinformatics analysis based on heterologous expression of cyclothiazomycin resulted in an ideal biosynthetic model for us to understand the biosynthesis of thiopeptides. PMID:20154110
USDA-ARS's Scientific Manuscript database
Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...
Yamaguchi-Kabata, Yumi; Tsunoda, Tatsuhiko; Kumasaka, Natsuhiko; Takahashi, Atsushi; Hosono, Naoya; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki
2012-05-01
Although the Japanese population has a rather low genetic diversity, we recently confirmed the presence of two main clusters (the Hondo and Ryukyu clusters) through principal component analysis of genome-wide single-nucleotide polymorphism (SNP) genotypes. Understanding the genetic differences between the two main clusters requires further genome-wide analyses based on a dense SNP set and comparison of haplotype frequencies. In the present study, we determined haplotypes for the Hondo cluster of the Japanese population by detecting SNP homozygotes with 388,591 autosomal SNPs from 18,379 individuals and estimated the haplotype frequencies. Haplotypes for the Ryukyu cluster were inferred by a statistical approach using the genotype data from 504 individuals. We then compared the haplotype frequencies between the Hondo and Ryukyu clusters. In most genomic regions, the haplotype frequencies in the Hondo and Ryukyu clusters were very similar. However, in addition to the human leukocyte antigen region on chromosome 6, other genomic regions (chromosomes 3, 4, 5, 7, 10 and 12) showed dissimilarities in haplotype frequency. These regions were enriched for genes involved in the immune system, cell-cell adhesion and the intracellular signaling cascade. These differentiated genomic regions between the Hondo and Ryukyu clusters are of interest because they (1) should be examined carefully in association studies and (2) likely contain genes responsible for morphological or physiological differences between the two groups.
Zhang, Han; Rokas, Antonis; Slot, Jason C.
2012-01-01
Background Dermatophyte fungi of the family Arthrodermataceae (Eurotiomycetes) colonize keratinized tissue, such as skin, frequently causing superficial mycoses in humans and other mammals, reptiles, and birds. Competition with native microflora likely underlies the propensity of these dermatophytes to produce a diversity of antibiotics and compounds for scavenging iron, which is extremely scarce, as well as the presence of an unusually large number of putative secondary metabolism gene clusters, most of which contain non-ribosomal peptide synthetases (NRPS), in their genomes. To better understand the historical origins and diversification of NRPS-containing gene clusters we examined the evolution of a variable locus (VL) that exists in one of three alternative conformations among the genomes of seven dermatophyte species. Results The first conformation of the VL (termed VLA) contains only 539 base pairs of sequence and lacks protein-coding genes, whereas the other two conformations (termed VLB and VLC) span 36 Kb and 27 Kb and contain 12 and 10 genes, respectively. Interestingly, both VLB and VLC appear to contain distinct secondary metabolism gene clusters; VLB contains a NRPS gene as well as four porphyrin metabolism genes never found to be physically linked in the genomes of 128 other fungal species, whereas VLC also contains a NRPS gene as well as several others typically found associated with secondary metabolism gene clusters. Phylogenetic evidence suggests that the VL locus was present in the ancestor of all seven species achieving its present distribution through subsequent differential losses or retentions of specific conformations. Conclusions We propose that the existence of variable loci, similar to the one we studied, in fungal genomes could potentially explain the dramatic differences in secondary metabolic diversity between closely related species of filamentous fungi, and contribute to host adaptation and the generation of metabolic diversity. 
PMID:22860027
Nykyri, Johanna; Mattinen, Laura; Niemi, Outi; Adhikari, Satish; Kõiv, Viia; Somervuo, Panu; Fang, Xin; Auvinen, Petri; Mäe, Andres; Palva, E. Tapio; Pirhonen, Minna
2013-01-01
In this study, we characterized a putative Flp/Tad pilus-encoding gene cluster, and we examined its regulation at the transcriptional level and its role in the virulence of potato pathogenic enterobacteria of the genus Pectobacterium. The Flp/Tad pilus-encoding gene clusters in Pectobacterium atrosepticum, Pectobacterium wasabiae and Pectobacterium aroidearum were compared to previously characterized flp/tad gene clusters, including that of the well-studied Flp/Tad pilus model organism Aggregatibacter actinomycetemcomitans, in which this pilus is a major virulence determinant. Comparative analyses revealed substantial protein sequence similarity and open reading frame synteny between the previously characterized flp/tad gene clusters and the cluster in Pectobacterium, suggesting that the predicted flp/tad gene cluster in Pectobacterium encodes a Flp/Tad pilus-like structure. We detected genes for a novel two-component system adjacent to the flp/tad gene cluster in Pectobacterium, and mutant analysis demonstrated that this system has a positive effect on the transcription of selected Flp/Tad pilus biogenesis genes, suggesting that its response regulator regulates the flp/tad gene cluster. Mutagenesis of either the predicted regulator gene or selected Flp/Tad pilus biogenesis genes had a significant impact on the maceration ability of the bacterial strains in potato tubers, indicating that the Flp/Tad pilus-encoding gene cluster represents a novel virulence determinant in Pectobacterium. Soft-rot enterobacteria in the genera Pectobacterium and Dickeya are of great agricultural importance, and an investigation of the virulence of these pathogens could facilitate improvements in agricultural practices, thus benefiting farmers, the potato industry and consumers. PMID:24040039
Arashida, Ryo; Kakizawa, Shigeyuki; Hoshi, Ayaka; Ishii, Yoshiko; Jung, Hee-Young; Kagiwada, Satoshi; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou
2008-04-01
Phytoplasmas are phloem-limited plant pathogens that are transmitted by insect vectors and are associated with diseases in hundreds of plant species. Despite their small sizes, phytoplasma genomes have repeat-rich sequences, which are due to several genes that are encoded as multiple copies. These multiple genes exist in a gene cluster, the potential mobile unit (PMU). PMUs are present at several distinct regions in the phytoplasma genome. The multicopy genes encoded by PMUs (herein named mobile unit genes [MUGs]) and similar genes elsewhere in the genome (herein named fundamental genes [FUGs]) are likely to have the same function based on their annotations. In this manuscript we show evidence that MUGs and FUGs do not cluster together within the same clade. Each MUG is in a cluster with a short branch length, suggesting that MUGs are recently diverged paralogs, whereas the origin of FUGs is different from that of MUGs. We also compared the genome structures around the lplA gene in two derivative lines of the 'Candidatus Phytoplasma asteris' OY strain, the severe-symptom line W (OY-W) and the mild-symptom line M (OY-M). The gene organizations of the nucleotide sequences upstream of the lplA genes of OY-W and OY-M were dramatically different. The tra5 insertion sequence, an element of PMUs, was found only in this region in OY-W. These results suggest that transposition of entire PMUs and PMU sections has occurred frequently in the OY phytoplasma genome. The difference in the pathogenicities of OY-W and OY-M might be caused by the duplication and transposition of PMUs, followed by genome rearrangement.
Boldogköi, Zsolt
2012-01-01
The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too. 
PMID:22783276
NASA Astrophysics Data System (ADS)
Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia
2002-06-01
We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples following exposure for 24 hours with phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes clustered specific gene expression profiles that separated samples based on exposure to a particular class of compound.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and that their clustering accuracy is higher than that of other related clustering algorithms.
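As background for the abstract above, the sketch below shows the textbook fuzzy c-means membership rule combined with a Gaussian-kernel-induced distance. It is not the paper's FKCA, ISAM or MDM procedure: the cluster centers are simply given, and the fuzzifier `m`, the kernel width `sigma`, and all data values are assumptions chosen for illustration.

```python
import math

def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma):
    """Squared feature-space distance; for a Gaussian kernel K(x, x) = 1,
    so ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def memberships(x, centers, m=2.0, sigma=1.0):
    """Fuzzy memberships of point x in each cluster (they sum to 1).

    Standard fuzzy c-means rule: membership is proportional to
    (1 / d_i^2) ** (1 / (m - 1)), with d_i^2 the squared distance
    from x to center i.
    """
    d2 = [kernel_distance_sq(x, v, sigma) for v in centers]
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        # x coincides with one or more centers: share membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 1.0 / (m - 1.0)
    weights = [(1.0 / d) ** exponent for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-dimensional profile and two assumed cluster centers.
u = memberships([0.0, 0.0], [[0.1, 0.0], [3.0, 3.0]])
print(u)
```

Unlike crisp clustering, each point receives a graded membership in every cluster; here the point belongs almost entirely to the nearby first center but keeps a small membership in the second.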
Brodsky, Leonid; Leontovich, Andrei; Shtutman, Michael; Feinstein, Elena
2004-01-01
Mathematical methods of analysis of microarray hybridizations deal with gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response, but rather stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identification of such artifactual profiles. Our diagnostics are based on detection of deviations from uniformity, which is assumed as the main underlying principle of microarray design. Method 1 is based on detection of non-uniformity of microarray distribution of printed genes that are clustered based on the similarity of their expression profiles. Method 2 is based on evaluation of the presence of gene-specific microarray spots within the slides’ areas characterized by an abnormal concentration of low/high differential expression values, which we define as ‘patterns of differentials’. Applying two novel algorithms, for nested clustering (method 1) and for pattern detection (method 2), we can make a dual estimation of the profile’s quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of patterns that cover them, thus diminishing their input in any further data analysis. PMID:14999086
Li, Yaqian; Du, Xilin; Lu, Zhi John; Wu, Daqiang; Zhao, Yilei; Ren, Bin; Huang, Jiaofang; Huang, Xianqing; Xu, Yuhong; Xu, Yuquan
2011-01-01
Background Phenazines are important compounds produced by pseudomonads and other bacteria. Two phz gene clusters, phzA1-G1 and phzA2-G2, were found in the genome of Pseudomonas sp. M18, an effective biocontrol agent that is highly homologous to the opportunistic human pathogen P. aeruginosa PAO1. However, little is known about the correlation between the expression of the two phz gene clusters. Methodology/Principal Findings Chromosomal insertion-inactivated mutants were constructed for each of the two gene clusters, and the correlation between the expression of the two phz gene clusters was investigated in strain M18. Phenazine-1-carboxylic acid (PCA) molecules produced from the phzA2-G2 gene cluster are able to auto-regulate the expression of that cluster and to activate the expression of the phzA1-G1 gene cluster in a circulated amplification pattern. However, the post-transcriptional expression of the phzA1-G1 transcript was blocked principally through the 5′-untranslated region (UTR). In contrast, the phzA2-G2 gene cluster was transcribed to a lesser extent and translated efficiently and was negatively regulated by the GacA signal transduction pathway, mainly at a post-transcriptional level. Conclusions/Significance A single molecule, PCA, produced in different quantities by the two phz gene clusters acted as the functional mediator, and the two phz gene clusters developed a specific regulatory mechanism which acts through the 5′-UTR to transfer a single, but complex, bacterial signaling event in Pseudomonas sp. strain M18. PMID:21559370
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhai, Ying; Bai, Silei; Liu, Jingjing
Dithiolopyrrolone group antibiotics characterized by an electronically unique dithiolopyrrolone heterobicyclic core are known for their antibacterial, antifungal, insecticidal and antitumor activities. Recently the biosynthetic gene clusters for two dithiolopyrrolone compounds, holomycin and thiomarinol, have been identified respectively in different bacterial species. Here, we report a novel dithiolopyrrolone biosynthetic gene cluster (aut) isolated from Streptomyces thioluteus DSM 40027 which produces two pyrrothine derivatives, aureothricin and thiolutin. By comparison with other characterized dithiolopyrrolone clusters, eight genes in the aut cluster were verified to be responsible for the assembly of dithiolopyrrolone core. The aut cluster was further confirmed by heterologous expression and in-frame gene deletion experiments. Intriguingly, we found that the heterogenetic thioesterase HlmK derived from the holomycin (hlm) gene cluster in Streptomyces clavuligerus significantly improved heterologous biosynthesis of dithiolopyrrolones in Streptomyces albus through coexpression with the aut cluster. In the previous studies, HlmK was considered invalid because it has a Ser to Gly point mutation within the canonical Ser-His-Asp catalytic triad of thioesterases. However, gene inactivation and complementation experiments in our study unequivocally demonstrated that HlmK is an active distinctive type II thioesterase that plays a beneficial role in dithiolopyrrolone biosynthesis. - Highlights: • Cloning of the aureothricin biosynthetic gene cluster from Streptomyces thioluteus DSM 40027. • Identification of the aureothricin gene cluster by heterologous expression and in-frame gene deletion. • The heterogenetic thioesterase HlmK significantly improved dithiolopyrrolones production of the aureothricin gene cluster. • Identification of HlmK as an unusual type II thioesterase.
Osborne, Peter W; Benoit, Gérard; Laudet, Vincent; Schubert, Michael; Ferrier, David E K
2009-03-01
The ParaHox cluster is the evolutionary sister to the Hox cluster. Like the Hox cluster, the ParaHox cluster displays spatial and temporal regulation of the component genes along the anterior/posterior axis in a manner that correlates with the gene positions within the cluster (a feature called collinearity). The ParaHox cluster is however a simpler system to study because it is composed of only three genes. We provide a detailed analysis of the amphioxus ParaHox cluster and, for the first time in a single species, examine the regulation of the cluster in response to a single developmental signalling molecule, retinoic acid (RA). Embryos treated with either RA or RA antagonist display altered ParaHox gene expression: AmphiGsx expression shifts in the neural tube, and the endodermal boundary between AmphiXlox and AmphiCdx shifts its anterior/posterior position. We identified several putative retinoic acid response elements and in vitro assays suggest some may participate in RA regulation of the ParaHox genes. By comparison to vertebrate ParaHox gene regulation we explore the evolutionary implications. This work highlights how insights into the regulation and evolution of more complex vertebrate arrangements can be obtained through studies of a simpler, unduplicated amphioxus gene cluster.
cluML: A markup language for clustering and cluster validity assessment of microarray data.
Bolshakova, Nadia; Cunningham, Pádraig
2005-01-01
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as the inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can be used effectively, without limitation, for the representation of clustering and for the validation of other biomedical and physical data.
de Lima-Morales, Daiana; Chaves-Moreno, Diego; Wos-Oxley, Melissa L; Jáuregui, Ruy; Vilchez-Vargas, Ramiro; Pieper, Dietmar H
2016-01-01
Pseudomonas veronii 1YdBTEX2, a benzene and toluene degrader, and Pseudomonas veronii 1YB2, a benzene degrader, have previously been shown to be key players in a benzene-contaminated site. These strains harbor unique catabolic pathways for the degradation of benzene comprising a gene cluster encoding an isopropylbenzene dioxygenase where genes encoding downstream enzymes were interrupted by stop codons. Extradiol dioxygenases were recruited from gene clusters comprising genes encoding a 2-hydroxymuconic semialdehyde dehydrogenase necessary for benzene degradation but typically absent from isopropylbenzene dioxygenase-encoding gene clusters. The benzene dihydrodiol dehydrogenase-encoding gene was not clustered with any other aromatic degradation genes, and the encoded protein was only distantly related to dehydrogenases of aromatic degradation pathways. The involvement of the different gene clusters in the degradation pathways was suggested by real-time quantitative reverse transcription PCR. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Jørgensen, Hanne; Fjærvik, Espen; Hakvåg, Sigrid; Bruheim, Per; Bredholt, Harald; Klinkenberg, Geir; Ellingsen, Trond E.; Zotchev, Sergey B.
2009-01-01
A large number of Streptomyces bacteria with antifungal activity isolated from samples collected in the Trondheim fjord (Norway) were found to produce polyene compounds. Investigation of polyene-containing extracts revealed that most of the isolates produced the same compound, which had an atomic mass and UV spectrum corresponding to those of candicidin D. The morphological diversity of these isolates prompted us to speculate about the involvement of a mobile genetic element in dissemination of the candicidin biosynthesis gene cluster (can). Eight candicidin-producing isolates were analyzed by performing a 16S rRNA gene-based taxonomic analysis, pulsed-field gel electrophoresis, PCR, and Southern blot hybridization with can-specific probes. These analyses revealed that most of the isolates were related, although they were morphologically diverse, and that all of them contained can genes. The majority of the isolates studied contained large plasmids, and two can-specific probes hybridized to a 250-kb plasmid in one isolate. Incubation of the latter isolate at a high temperature resulted in loss of the can genes and candicidin production, while mating of the “cured” strain with a plasmid-containing donor restored candicidin production. The latter result suggested that the 250-kb plasmid contains the complete can gene cluster and could be responsible for conjugative transfer of this cluster to other streptomycetes. PMID:19286787
2013-01-01
Background The development of new therapies for orphan genetic diseases represents an extremely important medical and social challenge. Drug repositioning, i.e. finding new indications for approved drugs, could be one of the most cost- and time-effective strategies to cope with this problem, at least in a subset of cases. Therefore, many computational approaches based on the analysis of high throughput gene expression data have so far been proposed to reposition available drugs. However, most of these methods require gene expression profiles directly relevant to the pathologic conditions under study, such as those obtained from patient cells and/or from suitable experimental models. In this work we have developed a new approach for drug repositioning, based on identifying known drug targets showing conserved anti-correlated expression profiles with human disease genes, which is completely independent from the availability of ‘ad hoc’ gene expression data-sets. Results By analyzing available data, we provide evidence that the genes displaying conserved anti-correlation with drug targets are antagonistically modulated in their expression by treatment with the relevant drugs. We then identified clusters of genes associated to similar phenotypes and showing conserved anticorrelation with drug targets. On this basis, we generated a list of potential candidate drug-disease associations. Importantly, we show that some of the proposed associations are already supported by independent experimental evidence. Conclusions Our results support the hypothesis that the identification of gene clusters showing conserved anticorrelation with drug targets can be an effective method for drug repositioning and provide a wide list of new potential drug-disease associations for experimental validation. PMID:24088245
Molecular codes for neuronal individuality and cell assembly in the brain
Yagi, Takeshi
2012-01-01
The brain contains an enormous, but finite, number of neurons. The ability of this limited number of neurons to produce nearly limitless neural information over a lifetime is typically explained by combinatorial explosion; that is, by the exponential amplification of each neuron's contribution through its incorporation into “cell assemblies” and neural networks. In development, each neuron expresses diverse cellular recognition molecules that permit the formation of the appropriate neural cell assemblies to elicit various brain functions. The mechanism for generating neuronal assemblies and networks must involve molecular codes that give neurons individuality and allow them to recognize one another and join appropriate networks. The extensive molecular diversity of cell-surface proteins on neurons is likely to contribute to their individual identities. The clustered protocadherins (Pcdh) is a large subfamily within the diverse cadherin superfamily. The clustered Pcdh genes are encoded in tandem by three gene clusters, and are present in all known vertebrate genomes. The set of clustered Pcdh genes is expressed in a random and combinatorial manner in each neuron. In addition, cis-tetramers composed of heteromultimeric clustered Pcdh isoforms represent selective binding units for cell-cell interactions. Here I present the mathematical probabilities for neuronal individuality based on the random and combinatorial expression of clustered Pcdh isoforms and their formation of cis-tetramers in each neuron. Notably, clustered Pcdh gene products are known to play crucial roles in correct axonal projections, synaptic formation, and neuronal survival. Their molecular and biological features induce a hypothesis that the diverse clustered Pcdh molecules provide the molecular code by which neuronal individuality and cell assembly permit the combinatorial explosion of networks that supports enormous processing capability and plasticity of the brain. PMID:22518100
Omura, S; Ikeda, H; Ishikawa, J; Hanamoto, A; Takahashi, C; Shinose, M; Takahashi, Y; Horikawa, H; Nakazawa, H; Osonoe, T; Kikuchi, H; Shiba, T; Sakaki, Y; Hattori, M
2001-10-09
Streptomyces avermitilis is a soil bacterium that carries out not only a complex morphological differentiation but also the production of secondary metabolites, one of which, avermectin, is commercially important in human and veterinary medicine. The major interest in the genus Streptomyces is the diversity of its production of secondary metabolites as an industrial microorganism. A major factor in its prominence as a producer of the variety of secondary metabolites is its possession of several metabolic pathways for biosynthesis. Here we report sequence analysis of S. avermitilis, covering 99% of its genome. At least 8.7 million base pairs exist in the linear chromosome; this is the largest bacterial genome sequence, and it provides insights into the intrinsic diversity of the production of the secondary metabolites of Streptomyces. Twenty-five kinds of secondary metabolite gene clusters were found in the genome of S. avermitilis. Four of them are concerned with the biosyntheses of melanin pigments, in which two clusters encode tyrosinase and its cofactor, another two encode an ochronotic pigment derived from homogentisic acid, and another polyketide-derived melanin. The gene clusters for carotenoid and siderophore biosyntheses are composed of seven and five genes, respectively. There are eight kinds of gene clusters for type-I polyketide compound biosyntheses, and two clusters are involved in the biosyntheses of type-II polyketide-derived compounds. Furthermore, a polyketide synthase that resembles phloroglucinol synthase was detected. Eight clusters are involved in the biosyntheses of peptide compounds that are synthesized by nonribosomal peptide synthetases. These secondary metabolite clusters are widely located in the genome but half of them are near both ends of the genome. The total length of these clusters occupies about 6.4% of the genome.
A 6-gene signature identifies four molecular subgroups of neuroblastoma
2011-01-01
Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB); Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) was associated to unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. Results The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). Conclusions Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently made to further investigate this group's specific characteristics. PMID:21492432
Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena
2015-06-01
Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used in order to build the interaction map among the identified genes by a significance score named weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complementary supplement for gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
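The clustering step described above (k-means on a per-gene "weighted number of links" score, with the highest cluster taken as leader genes) can be illustrated with a minimal sketch. This is not the authors' validated algorithm: the gene names and scores below are hypothetical placeholders, the choice of k = 3 is an assumption, and the deterministic centroid initialisation is chosen only to keep the example reproducible.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar scores; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted scores.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign every score to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Recompute each centroid as the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # assignments are stable, so stop
            break
        centroids = updated
    return centroids, labels


def leader_genes(scores, k=3):
    """Return the genes that fall in the cluster with the highest centroid."""
    genes = list(scores)
    centroids, labels = kmeans_1d([scores[g] for g in genes], k)
    top = max(range(k), key=lambda c: centroids[c])
    return sorted(g for g, lab in zip(genes, labels) if lab == top)


# Hypothetical weighted-number-of-links scores (illustrative values only).
toy_scores = {
    "gene1": 9.5, "gene2": 9.0,
    "gene3": 4.2, "gene4": 3.9,
    "gene5": 1.0, "gene6": 0.8, "gene7": 1.2,
}
leaders = leader_genes(toy_scores, k=3)
```

On these toy scores the two highest-scoring placeholder genes form the top cluster; with real data the result depends on k and on how the scores are distributed.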
Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J
2006-01-01
One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
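As an illustration of the TFIDF weighting scheme compared above, the following minimal sketch computes a weight for each keyword attached to a gene. It uses the common textbook form (term frequency times the log of the inverse document frequency), which is an assumption: the abstract does not state the exact variant the authors used. The gene names and keywords are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(gene_keywords):
    """Return {gene: {keyword: tf-idf weight}} for a dict of keyword lists."""
    n_genes = len(gene_keywords)
    # Document frequency: number of genes whose keyword list mentions the keyword.
    doc_freq = Counter()
    for keywords in gene_keywords.values():
        doc_freq.update(set(keywords))
    weights = {}
    for gene, keywords in gene_keywords.items():
        counts = Counter(keywords)
        total = len(keywords)
        weights[gene] = {
            # Term frequency within the gene times log inverse document frequency.
            kw: (count / total) * math.log(n_genes / doc_freq[kw])
            for kw, count in counts.items()
        }
    return weights


# Hypothetical keyword lists, for illustration only.
toy = {
    "geneA": ["kinase", "membrane", "kinase"],
    "geneB": ["kinase", "nucleus"],
    "geneC": ["membrane", "transport"],
}
weights = tfidf_weights(toy)
```

A keyword that appears for only one gene (here "nucleus") receives a higher weight than one shared across genes (here "kinase"), which is the property that makes the weighted keyword lists useful for separating functional clusters.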
Shamseldin, Abdelaal; Carro, Lorena; Peix, Alvaro; Velázquez, Encarna; Moawad, Hassan; Sadowsky, Michael J
2016-06-01
In the present work we analyzed the taxonomic status of several Rhizobium strains isolated from Trifolium alexandrinum L. nodules in Egypt. The 16S rRNA genes of these strains were identical to those of Rhizobium bangladeshense BLR175(T) and Rhizobium binae BLR195(T). However, the analyses of recA and atpD genes split the strains into two clusters. Cluster II strains are identified as R. bangladeshense with >98% similarity values in both genes. The cluster I strains are phylogenetically related to Rhizobium etli CFN42(T) and R. bangladeshense BLR175(T), but with less than 94% similarity values in recA and atpD genes. DNA-DNA hybridization analysis showed 42% and 48% average relatedness between the strain 1010(T) from cluster I with respect to R. bangladeshense BLR175(T) and R. etli CFN42(T), respectively. Phenotypic characteristics of cluster I strains also differed from those of their closest related Rhizobium species. Analysis of the nodC gene showed that the strains belong to two groups within the symbiovar trifolii which was identified in Egypt linked to the species R. bangladeshense. Based on the genotypic and phenotypic characteristics, the group I strains belong to a new species for which the name Rhizobium aegyptiacum sp. nov. (sv. trifolii) is proposed, with strain 1010(T) being designated as the type strain (= USDA 7124(T)=LMG 29296(T)=CECT 9098(T)). Copyright © 2016 Elsevier GmbH. All rights reserved.
Zhang, Xiujun; Alemany, Lawrence B.; Fiedler, Hans-Peter; Goodfellow, Michael; Parry, Ronald J.
2008-01-01
The antibiotics lactonamycin and lactonamycin Z provide attractive leads for antibacterial drug development. Both antibiotics contain a novel aglycone core called lactonamycinone. To gain insight into lactonamycinone biosynthesis, cloning and precursor incorporation experiments were undertaken. The lactonamycin gene cluster was initially cloned from Streptomyces rishiriensis. Sequencing of ca. 61 kb of S. rishiriensis DNA revealed the presence of 57 open reading frames. These included genes coding for the biosynthesis of l-rhodinose, the sugar found in lactonamycin, and genes similar to those in the tetracenomycin biosynthetic gene cluster. Since lactonamycin production by S. rishiriensis could not be sustained, additional proof for the identity of the S. rishiriensis cluster was obtained by cloning the lactonamycin Z gene cluster from Streptomyces sanglieri. Partial sequencing of the S. sanglieri cluster revealed 15 genes that exhibited a very high degree of similarity to genes within the lactonamycin cluster, as well as an identical organization. Double-crossover disruption of one gene in the S. sanglieri cluster abolished lactonamycin Z production, and production was restored by complementation. These results confirm the identity of the genetic locus cloned from S. sanglieri and indicate that the highly similar locus in S. rishiriensis encodes lactonamycin biosynthetic genes. Precursor incorporation experiments with S. sanglieri revealed that lactonamycinone is biosynthesized in an unusual manner whereby glycine or a glycine derivative serves as a starter unit that is extended by nine acetate units. Analysis of the gene clusters and of the precursor incorporation data suggested a hypothetical scheme for lactonamycinone biosynthesis. PMID:18070976
Lieberman, Richard; Kranzler, Henry R; Joshi, Pujan; Shin, Dong-Guk; Covault, Jonathan
2015-09-01
Genetic variation in a region of chromosome 4p12 that includes the GABAA subunit gene GABRA2 has been reproducibly associated with alcohol dependence (AD). However, the molecular mechanisms underlying the association are unknown. This study examined correlates of in vitro gene expression of the AD-associated GABRA2 rs279858*C-allele in human neural cells using an induced pluripotent stem cell (iPSC) model system. We examined mRNA expression of chromosome 4p12 GABAA subunit genes (GABRG1, GABRA2, GABRA4, and GABRB1) in 36 human neural cell lines differentiated from iPSCs using quantitative polymerase chain reaction and next-generation RNA sequencing. mRNA expression in adult human brain was examined using the BrainCloud and BRAINEAC data sets. We found significantly lower levels of GABRA2 mRNA in neural cell cultures derived from rs279858*C-allele carriers. Levels of GABRA2 RNA were correlated with those of the other 3 chromosome 4p12 GABAA genes, but not other neural genes. Cluster analysis based on the relative RNA levels of the 4 chromosome 4p12 GABAA genes identified 2 distinct clusters of cell lines, a low-expression cluster associated with rs279858*C-allele carriers and a high-expression cluster enriched for the rs279858*T/T genotype. In contrast, there was no association of genotype with chromosome 4p12 GABAA gene expression in postmortem adult cortex in either the BrainCloud or BRAINEAC data sets. AD-associated variation in GABRA2 is associated with differential expression of the entire cluster of GABAA subunit genes on chromosome 4p12 in human iPSC-derived neural cell cultures. The absence of a parallel effect in postmortem human adult brain samples suggests that AD-associated genotype effects on GABAA expression, although not present in mature cortex, could have effects on regulation of the chromosome 4p12 GABAA cluster during neural development. Copyright © 2015 by the Research Society on Alcoholism.
Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.
2014-01-01
Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for a standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality. PMID:25136661
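The general idea of hierarchical clustering with a Pearson-based proximity and average linkage can be sketched as follows. This is a plain illustration, not the PCPHC or GL-PCPHC method: it uses the ordinary Pearson correlation (the "improved" variant is not specified in the abstract), a naive agglomerative loop, and hypothetical expression profiles. It assumes no profile is constant, since a constant profile has zero variance and no defined correlation.

```python
import math


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge genes bottom-up using distance 1 - r and average linkage."""
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles across four conditions.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
groups = average_linkage(toy_profiles, n_clusters=2)
```

Using 1 - r as the distance groups genes whose profiles rise and fall together regardless of scale, so g1 and g2 end up in one cluster and the two decreasing profiles in the other.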
Janevska, Slavica; Arndt, Birgit; Baumann, Leonie; Apken, Lisa Helene; Mauriz Marques, Lucas Maciel; Humpf, Hans-Ulrich; Tudzynski, Bettina
2017-01-01
The PKS-NRPS-derived tetramic acid equisetin and its N-desmethyl derivative trichosetin exhibit remarkable biological activities against a variety of organisms, including plants and bacteria, e.g., Staphylococcus aureus. The equisetin biosynthetic gene cluster was first described in Fusarium heterosporum, a species distantly related to the notorious rice pathogen Fusarium fujikuroi. Here we present the activation and characterization of a homologous, but silent, gene cluster in F. fujikuroi. Bioinformatic analysis revealed that this cluster does not contain the equisetin N-methyltransferase gene eqxD and consequently, trichosetin was isolated as final product. The adaption of the inducible, tetracycline-dependent Tet-on promoter system from Aspergillus niger achieved a controlled overproduction of this toxic metabolite and a functional characterization of each cluster gene in F. fujikuroi. Overexpression of one of the two cluster-specific transcription factor (TF) genes, TF22, led to an activation of the three biosynthetic cluster genes, including the PKS-NRPS key gene. In contrast, overexpression of TF23, encoding a second Zn(II)2Cys6 TF, did not activate adjacent cluster genes. Instead, TF23 was induced by the final product trichosetin and was required for expression of the transporter-encoding gene MFS-T. TF23 and MFS-T likely act in consort and contribute to detoxification of trichosetin and therefore, self-protection of the producing fungus. PMID:28379186
Computational gene expression profiling under salt stress reveals patterns of co-expression
Sanchita; Sharma, Ashok
2016-01-01
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters that were common to multiple algorithms were taken forward for further analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A
2009-03-30
Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The gene cluster organizational structure and sequence similarity seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event. 
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may also afford the identification of these gene clusters in dinoflagellates, the cause of human mortalities and significant financial loss to the tourism and shellfish industries.
Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A
2009-01-01
Background: Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. Results: We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. Conclusion: The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The gene cluster organizational structure and sequence similarity seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event. 
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may also afford the identification of these gene clusters in dinoflagellates, the cause of human mortalities and significant financial loss to the tourism and shellfish industries. PMID:19331657
Global Occurrence of Archaeal amoA Genes in Terrestrial Hot Springs▿
Zhang, Chuanlun L.; Ye, Qi; Huang, Zhiyong; Li, WenJun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P.; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S.; Shock, Everett L.; Hedlund, Brian P.
2008-01-01
Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86°C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA. PMID:18676703
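To make the idea of grouping sequences into OTUs at a fixed cutoff more concrete, the following is a minimal sketch only. It assumes that "2%" is read as a maximum pairwise distance of 0.02 between aligned sequences, and it uses a simple greedy, seed-based assignment; the abstract does not state which OTU-picking procedure the authors used, and the function names and toy sequences are hypothetical.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], cutoff: float = 0.02) -> List[List[str]]:
    """Assign each sequence to the first OTU whose seed lies within `cutoff`.

    Assumption: a greedy, seed-based scheme chosen for illustration; it is
    not taken from the study summarized above.
    """
    otus: List[List[str]] = []
    for seq in seqs:
        for otu in otus:
            if p_distance(seq, otu[0]) <= cutoff:  # otu[0] is the OTU seed
                otu.append(seq)
                break
        else:
            # No existing seed is close enough, so this sequence seeds a new OTU.
            otus.append([seq])
    return otus


# Hypothetical 100-nt toy alignment, not real amoA data.
base = "ACGT" * 25
one_diff = "T" + base[1:]        # 1 mismatch, distance 0.01
four_diff = "TTTTT" + base[5:]   # 4 mismatches, distance 0.04
otus = greedy_otus([base, one_diff, four_diff])
print(len(otus), [len(o) for o in otus])
```

With these toy inputs the first two sequences fall into one OTU and the third seeds its own. Greedy assignment depends on input order, which is one reason dedicated tools use more careful linkage rules.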
Global occurrence of archaeal amoA genes in terrestrial hot springs.
Zhang, Chuanlun L; Ye, Qi; Huang, Zhiyong; Li, Wenjun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S; Shock, Everett L; Hedlund, Brian P
2008-10-01
Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86°C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA.
Wang, H-X; Chen, Y-Y; Ge, L; Fang, T-T; Meng, J; Liu, Z; Fang, X-Y; Ni, S; Lin, C; Wu, Y-Y; Wang, M-L; Shi, N-N; He, H-G; Hong, K; Shen, Y-M
2013-07-01
Ansamycins are a family of macrolactams that are synthesized by type I polyketide synthase (PKS) using 3-amino-5-hydroxybenzoic acid (AHBA) as the starter unit. Most members of the family have strong antimicrobial, antifungal, anticancer and/or antiviral activities. We aimed to discover new ansamycins and/or other AHBA-containing natural products from actinobacteria. Through PCR screening of the AHBA synthase gene, we identified 26 AHBA synthase gene-positive strains from 206 plant-associated actinomycetes (five positives) and 688 marine-derived actinomycetes (21 positives), representing a positive ratio of 2.4-3.1%. Twenty-five ansamycins, including eight new compounds, were isolated from six AHBA synthase gene-positive strains through TLC-guided fractionations followed by repeated column chromatography. To gain information about those potential ansamycin gene clusters whose products were unknown, seven strains with phylogenetically divergent AHBA synthase genes were subjected to fosmid library construction. Of the seven gene clusters we obtained, three show characteristics of typical ansamycin gene clusters, and the other four, from Micromonospora spp., appear to lack the amide synthase gene, which is unusual for ansamycin biosynthesis. The gene composition of these four gene clusters suggests that they are involved in the biosynthesis of a new family of hybrid PK-NRP compounds containing an AHBA substructure. PCR screening of AHBA synthase is an efficient approach to discover novel ansamycins and other AHBA-containing natural products. This work demonstrates that the AHBA-based screening method is a useful approach for discovering novel ansamycins and other AHBA-containing natural products from new microbial resources. Journal of Applied Microbiology © 2013 The Society for Applied Microbiology.
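The quoted positive ratio can be reproduced from the strain counts given in the abstract. The short check below is only a worked calculation; the variable and function names are illustrative and are not part of the original study.

```python
# Counts taken from the abstract: (positive strains, strains screened) per source.
screened = {
    "plant-associated": (5, 206),
    "marine-derived": (21, 688),
}


def positive_rate(positives: int, total: int) -> float:
    """Share of AHBA synthase gene-positive strains, in percent."""
    return 100.0 * positives / total


rates = {source: positive_rate(p, n) for source, (p, n) in screened.items()}
total_pos = sum(p for p, _ in screened.values())
total_n = sum(n for _, n in screened.values())
overall = positive_rate(total_pos, total_n)

for source, rate in rates.items():
    print(f"{source}: {rate:.1f}%")
print(f"overall: {total_pos}/{total_n} = {overall:.1f}%")
```

The two per-source rates round to 2.4% and 3.1%, matching the reported range, and the pooled rate for all 894 strains comes to about 2.9%.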
Beites, Tiago; Mendes, Marta V
2015-01-01
The increased number of bacterial genome sequencing projects has generated over the last years a large reservoir of genomic information. In silico analysis of this genomic data has renewed the interest in bacterial bioprospecting for bioactive compounds by unveiling novel biosynthetic gene clusters of unknown or uncharacterized metabolites. However, only a small fraction of those metabolites is produced under laboratory-controlled conditions; the remaining clusters represent a pool of novel metabolites that are waiting to be "awakened". Activation of the biosynthetic gene clusters that present reduced or no expression (known as cryptic or silent clusters) by heterologous expression has emerged as a strategy for the identification and production of novel bioactive molecules. Synthetic biology, with engineering principles at its core, provides an excellent framework for the development of efficient heterologous systems for the expression of biosynthetic gene clusters. However, a common problem in its application is the host-interference problem, i.e., the unpredictable interactions between the device and the host that can hamper the desired output. Although an effort has been made to develop orthogonal devices, the most proficient way to overcome the host-interference problem is through genome simplification. In this review we present an overview of the strategies and tools used in the development of hosts/chassis for the heterologous expression of specialized-metabolite biosynthetic gene clusters. Finally, we introduce the concept of the specialized host as the next step in the development of expression hosts.
Lim, Si-Kyu; Ju, Jianhua; Zazopoulos, Emmanuel; Jiang, Hui; Seo, Jeong-Woo; Chen, Yihua; Feng, Zhiyang; Rajski, Scott R; Farnet, Chris M; Shen, Ben
2009-10-23
iso-Migrastatin and related glutarimide-containing polyketides are potent inhibitors of tumor cell migration and their implied potential as antimetastatic agents for human cancers has garnered significant attention. Genome scanning of Streptomyces platensis NRRL 18993 unveiled two candidate gene clusters (088D and mgs); each encodes acyltransferase-less type I polyketide synthases commensurate with iso-migrastatin biosynthesis. Both clusters were inactivated by λ-RED-mediated PCR-targeting mutagenesis in S. platensis; iso-migrastatin production was completely abolished in the ΔmgsF mutant SB11012 strain, whereas inactivation of 088D-orf7 yielded the SB11006 strain that exhibited no discernible change in iso-migrastatin biosynthesis. These data indicate that iso-migrastatin production is governed by the mgs cluster. Systematic gene inactivation allowed determination of the precise boundaries of the mgs cluster and the essentiality of the genes within the mgs cluster in iso-migrastatin production. The mgs cluster consists of 11 open reading frames that encode three acyltransferase-less type I polyketide synthases (MgsEFG), one discrete acyltransferase (MgsH), a type II thioesterase (MgsB), three post-PKS tailoring enzymes (MgsIJK), two glutarimide biosynthesis enzymes (MgsCD), and one regulatory protein (MgsA). A model for iso-migrastatin biosynthesis is proposed based on functional assignments derived from bioinformatics and is further supported by the results of in vivo gene inactivation experiments.
Lim, Si-Kyu; Ju, Jianhua; Zazopoulos, Emmanuel; Jiang, Hui; Seo, Jeong-Woo; Chen, Yihua; Feng, Zhiyang; Rajski, Scott R.; Farnet, Chris M.; Shen, Ben
2009-01-01
iso-Migrastatin and related glutarimide-containing polyketides are potent inhibitors of tumor cell migration and their implied potential as antimetastatic agents for human cancers has garnered significant attention. Genome scanning of Streptomyces platensis NRRL 18993 unveiled two candidate gene clusters (088D and mgs); each encodes acyltransferase-less type I polyketide synthases commensurate with iso-migrastatin biosynthesis. Both clusters were inactivated by λ-RED-mediated PCR-targeting mutagenesis in S. platensis; iso-migrastatin production was completely abolished in the ΔmgsF mutant SB11012 strain, whereas inactivation of 088D-orf7 yielded the SB11006 strain that exhibited no discernible change in iso-migrastatin biosynthesis. These data indicate that iso-migrastatin production is governed by the mgs cluster. Systematic gene inactivation allowed determination of the precise boundaries of the mgs cluster and the essentiality of the genes within the mgs cluster in iso-migrastatin production. The mgs cluster consists of 11 open reading frames that encode three acyltransferase-less type I polyketide synthases (MgsEFG), one discrete acyltransferase (MgsH), a type II thioesterase (MgsB), three post-PKS tailoring enzymes (MgsIJK), two glutarimide biosynthesis enzymes (MgsCD), and one regulatory protein (MgsA). A model for iso-migrastatin biosynthesis is proposed based on functional assignments derived from bioinformatics and is further supported by the results of in vivo gene inactivation experiments. PMID:19726666
Guo, Liliang; Sui, Zhenghong; Zhang, Shu; Ren, Yuanyuan; Liu, Yuan
2015-04-01
Diatoms form an enormous group of photoautotrophic micro-eukaryotes and play a crucial role in marine ecology. In this study, we evaluated typical genes to determine whether they were effective at different levels of diatom clustering analysis to assess the potential of these regions for barcoding taxa. Our test genes included nuclear rRNA genes (the nuclear small-subunit rRNA gene and the 5.8S rRNA gene+ITS-2), a mitochondrial gene (cytochrome c-oxidase subunit 1, COI), a chloroplast gene [ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL)] and the universal plastid amplicon (UPA). Calculated genetic divergence was highest for the internal transcribed spacer (ITS; 5.8S+ITS-2) (p-distance of 1.569, 85.84% parsimony-informative sites) and COI (6.084, 82.14%), followed by the 18S rRNA gene (0.139, 57.69%), rbcL (0.120, 42.01%) and UPA (0.050, 14.97%), which indicated that ITS and COI were highly divergent compared with the other tested genes, and that their nucleotide compositions were variable within the whole group of diatoms. Bayesian inference (BI) analysis showed that the phylogenetic trees generated from each gene clustered diatoms at different phylogenetic levels. The 18S rRNA gene was better than the other genes in clustering higher diatom taxa, and both the 18S rRNA gene and rbcL performed well in clustering some lower taxa. The COI region was able to barcode species of some genera within the Bacillariophyceae. ITS was a potential marker for DNA-based taxonomy and DNA barcoding of Thalassiosirales, while species of Cyclotella, Skeletonema and Stephanodiscus gathered in separate clades, and were paraphyletic with those of Thalassiosira. Finally, UPA was too conserved to serve as a diatom barcode. © 2015 IUMS.
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background: Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results: We propose a novel generic consensus clustering technique that applies the Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. 
The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions: The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices. PMID:24885407
Patel, Vidushi S; Cooper, Steven JB; Deakin, Janine E; Fulton, Bob; Graves, Tina; Warren, Wesley C; Wilson, Richard K; Graves, Jennifer AM
2008-01-01
Background: Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the α- and β-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil β-globin gene (ω) in the marsupial α-cluster, however, suggested that duplication of the α-β cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous α- and β-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation. Results: The platypus α-globin cluster (chromosome 21) contains embryonic and adult α-globin genes, a β-like ω-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-ζ-ζ'-αD-α3-α2-α1-ω-GBY-3'. The platypus β-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-ε-β-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate α-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal β-globin clusters are embedded in olfactory genes. Thus, the mammalian α- and β-globin clusters are orthologous to the bird α- and β-globin clusters respectively. Conclusion: We propose that α- and β-globin clusters evolved from an ancient MPG-C16orf35-α-β-GBY-LUC7L arrangement 410 million years ago. A copy of the original β (represented by ω in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of β-globin genes with different expression profiles in different lineages. PMID:18657265
van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A
2009-06-01
Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-by-point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.
Favre, Patrick; Bapaume, Laure; Bossolini, Eligio; Delorenzi, Mauro; Falquet, Laurent; Reinhardt, Didier
2014-12-03
Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes has been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result we present a list of yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. 
Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to diverse other biological phenomena if species with established genome sequences fall into distinguished groups that differ in a defined functional trait of interest.
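The filtering step described above, keeping protein clusters whose members come only from AM-competent species, can be illustrated with a small sketch. It assumes the orthology clustering has already been done; the cluster identifiers, the species codes, and the `min_species` threshold are hypothetical choices for illustration and are not taken from the study.

```python
from typing import Dict, List, Set

# Hypothetical toy input: cluster id -> species code of each member protein.
clusters: Dict[str, List[str]] = {
    "c1": ["Mtr", "Osa", "Sly"],   # only AM-competent species
    "c2": ["Mtr", "Ath", "Osa"],   # includes a non-mycorrhizal species
    "c3": ["Osa", "Osa"],          # a single AM-competent species (paralogs)
    "c4": ["Ath", "Bra"],          # only non-mycorrhizal species
}

# Illustrative species codes: Medicago, rice and tomato as AM-competent hosts;
# Arabidopsis ("Ath") and Brassica ("Bra") serve as the non-mycorrhizal group.
AM_COMPETENT: Set[str] = {"Mtr", "Osa", "Sly"}


def am_restricted(
    clusters: Dict[str, List[str]],
    am_species: Set[str],
    min_species: int = 2,
) -> List[str]:
    """Return ids of clusters found only in AM-competent species.

    `min_species` (an assumption of this sketch) requires the cluster to span
    several AM-competent species, so species-specific families are skipped.
    """
    kept = []
    for cluster_id, members in clusters.items():
        present = set(members)
        if present <= am_species and len(present) >= min_species:
            kept.append(cluster_id)
    return sorted(kept)


candidates = am_restricted(clusters, AM_COMPETENT)
print(candidates)
```

In this toy input only `c1` passes. The ranking by relatedness, expression data, and promoter analysis mentioned in the abstract would be applied to such candidates afterwards and is not covered here.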
The PhytoClust tool for metabolic gene clusters discovery in plant genomes
Fuchs, Lisa-Maria
2017-01-01
The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust; a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGCs types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGCs discovery which will in turn impact the exploration of plant metabolism. PMID:28486689
The PhytoClust tool for metabolic gene clusters discovery in plant genomes.
Töpfer, Nadine; Fuchs, Lisa-Maria; Aharoni, Asaph
2017-07-07
The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust; a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGCs types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGCs discovery which will in turn impact the exploration of plant metabolism. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
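As a rough illustration of what searching for physically co-localized metabolic enzymes can mean in practice, the sketch below slides a fixed-size window along genes in chromosomal order and reports regions that contain several distinct enzyme families. This is not PhytoClust's algorithm or interface: it assumes each gene has already been assigned an enzyme family (for example from HMM hits), and the function name, window size, family threshold, and labels are all hypothetical.

```python
from typing import List, Optional, Tuple


def candidate_clusters(
    families: List[Optional[str]],
    window: int = 5,
    min_families: int = 3,
) -> List[Tuple[int, int]]:
    """Find gene-index ranges with >= `min_families` distinct enzyme families.

    `families` lists one label per gene in chromosomal order, with None for
    genes that have no enzyme-family hit. `min_families` must be at least 1.
    """
    # 1) Collect every window that holds enough distinct families.
    hits = []
    for start in range(max(len(families) - window + 1, 1)):
        chunk = families[start:start + window]
        distinct = {f for f in chunk if f is not None}
        if len(distinct) >= min_families:
            hits.append((start, start + len(chunk) - 1))

    # 2) Merge overlapping windows into contiguous regions.
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))

    # 3) Trim each region to its first and last annotated enzyme gene.
    trimmed = []
    for s, e in merged:
        annotated = [i for i in range(s, e + 1) if families[i] is not None]
        trimmed.append((annotated[0], annotated[-1]))
    return trimmed


# Hypothetical gene order on one chromosome (labels are illustrative only).
genes = [None, "P450", None, "TPS", "AT", None, None, None, None, "P450", None, None]
regions = candidate_clusters(genes)
print(regions)
```

For this toy chromosome the only reported region spans gene indices 1 to 4; the isolated P450 at index 9 is not reported because no window around it contains three distinct families.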
Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin
2016-01-01
Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE: Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447
Adamek, Martina; Alanjary, Mohammad; Sales-Ortells, Helena; Goodfellow, Michael; Bull, Alan T; Winkler, Anika; Wibberg, Daniel; Kalinowski, Jörn; Ziemert, Nadine
2018-06-01
Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis' strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. 
Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.
Jadhav, Rohit R; Ye, Zhenqing; Huang, Rui-Lan; Liu, Joseph; Hsu, Pei-Yin; Huang, Yi-Wen; Rangel, Leticia B; Lai, Hung-Cheng; Roa, Juan Carlos; Kirma, Nameer B; Huang, Tim Hui-Ming; Jin, Victor X
2015-01-01
Recent genome-wide analysis has shown that DNA methylation spans long stretches of chromosome regions consisting of clusters of contiguous CpG islands or gene families. Hypermethylation of various gene clusters has been reported in many types of cancer. In this study, we conducted methyl-binding domain capture (MBDCap) sequencing (MBD-seq) analysis on a breast cancer cohort consisting of 77 patients and 10 normal controls, as well as a panel of 38 breast cancer cell lines. Bioinformatics analysis determined seven gene clusters with a significant difference in overall survival (OS) and further revealed a distinct feature that the conservation of a large gene cluster (approximately 70 kb) metallothionein-1 (MT1) among 45 species is much lower than the average of all RefSeq genes. Furthermore, we found that DNA methylation is an important epigenetic regulator contributing to gene repression of MT1 gene cluster in both ERα positive (ERα+) and ERα negative (ERα-) breast tumors. In silico analysis revealed much lower gene expression of this cluster in The Cancer Genome Atlas (TCGA) cohort for ERα+ tumors. To further investigate the role of estrogen, we conducted 17β-estradiol (E2) and demethylating agent 5-aza-2'-deoxycytidine (DAC) treatment in various breast cancer cell types. Cell proliferation and invasion assays suggested MT1F and MT1M may play an anti-oncogenic role in breast cancer. Our data suggest that DNA methylation in large contiguous gene clusters can be potential prognostic markers of breast cancer. Further investigation of these clusters revealed that estrogen mediates epigenetic repression of MT1 cluster in ERα+ breast cancer cell lines. In all, our studies identify thousands of breast tumor hypermethylated regions for the first time, in particular, discovering seven large contiguous hypermethylated gene clusters.
Rezzonico, Fabio; Braun-Kiewnick, Andrea; Mann, Rachel A; Rodoni, Brendan; Goesmann, Alexander; Duffy, Brion; Smits, Theo H M
2012-10-01
Comparative genomic analysis revealed differences in the lipopolysaccharide (LPS) biosynthesis gene cluster between the Rubus-infecting strain ATCC BAA-2158 and the Spiraeoideae-infecting strain CFBP 1430 of Erwinia amylovora. These differences corroborate rpoB-based phylogenetic clustering of E. amylovora into four different groups and enable the discrimination of Spiraeoideae- and Rubus-infecting strains. The structure of the differences between the two groups supports the hypothesis that adaptation to Rubus spp. took place after species separation of E. amylovora and E. pyrifoliae that contrasts with a recently proposed scenario, based on CRISPR data, in which the shift to domesticated apple would have caused an evolutionary bottleneck in the Spiraeoideae-infecting strains of E. amylovora which would be a much earlier event. In the core region of the LPS biosynthetic gene cluster, Spiraeoideae-infecting strains encode three glycosyltransferases and an LPS ligase (Spiraeoideae-type waaL), whereas Rubus-infecting strains encode two glycosyltransferases and a different LPS ligase (Rubus-type waaL). These coding domains share little to no homology at the amino acid level between Rubus- and Spiraeoideae-infecting strains, and this genotypic difference was confirmed by polymerase chain reaction analysis of the associated DNA region in 31 Rubus- and Spiraeoideae-infecting strains. The LPS biosynthesis gene cluster may thus be used as a molecular marker to distinguish between Rubus- and Spiraeoideae-infecting strains of E. amylovora using primers designed in this study. © 2012 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2012 BSPP AND BLACKWELL PUBLISHING LTD.
Guo, Bing; Greenwood, Paul L; Cafe, Linda M; Zhou, Guanghong; Zhang, Wangang; Dalrymple, Brian P
2015-03-13
This study aimed to identify markers for muscle growth rate and the different cellular contributors to cattle muscle and to link the muscle growth rate markers to specific cell types. The expression of two groups of genes in the longissimus muscle (LM) of 48 Brahman steers of similar age, significantly enriched for "cell cycle" and "ECM (extracellular matrix) organization" Gene Ontology (GO) terms was correlated with average daily gain/kg liveweight (ADG/kg) of the animals. However, expression of the same genes was only partly related to growth rate across a time course of postnatal LM development in two cattle genotypes, Piedmontese x Hereford (high muscling) and Wagyu x Hereford (high marbling). The deposition of intramuscular fat (IMF) altered the relationship between the expression of these genes and growth rate. K-means clustering across the development time course with a large set of genes (5,596) with similar expression profiles to the ECM genes was undertaken. The locations in the clusters of published markers of different cell types in muscle were identified and used to link clusters of genes to the cell type most likely to be expressing them. Overall correspondence between published cell type expression of markers and predicted major cell types of expression in cattle LM was high. However, some exceptions were identified: expression of SOX8 previously attributed to muscle satellite cells was correlated with angiogenesis. Analysis of the clusters and cell types suggested that the "cell cycle" and "ECM" signals were from the fibro/adipogenic lineage. Significant contributions to these signals from the muscle satellite cells, angiogenic cells and adipocytes themselves were not as strongly supported. Based on the clusters and cell type markers, sets of five genes predicted to be representative of fibro/adipogenic precursors (FAPs) and endothelial cells, and/or ECM remodelling and angiogenesis were identified. 
Gene sets and gene markers for the analysis of many of the major processes/cell populations contributing to muscle composition and growth have been proposed, enabling a consistent interpretation of gene expression datasets from cattle LM. The same gene sets are likely to be applicable in other cattle muscles and in other species.
Norde, Marina Maintinguer; Oki, Erica; Carioca, Antonio A F; Castro, Inar A; Souza, José M P; Marchioni, Dirce M L; Fisberg, Regina M; Rogero, Marcelo M
2017-03-01
The aim of this study was to investigate the interaction of toll-like receptor 4 (TLR4) gene single nucleotide polymorphism (SNP) and plasma fatty acid (FA) profile in modulating risk for systemic inflammation. In all, 262 adult (19-59 y) participants of the Health Survey of São Paulo met the inclusion criteria. Anthropometric parameters, blood pressure, plasma inflammatory biomarker concentration, and fatty acid profile were measured and four SNPs of the TLR4 gene (rs4986790, rs4986791, rs11536889, and rs5030728) were genotyped. Multivariate cluster analysis was performed to stratify individuals based on levels of 11 plasma inflammatory biomarkers into two groups: inflammatory (INF) and noninflammatory (NINF). No association was found between any of the SNPs studied and systemic inflammation. The INF cluster had higher palmitic acid levels (P = 0.039) and estimated stearoyl coenzyme A desaturase activity (P = 0.045) and lower polyunsaturated fatty acid (P = 0.011), ω-6 fatty acid (P = 0.018), arachidonic acid (P = 0.002) levels, and estimated δ-5 desaturase activity (P = 0.025) compared with the NINF cluster. Statistically significant interaction between rs11536889 and arachidonic acid/eicosapentaenoic acid (AA/EPA) ratio (P = 0.034) was found to increase the odds of belonging to the INF cluster when individuals had the variant allele C and were at the higher percentile of AA/EPA plasma ratio. Plasma fatty acid profile modulated the odds of belonging to the INF cluster depending on genotypes of TLR4 gene polymorphisms. Copyright © 2016 Elsevier Inc. All rights reserved.
Large clusters of co-expressed genes in the Drosophila genome.
Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I
2002-12-12
Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.
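The claim that clusters of three or more genes occur more often than expected by chance can be checked with a permutation test. The sketch below is an illustration under stated assumptions, not the authors' procedure: it treats a cluster as a run of strictly adjacent co-expressed genes along one chromosome, and the function names, the `min_size` default, and the shuffling null model are all hypothetical.

```python
import random


def count_clusters(flags, min_size=3):
    """Count runs of at least min_size consecutive True values.

    flags: booleans in chromosomal gene order, True if the gene belongs
           to the co-expressed set (for example, testes-specific genes).
    """
    count = run = 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_size:
                count += 1
            run = 0
    if run >= min_size:  # close a run that reaches the chromosome end
        count += 1
    return count


def permutation_p(flags, min_size=3, n_perm=1000, seed=0):
    """Estimate how often shuffled gene orders give at least as many clusters."""
    rng = random.Random(seed)
    observed = count_clusters(flags, min_size)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled, min_size) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the labels keeps the number of co-expressed genes fixed while destroying their positions, so only the spatial arrangement is tested.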
Unusual Gene Order and Organization of the Sea Urchin Hox Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cameron, R A; Rowen, L; Nesbitt, R
2005-10-11
The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
Unusual Gene Order and Organization of the Sea Urchin Hox Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew
2005-05-10
The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants
Schläpfer, Pascal; Zhang, Peifen; Wang, Chuan; ...
2017-04-01
Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we will need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants
Zhang, Peifen; Kim, Taehyong; Banf, Michael; Chavali, Arvind K.; Nilo-Poyanco, Ricardo; Bernard, Thomas
2017-01-01
Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters. PMID:28228535
Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants.
Schläpfer, Pascal; Zhang, Peifen; Wang, Chuan; Kim, Taehyong; Banf, Michael; Chae, Lee; Dreher, Kate; Chavali, Arvind K; Nilo-Poyanco, Ricardo; Bernard, Thomas; Kahn, Daniel; Rhee, Seung Y
2017-04-01
Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters. © 2017 American Society of Plant Biologists. All Rights Reserved.
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
Boesten, Rolf; Schuren, Frank; Wind, Richèle D; Knol, Jan; de Vos, Willem M
2011-09-01
A total of 20 Bifidobacterium strains were isolated from fecal samples of 4 breast- and bottle-fed infants and all were characterized as Bifidobacterium breve based on 16S rRNA gene sequence and metabolic analysis. These isolates were further characterized and compared to the type strains of B. breve and 7 other Bifidobacterium spp. by comparative genome hybridization. For this purpose, we constructed and used a DNA-based microarray containing over 2000 randomly cloned DNA fragments from B. breve type strain LMG13208. This molecular analysis revealed a high degree of genomic variation between the isolated strains and allowed the vast majority to be grouped into 4 clusters. One cluster contained a single isolate that was virtually indistinguishable from the B. breve type strain. The 3 other clusters included 19 B. breve strains that differed considerably from all type strains. Remarkably, each of the 4 clusters included strains that were isolated from a single infant, indicating that a niche adaptation may contribute to variation within the B. breve species. Based on genomic hybridization data, the new B. breve isolates were estimated to contain approximately 60-90% of the genes of the B. breve type strain, attesting to the existence of various subspecies within the species B. breve. Further bioinformatic analysis identified several hundred diagnostic clones specific to the genomic clustering of the B. breve isolates. Molecular analysis of representatives of these revealed that annotated genes from the conserved B. breve core encoded mainly housekeeping functions, while the strain-specific genes were predicted to code for functions related to life style, such as carbohydrate metabolism and transport. This is compatible with genetic adaptation of the strains to their niche, a combination of infants and diet. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Underlying mathematics in diversification of human olfactory receptors in different loci.
Hassan, Sk Sarif; Choudhury, Pabitra Pal; Goswami, Arunava
2013-12-01
As per conservative estimate, approximately 51-105 Olfactory Receptors (ORs) loci are present in human genome occurring in clusters. These clusters are apparently unevenly spread as mosaics over 21 pairs of human chromosomes. Olfactory Receptor (OR) gene families which are thought to have expanded for the need to provide recognition capability for a huge number of pure and complex odorants, form the largest known multigene family in the human genome. Recent studies have shown that 388 full length and 414 OR pseudo-genes are present in these OR genomic clusters. In this paper, the authors report a classification method for all human ORs based on their sequential quantitative information like presence of poly strings of nucleotides bases, long range correlation and so on. An L-System generated sequence has been taken as an input into a star-model of specific subfamily members and resultant sequence has been mapped to a specific OR based on the classification scheme using fractal parameters like Hurst exponent and fractal dimensions.
Tanaka-Tsuno, Fumiko; Mizukami-Murata, Satomi; Murata, Yoshinori; Nakamura, Toshihide; Ando, Akira; Takagi, Hiroshi; Shima, Jun
2007-10-01
In the modern baking industry, high-sucrose-tolerant (HS) and maltose-utilizing (LS) yeast were developed using breeding techniques and are now used commercially. Sugar utilization and high-sucrose tolerance differ significantly between HS and LS yeasts. We analysed the gene expression profiles of HS and LS yeasts under different sucrose conditions in order to determine their basic physiology. Two-way hierarchical clustering was performed to obtain the overall patterns of gene expression. The clustering clearly showed that the gene expression patterns of LS yeast differed from those of HS yeast. Quality threshold clustering was used to identify the gene clusters containing upregulated genes (cluster 1) and downregulated genes (cluster 2) under high-sucrose conditions. Clusters 1 and 2 contained numerous genes involved in carbon and nitrogen metabolism, respectively. The expression level of the genes involved in the metabolism of glycerol and trehalose, which are known to be osmoprotectants, in LS yeast was higher than that in HS yeast under sucrose concentrations of 5-40%. No clear correlation was found between the expression level of the genes involved in the biosynthesis of the osmoprotectants and the intracellular contents of the osmoprotectants. The present gene expression data were compared with data previously reported in a comprehensive analysis of a gene deletion strain collection. Welch's t-test for this comparison showed that the relative growth rates of the deletion strains whose deletion occurred in genes belonging to cluster 1 were significantly higher than the average growth rates of all deletion strains. Copyright 2007 John Wiley & Sons, Ltd.
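Welch's t-test, used in the abstract above to compare the growth rates of deletion strains, does not assume equal variances in the two groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the function name and the example growth-rate values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    Each sample needs at least two values so that the sample variance exists.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical relative growth rates for two groups of deletion strains.
cluster1_strains = [1.0, 2.0, 3.0, 4.0, 5.0]
other_strains = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(cluster1_strains, other_strains)
```

Turning `t_stat` and `dof` into a p-value requires the Student t distribution, which the standard library does not provide; a statistics package would be needed for that step.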
Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.
Medema, Marnix H; Paalvast, Yared; Nguyen, Don D; Melnik, Alexey; Dorrestein, Pieter C; Takano, Eriko; Breitling, Rainer
2014-09-01
Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.
Williams, Bronwyn W; Scribner, Kim T
2010-01-01
Reintroductions and translocations are increasingly used to repatriate or increase probabilities of persistence for animal and plant species. Genetic and demographic characteristics of founding individuals and suitability of habitat at release sites are commonly believed to affect the success of these conservation programs. Genetic divergence among multiple source populations of American martens (Martes americana) and well documented introduction histories permitted analyses of post-introduction dispersion from release sites and development of genetic clusters in the Upper Peninsula (UP) of Michigan <50 years following release. Location and size of spatial genetic clusters and measures of individual-based autocorrelation were inferred using 11 microsatellite loci. We identified three genetic clusters in geographic proximity to original release locations. Estimated distances of effective gene flow based on spatial autocorrelation varied greatly among genetic clusters (30-90 km). Spatial contiguity of genetic clusters has been largely maintained with evidence for admixture primarily in localized regions, suggesting recent contact or locally retarded rates of gene flow. Data provide guidance for future studies of the effects of permeabilities of different land-cover and land-use features to dispersal and of other biotic and environmental factors that may contribute to the colonization process and development of spatial genetic associations.
Response of Human Skin to Aesthetic Scarification
Gabriel, Vincent A.; McClellan, Elizabeth A.; Scheuermann, Richard H.
2014-01-01
This study was undertaken to investigate changes in RNA expression in previously healthy adult human skin following thermal injury induced by contact with hot metal as part of aesthetic scarification, a body modification practice. Subjects were recruited to have pre-injury skin and serial wound biopsies performed. 4 mm punch biopsies were taken prior to branding and at 1 hour, 1 week, and 1, 2 and 3 months post injury. RNA was extracted and quality assured prior to the use of a whole-genome based bead array platform to describe expression changes in the samples using the pre-injury skin as a comparator. Analysis of the array data was performed using k-means clustering and a hypergeometric probability distribution without replacement, and corrections for multiple comparisons were done. Confirmatory q-PCR was performed. Using a k of 10, several clusters of genes were shown to co-cluster based on Gene Ontology classification with probabilities unlikely to occur by chance alone. Of particular interest were clusters relating to cell cycle, proteinaceous extracellular matrix and keratinization. Given the consistent expression changes at one week following injury in the cell cycle cluster, there is an opportunity to intervene early following burn injury to influence scar development. PMID:24582755
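The enrichment test described in the abstract above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric tail probability for one Gene Ontology term within one k-means cluster, and it uses a Bonferroni correction purely as a stand-in, since the abstract does not say which multiple-comparison correction was applied. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    max_hits = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Assumed correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `hypergeom_enrichment_p(10, 4, 3, 2)` asks how likely it is to see at least 2 annotated genes in a cluster of 3 drawn from 10 genes of which 4 are annotated.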
Differential Retention of Gene Functions in a Secondary Metabolite Cluster.
Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W
2017-08-01
In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
Wang, Tao; Li, Hua; Wang, Hua; Su, Jing
2015-04-16
The present study established a typing method with NotI-based pulsed-field gel electrophoresis (PFGE) and stress response gene schemed multilocus sequence typing (MLST) for 55 Oenococcus oeni strains isolated from six individual regions in China and two model strains PSU-1 (CP000411) and ATCC BAA-1163 (AAUV00000000). Seven stress response genes, cfa, clpL, clpP, ctsR, mleA, mleP and omrA, were selected for MLST testing, and positive selective pressure was detected for these genes. Furthermore, both methods separated the strains into two clusters. The PFGE clusters are correlated with the region, whereas the sequence types (STs) formed by the MLST confirm the two clusters identified by PFGE. In addition, the population structure was a mixture of evolutionary pathways, and the strains exhibited both clonal and panmictic characteristics. Copyright © 2015 Elsevier B.V. All rights reserved.
Karbalaei, Reza; Allahyari, Marzieh; Rezaei-Tavirani, Mostafa; Asadzadeh-Aghdaei, Hamid; Zali, Mohammad Reza
2018-01-01
This study reconstructs and analyzes networks for two diseases, non-alcoholic fatty liver disease (NAFLD) and Alzheimer's disease (AD), and examines their relationship using systems biology methods. NAFLD and AD are two complex diseases with rising prevalence and high costs for countries. There are some reports of a relationship between these two diseases and of shared pathways in their progression. In addition, they share some risk factors, particularly lifestyle factors such as diet and exercise. A systems biology approach can therefore help to uncover their relationship. The DisGeNET and STRING databases were used as sources of disease genes and for constructing networks. Three plugins of the Cytoscape software, ClusterONE, ClueGO and CluePedia, were used to analyze and cluster the networks and to perform pathway enrichment. An R package was used to determine the best centrality method. Finally, hub and bottleneck nodes were defined based on degree and betweenness. NAFLD and AD had 190 genes in common, which were used to construct a network with the STRING database. The resulting network contained 182 nodes and 2591 edges and comprised four clusters. Separate enrichment of these clusters pointed to carbohydrate metabolism, long-chain fatty acid pathways, and regulation of the JAK-STAT and IL-17 signaling pathways, respectively. Seven genes were selected as hub-bottlenecks: IL6, AKT1, TP53, TNF, JUN, VEGFA and PPARG. Enrichment of these proteins and their first neighbors in the network using the OMIM database pointed to diabetes and obesity as antecedents of NAFLD and AD. Systems biology methods, specifically PPI networks, can be useful for analyzing complex related diseases. Finding hub and bottleneck proteins should be a goal of drug design and of the search for disease markers.
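The hub-bottleneck idea in the abstract above (nodes ranking high on both degree and betweenness) can be sketched as follows. This is an illustration only: the study used Cytoscape plugins and an R package, whereas this sketch assumes an unweighted, undirected network stored as an adjacency dictionary and uses Brandes' algorithm for betweenness. The function names and the `top_k` cutoff are hypothetical.

```python
from collections import deque


def degree_and_betweenness(adj):
    """Degree and shortest-path betweenness for an undirected, unweighted
    graph given as {node: set_of_neighbours} (Brandes' algorithm)."""
    nodes = list(adj)
    degree = {v: len(adj[v]) for v in nodes}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)   # BFS distance from s
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:                      # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return degree, {v: b / 2.0 for v, b in bc.items()}


def hub_bottlenecks(adj, top_k):
    """Nodes that are in the top_k by degree AND the top_k by betweenness."""
    degree, bc = degree_and_betweenness(adj)
    hubs = set(sorted(adj, key=degree.get, reverse=True)[:top_k])
    bottlenecks = set(sorted(adj, key=bc.get, reverse=True)[:top_k])
    return hubs & bottlenecks
```

On a three-node path `a - b - c`, the middle node `b` has the highest degree and lies on the only shortest path between `a` and `c`, so it is the single hub-bottleneck for `top_k=1`.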
Wichmann, Gunnar; Rosolowski, Maciej; Krohn, Knut; Kreuz, Markus; Boehm, Andreas; Reiche, Anett; Scharrer, Ulrike; Halama, Dirk; Bertolini, Julia; Bauer, Ulrike; Holzinger, Dana; Pawlita, Michael; Hess, Jochen; Engel, Christoph; Hasenclever, Dirk; Scholz, Markus; Ahnert, Peter; Kirsten, Holger; Hemprich, Alexander; Wittekind, Christian; Herbarth, Olf; Horn, Friedemann; Dietz, Andreas; Loeffler, Markus
2015-12-15
Stratification of head and neck squamous cell carcinomas (HNSCC) based on HPV16 DNA and RNA status, gene expression patterns, and mutated candidate genes may facilitate patient treatment decision. We characterize head and neck squamous cell carcinomas (HNSCC) with different HPV16 DNA and RNA (E6*I) status from 290 consecutively recruited patients by gene expression profiling and targeted sequencing of 50 genes. We show that tumors with transcriptionally inactive HPV16 (DNA+ RNA-) are similar to HPV-negative (DNA-) tumors regarding gene expression and frequency of TP53 mutations (47%, 8/17 and 43%, 72/167, respectively). We also find that an immune response-related gene expression cluster is associated with lymph node metastasis, independent of HPV16 status and that disruptive TP53 mutations are associated with lymph node metastasis in HPV16 DNA- tumors. We validate each of these associations in another large data set. Four gene expression clusters which we identify differ moderately but significantly in overall survival. Our findings underscore the importance of measuring the HPV16 RNA (E6*I) and TP53-mutation status for patient stratification and identify associations of an immune response-related gene expression cluster and TP53 mutations with lymph node metastasis in HNSCC. © 2015 UICC.
Dolezal, Tomas; Gazi, Michal; Zurovec, Michal; Bryant, Peter J
2003-10-01
Many Drosophila genes exist as members of multigene families and within each family the members can be functionally redundant, making it difficult to identify them by classical mutagenesis techniques based on phenotypic screening. We have addressed this problem in a genetic analysis of a novel family of six adenosine deaminase-related growth factors (ADGFs). We used ends-in targeting to introduce mutations into five of the six ADGF genes, taking advantage of the fact that five of the family members are encoded by a three-gene cluster and a two-gene cluster. We used two targeting constructs to introduce loss-of-function mutations into all five genes, as well as to isolate different combinations of multiple mutations, independent of phenotypic consequences. The results show that (1) it is possible to use ends-in targeting to disrupt gene clusters; (2) gene conversion, which is usually considered a complication in gene targeting, can be used to help recover different mutant combinations in a single screening procedure; (3) the reduction of duplication to a single copy by induction of a double-strand break is better explained by the single-strand annealing mechanism than by simple crossing over between repeats; and (4) loss of function of the most abundantly expressed family member (ADGF-A) leads to disintegration of the fat body and the development of melanotic tumors in mutant larvae.
Computational gene network study on antibiotic resistance genes of Acinetobacter baumannii.
Anitha, P; Anbarasu, Anand; Ramaiah, Sudha
2014-05-01
Multi Drug Resistance (MDR) in Acinetobacter baumannii is one of the major threats for emerging nosocomial infections in hospital environment. Multidrug-resistance in A. baumannii may be due to the implementation of multi-combination resistance mechanisms such as β-lactamase synthesis, Penicillin-Binding Proteins (PBPs) changes, alteration in porin proteins and in efflux pumps against various existing classes of antibiotics. Multiple antibiotic resistance genes are involved in MDR. These resistance genes are transferred through plasmids, which are responsible for the dissemination of antibiotic resistance among Acinetobacter spp. In addition, these resistance genes may also have a tendency to interact with each other or with their gene products. Therefore, it becomes necessary to understand the impact of these interactions in antibiotic resistance mechanism. Hence, our study focuses on protein and gene network analysis on various resistance genes, to elucidate the role of the interacting proteins and to study their functional contribution towards antibiotic resistance. From the search tool for the retrieval of interacting gene/protein (STRING), a total of 168 functional partners for 15 resistance genes were extracted based on the confidence scoring system. The network study was then followed up with functional clustering of associated partners using molecular complex detection (MCODE). Later, we selected eight efficient clusters based on score. Interestingly, the associated protein we identified from the network possessed greater functional similarity with known resistance genes. This network-based approach on resistance genes of A. baumannii could help in identifying new genes/proteins and provide clues on their association in antibiotic resistance. Copyright © 2014 Elsevier Ltd. All rights reserved.
Stevenson, G; Andrianopoulos, K; Hobbs, M; Reeves, P R
1996-01-01
Colanic acid (CA) is an extracellular polysaccharide produced by most Escherichia coli strains as well as by other species of the family Enterobacteriaceae. We have determined the sequence of a 23-kb segment of the E. coli K-12 chromosome which includes the cluster of genes necessary for production of CA. The CA cluster comprises 19 genes. Two other sequenced genes (orf1.3 and galF), which are situated between the CA cluster and the O-antigen cluster, were shown to be unnecessary for CA production. The CA cluster includes genes for synthesis of GDP-L-fucose, one of the precursors of CA, and the gene for one of the enzymes in this pathway (GDP-D-mannose 4,6-dehydratase) was identified by biochemical assay. Six of the inferred proteins show sequence similarity to glycosyl transferases, and two others have sequence similarity to acetyl transferases. Another gene (wzx) is predicted to encode a protein with multiple transmembrane segments and may function in export of the CA repeat unit from the cytoplasm into the periplasm in a process analogous to O-unit export. The first three genes of the cluster are predicted to encode an outer membrane lipoprotein, a phosphatase, and an inner membrane protein with an ATP-binding domain. Since homologs of these genes are found in other extracellular polysaccharide gene clusters, they may have a common function, such as export of polysaccharide from the cell. PMID:8759852
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
Ficklin, Stephen P; Luo, Feng; Feltus, F Alex
2010-09-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
The sirodesmin biosynthetic gene cluster of the plant pathogenic fungus Leptosphaeria maculans.
Gardiner, Donald M; Cozijnsen, Anton J; Wilson, Leanne M; Pedras, M Soledade C; Howlett, Barbara J
2004-09-01
Sirodesmin PL is a phytotoxin produced by the fungus Leptosphaeria maculans, which causes blackleg disease of canola (Brassica napus). This phytotoxin belongs to the epipolythiodioxopiperazine (ETP) class of toxins produced by fungi including mammalian and plant pathogens. We report the cloning of a cluster of genes with predicted roles in the biosynthesis of sirodesmin PL and show via gene disruption that one of these genes (encoding a two-module non-ribosomal peptide synthetase) is essential for sirodesmin PL biosynthesis. Of the nine genes in the cluster tested, all are co-regulated with the production of sirodesmin PL in culture. A similar cluster is present in the genome of the opportunistic human pathogen Aspergillus fumigatus and is most likely responsible for the production of gliotoxin, which is also an ETP. Homologues of the genes in the cluster were also identified in expressed sequence tags of the ETP producing fungus Chaetomium globosum. Two other fungi with publicly available genome sequences, Magnaporthe grisea and Fusarium graminearum, had similar gene clusters. A comparative analysis of all four clusters is presented. This is the first report of the genes responsible for the biosynthesis of an ETP. Copyright 2004 Blackwell Publishing Ltd
Kurka, Hedwig; Ehrenreich, Armin; Ludwig, Wolfgang; Monot, Marc; Rupnik, Maja; Barbut, Frederic; Indra, Alexander; Dupuy, Bruno; Liebl, Wolfgang
2014-01-01
PCR-ribotyping is a broadly used method for the classification of isolates of Clostridium difficile, an emerging intestinal pathogen, causing infections with increased disease severity and incidence in several European and North American countries. We have now carried out clustering analysis with selected genes of numerous C. difficile strains as well as gene content comparisons of their genomes in order to broaden our view of the relatedness of strains assigned to different ribotypes. We analyzed the genomic content of 48 C. difficile strains representing 21 different ribotypes. The calculation of distance matrix-based dendrograms using the neighbor joining method for 14 conserved genes (standard phylogenetic marker genes) from the genomes of the C. difficile strains demonstrated that the genes from strains with the same ribotype generally clustered together. Further, certain ribotypes always clustered together and formed ribotype groups, i.e. ribotypes 078, 033 and 126, as well as ribotypes 002 and 017, indicating their relatedness. Comparisons of the gene contents of the genomes of ribotypes that clustered according to the conserved gene analysis revealed that the number of common genes of the ribotypes belonging to each of these three ribotype groups were very similar for the 078/033/126 group (at most 69 specific genes between the different strains with the same ribotype) but less similar for the 002/017 group (86 genes difference). It appears that the ribotype is indicative not only of a specific pattern of the amplified 16S–23S rRNA intergenic spacer but also reflects specific differences in the nucleotide sequences of the conserved genes studied here. It can be anticipated that the sequence deviations of more genes of C. difficile strains are correlated with their PCR-ribotype. In conclusion, the results of this study corroborate and extend the concept of clonal C. difficile lineages, which correlate with ribotype affiliation. PMID:24482682
Sun, Zhengda; Wang, Chih-Yang; Lawson, Devon A; Kwek, Serena; Velozo, Hugo Gonzalez; Owyong, Mark; Lai, Ming-Derg; Fong, Lawrence; Wilson, Mark; Su, Hua; Werb, Zena; Cooke, Daniel L
2018-02-16
Tumor endothelial cells (TEC) play an indispensable role in tumor growth and metastasis, although much of the detailed mechanism remains elusive. In this study we characterized and compared the global gene expression profiles of TECs and control ECs isolated from human breast cancerous tissues and reduction mammoplasty tissues, respectively, by single cell RNA sequencing (scRNA-seq). Based on the qualified scRNA-seq libraries that we made, we found that 1302 genes were differentially expressed between these two EC phenotypes. Both principal component analysis (PCA) and heat map-based hierarchical clustering separated the cancerous and control ECs into two distinct clusters, and MetaCore disease biomarker analysis indicated that these differentially expressed genes are highly correlated with breast neoplasm diseases. Gene Set Enrichment Analysis (GSEA) software showed these genes to be enriched in extracellular matrix (ECM) signaling pathways and highlighted 127 ECM-associated genes. External validation verified that some of these ECM-associated genes are not only generally overexpressed in various cancer tissues but also specifically overexpressed in colorectal cancer ECs and lymphoma ECs. In conclusion, our data demonstrated that ECM-associated genes play pivotal roles in breast cancer EC biology and that some of them could serve as potential TEC biomarkers for various cancers.
Many nonuniversal archaeal ribosomal proteins are found in conserved gene clusters
WANG, JIACHEN; DASGUPTA, INDRANI; FOX, GEORGE E.
2009-01-01
The genomic associations of the archaeal ribosomal proteins (r-proteins) were examined in detail. The archaeal versions of the universal r-protein genes are typically in clusters similar or identical to those found in bacteria. Of the 35 nonuniversal archaeal r-protein genes examined, the gene encoding L18e was found to be associated with the conserved L13 cluster, whereas the genes for S4e, L32e and L19e were found in the archaeal version of the spc operon. Eleven nonuniversal protein genes were not associated with any common genomic context. Of the remaining 19 protein genes, 17 were convincingly assigned to one of 10 previously unrecognized gene clusters. Examination of the gene content of these clusters revealed multiple associations with genes involved in the initiation of protein synthesis, transcription or other cellular processes. The lack of such associations in the universal clusters suggests that initially the ribosome evolved largely independently of other processes. More recently it likely has evolved in concert with other cellular systems. It was also verified that a second copy of the gene encoding L7ae found in some bacteria is actually a homolog of the gene encoding L30e and should be annotated as such. PMID:19478915
Clustering-based spot segmentation of cDNA microarray images.
Uslan, Volkan; Bucak, Ihsan Ömür
2010-01-01
Microarrays are widely used because they provide useful information about the expression of thousands of genes simultaneously. In this study, the segmentation step of microarray image processing has been implemented. Two clustering-based methods, fuzzy c-means and k-means, have been applied to the segmentation step, which separates the spots from the background. The experiments show that fuzzy c-means segmented the spots of the microarray image more accurately than k-means.
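The difference between the two clustering methods named above can be sketched on a toy example. The code below is an illustration only, assuming pixels are reduced to one-dimensional grey-level intensities with two clusters (background and spot). It is not the authors' implementation, and the function names, starting centres and fuzzifier `m=2.0` are assumptions.

```python
def kmeans_1d(values, centers, iters=50):
    """Hard clustering: each pixel belongs to its nearest centre."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Keep the old centre if a cluster received no pixels.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def fuzzy_cmeans_1d(values, centers, m=2.0, iters=50, eps=1e-9):
    """Soft clustering: each pixel has a membership in every cluster.
    Returns the final centres and the memberships from the last pass."""
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        memberships = []
        for v in values:
            dists = [abs(v - c) + eps for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([x / total for x in inv])
        centers = [
            sum((u[i] ** m) * v for u, v in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships
```

A pixel could then be labelled as spot when its membership in the brighter cluster exceeds 0.5. Unlike k-means, the memberships also show how ambiguous each boundary pixel is.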
Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.
2012-01-01
An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis. PMID:22180538
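The composition features that the binning abstract above relies on can be sketched as follows. This is only an illustration of computing GC content and a tetranucleotide frequency vector for one sequence. It does not implement the error-gradient transformation or the two-tier model described in the abstract, and the function names are hypothetical.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq):
    """Relative frequencies of all 256 tetranucleotides, in the order
    AAAA, AAAC, ..., TTTT. Windows with ambiguous bases are skipped."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in kmers]
```

Vectors like these, one per contig or read, are the kind of input a clustering step would group into bins.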
Allcock, Richard J N; Barrow, Alexander D; Forbes, Simon; Beck, Stephan; Trowsdale, John
2003-02-01
We have characterized a cluster of single immunoglobulin variable (IgV) domain receptors centromeric of the major histocompatibility complex (MHC) on human chromosome 6. In addition to triggering receptor expressed on myeloid cells (TREM)-1 and TREM2, the cluster contains NKp44, a triggering receptor whose expression is limited to NK cells. We identified three new related genes and two gene fragments within a cluster of approximately 200 kb. Two of the three new genes lack charged residues in their transmembrane domain tails. Further, one of the genes contains two potential immunotyrosine inhibitory motifs in its cytoplasmic tail, suggesting that it delivers inhibitory signals. The human and mouse TREM clusters appear to have diverged such that there are unique sequences in each species. Finally, each gene in the TREM cluster was expressed in a different range of cell types.
Kumar, Rakshak; Acharya, Vishal; Singh, Dharam; Kumar, Sanjay
2018-01-01
A light pink coloured bacterial strain, ERGS5:01, isolated from glacial stream water of Sikkim Himalaya was affiliated to Janthinobacterium lividum based on 16S rRNA gene sequence identity and phylogenetic clustering. Whole genome sequencing was performed for the strain to confirm its taxonomy, as it lacked the typical violet pigmentation of the genus, and also to decipher its survival strategy in the high-elevation aquatic ecosystem. PacBio RSII sequencing generated a genome of 5,168,928 bp with 4575 protein-coding genes and 118 RNA genes. Whole genome-based multilocus sequence analysis clustering, an in silico DDH similarity value of 95.1%, and an ANI value of 99.25% established the identity of strain ERGS5:01 (MCC 2953) as a non-violacein-producing J. lividum. Genome comparisons across the genus Janthinobacterium revealed an open pan-genome with scope for the addition of new orthologous clusters to complete the genomic inventory. The genomic insight provided the genetic basis of tolerance to freezing and frequent freeze-thaw cycles, and of industrially important enzymes. Extended insight into the genome provided clues to crucial genes associated with adaptation to the harsh aquatic ecosystem of high altitude.
In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer.
Abu-Jamous, Basel; Buffa, Francesca M; Harris, Adrian L; Nandi, Asoke K
2017-06-15
Hypoxia is a characteristic of breast tumours indicating poor prognosis. Based on the assumption that those genes which are up-regulated under hypoxia in cell-lines are expected to be predictors of poor prognosis in clinical data, many signatures of poor prognosis were identified. However, it was observed that cell line data do not always concur with clinical data, and therefore conclusions from cell line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia related contexts are available, integrative approaches which investigate these datasets collectively, while not ignoring clinical data, are required. We analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions collectively by employing the unique capabilities of the method, UNCLES, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This has been demonstrated by comparison with the state-of-the-art iCluster method. From this collection of genome-wide datasets, which include 15,588 genes, UNCLES identified a relatively high number of genes (>1000 overall) which are consistently co-regulated over all of the datasets, and some of which are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated, clusters were identified; the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are found consistently positively correlated with hypoxia response genes, unlike the observation in cell lines. 
Moreover, the ability to predict bad prognosis by a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes appears to be comparable to, and perhaps greater than, that of known hypoxia signatures. We present a clustering approach suitable to integrate data from diverse experimental set-ups. Its application to breast cancer cell line datasets reveals new hypoxia-regulated signatures of genes which behave differently when in vitro (cell-line) data is compared with in vivo (clinical) data, and are of a prognostic value comparable to or exceeding the state-of-the-art hypoxia signatures.
GreenPhylDB v2.0: comparative and functional genomics in plants.
Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G
2011-01-01
GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
Dover, Nir; Barash, Jason R.; Burke, Julianne N.; ...
2014-05-22
Botulinum neurotoxin (BoNT) is the most poisonous substance known and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene that is part of a toxin gene cluster that includes several accessory genes. In this paper, we sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein in association with RNA polymerase core enzyme specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. Finally, this TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.
Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio
2014-12-01
The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.
Microarray characterization of gene expression changes in blood during acute ethanol exposure
2013-01-01
Background As part of the civil aviation safety program to define the adverse effects of ethanol on flying performance, we performed a DNA microarray analysis of human whole blood samples from a five-time point study of subjects administered ethanol orally, followed by breathalyzer analysis, to monitor blood alcohol concentration (BAC) to discover significant gene expression changes in response to the ethanol exposure. Methods Subjects were administered either orange juice or orange juice with ethanol. Blood samples were taken based on BAC and total RNA was isolated from PaxGene™ blood tubes. The amplified cDNA was used in microarray and quantitative real-time polymerase chain reaction (RT-qPCR) analyses to evaluate differential gene expression. Microarray data was analyzed in a pipeline fashion to summarize and normalize and the results evaluated for relative expression across time points with multiple methods. Candidate genes showing distinctive expression patterns in response to ethanol were clustered by pattern and further analyzed for related function, pathway membership and common transcription factor binding within and across clusters. RT-qPCR was used with representative genes to confirm relative transcript levels across time to those detected in microarrays. Results Microarray analysis of samples representing 0%, 0.04%, 0.08%, return to 0.04%, and 0.02% wt/vol BAC showed that changes in gene expression could be detected across the time course. The expression changes were verified by qRT-PCR. The candidate genes of interest (GOI) identified from the microarray analysis and clustered by expression pattern across the five BAC points showed seven coordinately expressed groups. Analysis showed function-based networks, shared transcription factor binding sites and signaling pathways for members of the clusters. 
These include hematological functions, innate immunity and inflammation functions, metabolic functions expected of ethanol metabolism, and pancreatic and hepatic function. Five of the seven clusters showed links to the p38 MAPK pathway. Conclusions The results of this study provide a first look at changing gene expression patterns in human blood during an acute rise in blood ethanol concentration and its depletion because of metabolism and excretion, and demonstrate that it is possible to detect changes in gene expression using total RNA isolated from whole blood. The analysis approach for this study serves as a workflow to investigate the biology linked to expression changes across a time course and from these changes, to identify target genes that could serve as biomarkers linked to pilot performance. PMID:23883607
Siano, Marco; Espeli, Vittoria; Mach, Nicolas; Bossi, Paolo; Licitra, Lisa; Ghielmini, Michele; Frattini, Milo; Canevari, Silvana; De Cecco, Loris
2018-07-01
Platinum-based chemotherapy plus the anti-EGFR monoclonal antibody (mAb) cetuximab is used to treat recurrent/metastatic (RM) head-neck squamous cell carcinoma (HNSCC). Recently, we defined the Cluster3 gene-expression signature as a potential predictor of favorable progression-free survival (PFS) in cetuximab-treated RM-HNSCC patients and as a predictor of partial metabolic FDG-PET response in an afatinib window-of-opportunity trial. Another anti-EGFR mAb (panitumumab) was used as the treatment agent in RM-HNSCC patients in the phase II PANI01 trial. PANI01 tumor samples were analyzed using functional genomics to explore response predictors to anti-EGFR therapy. Whole-gene expression and real-time PCR analyses were applied to pre-treatment samples from 25 PANI01 patients. Three gene signatures (Cluster3 score, RAS onco-signature, microenvironment score) and seven selected miRNAs were separately analyzed for association with panitumumab efficacy. Cluster3 expression levels had a profile with a significant bimodal separation of samples (P = 3.08 E-13). Higher RAS activation, microenvironment score, and miRNA expression were associated with low-Cluster3 patients. The same biomarkers were separately associated with PFS. Patients with high-Cluster3 had significantly longer PFS than patients with low-Cluster3 (median PFS: 174 versus 51 days; log-rank P = 0.0021). ROC analysis demonstrated accuracy in predicting PFS (AUC = 0.877). Despite differences in clinical settings and anti-EGFR inhibitors used for treatment, response prediction by the Cluster3 signature and selected miRNAs was essentially the same. Translation into a useful clinical assay requires validation in a broader setting. Copyright © 2018 Elsevier Ltd. All rights reserved.
2010-01-01
Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. 
balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079
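The divergence-time estimate mentioned above is commonly obtained from the synonymous substitution distance between two sequences. The following is a minimal sketch of that calculation, assuming the standard molecular-clock relation T = Ks / (2r). The abstract does not report the Ks value or the rate used, so `ks_example` and `rate_example` are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Molecular-clock estimate T = Ks / (2 * r).

    The factor 2 reflects that substitutions accumulate independently
    along both lineages since they diverged.
    """
    return ks / (2.0 * rate_per_site_per_year)


# Hypothetical inputs, NOT taken from the study:
ks_example = 0.009      # synonymous substitutions per synonymous site
rate_example = 4.5e-9   # substitutions per site per year (assumed)

t_years = divergence_time_years(ks_example, rate_example)
print(f"Estimated divergence time: {t_years:.3g} years")
```

Because the estimate scales inversely with the assumed rate, halving the rate doubles the inferred divergence time, which is why such estimates are usually reported as approximate.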
Miyamoto, Kiyoko T; Komatsu, Mamoru; Ikeda, Haruo
2014-08-01
Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes l-alanine for the l-serine of shinorine. Copyright © 2014, American Society for Microbiology. All Rights Reserved. PMID:24907338
Cardoza, R. E.; Malmierca, M. G.; Hermosa, M. R.; Alexander, N. J.; McCormick, S. P.; Proctor, R. H.; Tijerino, A. M.; Rumbero, A.; Monte, E.; Gutiérrez, S.
2011-01-01
Trichothecenes are mycotoxins produced by Trichoderma, Fusarium, and at least four other genera in the fungal order Hypocreales. Fusarium has a trichothecene biosynthetic gene (TRI) cluster that encodes transport and regulatory proteins as well as most enzymes required for the formation of the mycotoxins. However, little is known about trichothecene biosynthesis in the other genera. Here, we identify and characterize TRI gene orthologues (tri) in Trichoderma arundinaceum and Trichoderma brevicompactum. Our results indicate that both Trichoderma species have a tri cluster that consists of orthologues of seven genes present in the Fusarium TRI cluster. Organization of genes in the cluster is the same in the two Trichoderma species but differs from the organization in Fusarium. Sequence and functional analysis revealed that the gene (tri5) responsible for the first committed step in trichothecene biosynthesis is located outside the cluster in both Trichoderma species rather than inside the cluster as it is in Fusarium. Heterologous expression analysis revealed that two T. arundinaceum cluster genes (tri4 and tri11) differ in function from their Fusarium orthologues. The Tatri4-encoded enzyme catalyzes only three of the four oxygenation reactions catalyzed by the orthologous enzyme in Fusarium. The Tatri11-encoded enzyme catalyzes a completely different reaction (trichothecene C-4 hydroxylation) than the Fusarium orthologue (trichothecene C-15 hydroxylation). The results of this study indicate that although some characteristics of the tri/TRI cluster have been conserved during evolution of Trichoderma and Fusarium, the cluster has undergone marked changes, including gene loss and/or gain, gene rearrangement, and divergence of gene function. PMID:21642405
Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang
2016-09-01
With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the differentially expressed (DE) genes that contributed to lung adenocarcinoma (ADC) clustering using seven popular meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. Bioconductor was used for preliminary data preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance for microarray data samples. The classification efficiency was compared based on accuracy, sensitivity and specificity. Across seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes remained for further analysis. The classification efficiency analysis showed that DE genes identified by the sum of ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluating the classification efficiency of DE genes identified by different meta-analysis methods provided a new perspective on choosing a suitable method for a given application. Different meta-analysis methods perform differently, so the choice of method for a particular study should take several criteria into account. © 2015 John Wiley & Sons Ltd.
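The three classification measures named in this abstract can be computed from a two-class confusion matrix. The sketch below assumes that cluster membership has already been mapped to a predicted label per sample. The function name `classification_metrics` and the toy labels are hypothetical and are not the study's data.

```python
def classification_metrics(true_labels, predicted_labels, positive="ADC"):
    """Return (accuracy, sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)   # fraction of all samples called correctly
    sensitivity = tp / (tp + fn)        # fraction of true cases recovered
    specificity = tn / (tn + fp)        # fraction of true controls recovered
    return accuracy, sensitivity, specificity


# Toy labels (hypothetical): 4 ADC cases and 2 normal controls.
truth = ["ADC", "ADC", "ADC", "ADC", "normal", "normal"]
calls = ["ADC", "ADC", "ADC", "normal", "normal", "ADC"]

acc, sens, spec = classification_metrics(truth, calls)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, as they are here with more cases than controls.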
2015-12-01
group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For...differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as
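As a rough illustration of the clustering described in this excerpt, the sketch below implements UPGMA (average linkage) over a cosine distance in plain Python. It is a toy version under stated assumptions, not the pipeline used in the report: the distance is taken as 1 minus cosine similarity, and the expression profiles are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two expression profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def upgma(profiles):
    """Agglomerate profiles by average linkage; return the list of merges."""
    clusters = {i: [i] for i in range(len(profiles))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = keys[x], keys[y]
                # UPGMA distance: mean over all pairs of original members.
                total = sum(cosine_distance(profiles[i], profiles[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical profiles: 0 and 1 share one shape, 2 and 3 share another.
profiles = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.1],
    [3.0, 1.0, 0.5],
    [6.0, 2.1, 1.0],
]
merges = upgma(profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 5))
```

Cosine distance ignores overall magnitude, so profiles 0 and 1 group together even though one is roughly twice the other. This quadratic search is only suitable for small inputs.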
USDA-ARS's Scientific Manuscript database
Cyclopiazonic acid (CPA), an indole-tetramic acid toxin, is produced by many species of Aspergillus and Penicillium. In addition to CPA Aspergillus flavus produces polyketide-derived carcinogenic aflatoxins (AFs). AF biosynthesis genes form a gene cluster in a subtelomeric region. Isolates of A. fla...
Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan
2014-01-01
Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
Kang, Hahk-Soo
2017-02-01
Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for their metabolites characterization. An increase in the use of this approach has been observed for the last couple of years along with the emergence of low cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.
Annotation of gene function in citrus using gene expression information and co-expression networks
2014-01-01
Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. 
Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870
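The "guilt-by-association" idea behind a gene co-expression network can be sketched in a few lines: correlate every pair of expression profiles, keep pairs above a cutoff as edges, and read off the neighbours of a guide gene. This is a minimal sketch, not the NICCE implementation. The Pearson cutoff of 0.9, the gene names, and the expression values are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def neighbours(edges, guide_gene):
    """Guide-gene lookup: genes directly connected to guide_gene."""
    linked = set()
    for g1, g2, _ in edges:
        if g1 == guide_gene:
            linked.add(g2)
        elif g2 == guide_gene:
            linked.add(g1)
    return sorted(linked)


# Hypothetical expression values across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
print(edges)
print(neighbours(edges, "geneA"))
```

With real data, the cutoff would normally be chosen from the correlation distribution or a significance test rather than fixed in advance.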
Leptokurtic pollen-flow, non-leptokurtic gene-flow in a wind-pollinated herb, Plantago lanceolata L.
Tonsor, Stephen J
1985-10-01
The purpose of this study was to simultaneously measure pollen dispersal distance and actual pollen-mediated gene-flow distance in a wind-pollinated herb, Plantago lanceolata. The pollen dispersal distribution, measured as pollen deposition in a wind tunnel, is leptokurtic, as expected from previous studies of wind-pollinated plants. Gene-flow, measured as seeds produced on rows of male-sterile inflorescences in the wind tunnel, is non-leptokurtic, peaking at an intermediate distance. The difference between the two distributions results from the tendency of the pollen grains to cluster. These pollen clusters are the units of gene dispersal, with clusters of intermediate and large size contributing disproportionately to gene-flow. Since many wind-pollinated species show pollen clustering (see text), the common assumption for wind-pollinated plants that gene-flow is leptokurtic requires re-examination. Gene-flow was also measured in an artificial outdoor population of male-steriles, containing a single pollen source plant in the center of the array. The gene-flow distribution is significantly platykurtic, and has the same general properties outdoors, where wind speed and turbulence are uncontrolled, as it does in the wind tunnel. I estimated genetic neighborhood size based on my measure of gene-flow in the outdoor population. The estimate shows that populations of Plantago lanceolata will vary in effective number from a few tens of plants to more than five hundred plants, depending on the density of the population in question. Thus, the measured pollen-mediated gene-flow distribution and population density will interact to produce effective population sizes ranging from those in which there is no random genetic drift to those in which random genetic drift plays an important role in determining gene frequencies within and among populations. 
Despite the platykurtosis in the distribution, pollen-mediated gene dispersal distances are still quite limited, and considerable within and among-population genetic differentiation is to be expected in this species.
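The terms leptokurtic and platykurtic refer to the sign of the excess kurtosis of a distribution. The sketch below shows one way to classify a sample of dispersal distances, assuming the population (moment-based) definition m4 / m2^2 - 3. The two distance samples are hypothetical and are not measurements from the study.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def shape(values):
    """Label a sample by the sign of its excess kurtosis."""
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical dispersal distances (arbitrary units):
# most pollen lands close to the source, with one long-distance event.
pollen_like = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]
# Events spread evenly over the sampled distances.
flat_like = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(shape(pollen_like), round(excess_kurtosis(pollen_like), 3))
print(shape(flat_like), round(excess_kurtosis(flat_like), 3))
```

A sharp peak with a long tail gives positive excess kurtosis, while a flat, evenly spread sample gives a negative value.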
Ehrlich, Kenneth C.; Mack, Brian M.
2014-06-23
Fifty six secondary metabolite biosynthesis gene clusters are predicted to be in the Aspergillus flavus genome. In spite of this, the biosyntheses of only seven metabolites, including the aflatoxins, kojic acid, cyclopiazonic acid and aflatrem, have been assigned to a particular gene cluster. We used RNA-seq to compare expression of secondary metabolite genes in gene clusters for the closely related fungi A. parasiticus, A. oryzae, and A. flavus S and L sclerotial morphotypes. The data help to refine the identification of probable functional gene clusters within these species. Our results suggest that A. flavus, a prevalent contaminant of maize, cottonseed, peanuts and tree nuts, is capable of producing metabolites which, besides aflatoxin, could be an underappreciated contributor to its toxicity. PMID:24960201
Gene Cluster Encoding Cholate Catabolism in Rhodococcus spp.
Wilbrink, Maarten H.; Casabon, Israël; Stewart, Gordon R.; Liu, Jie; van der Geize, Robert; Eltis, Lindsay D.
2012-01-01
Bile acids are highly abundant steroids with important functions in vertebrate digestion. Their catabolism by bacteria is an important component of the carbon cycle, contributes to gut ecology, and has potential commercial applications. We found that Rhodococcus jostii RHA1 grows well on cholate, as well as on its conjugates, taurocholate and glycocholate. The transcriptome of RHA1 growing on cholate revealed 39 genes upregulated on cholate, occurring in a single gene cluster. Reverse transcriptase quantitative PCR confirmed that selected genes in the cluster were upregulated 10-fold on cholate versus on cholesterol. One of these genes, kshA3, encoding a putative 3-ketosteroid-9α-hydroxylase, was deleted and found essential for growth on cholate. Two coenzyme A (CoA) synthetases encoded in the cluster, CasG and CasI, were heterologously expressed. CasG was shown to transform cholate to cholyl-CoA, thus initiating side chain degradation. CasI was shown to form CoA derivatives of steroids with isopropanoyl side chains, likely occurring as degradation intermediates. Orthologous gene clusters were identified in all available Rhodococcus genomes, as well as that of Thermomonospora curvata. Moreover, Rhodococcus equi 103S, Rhodococcus ruber Chol-4 and Rhodococcus erythropolis SQ1 each grew on cholate. In contrast, several mycolic acid bacteria lacking the gene cluster were unable to grow on cholate. Our results demonstrate that the above-mentioned gene cluster encodes cholate catabolism and is distinct from a more widely occurring gene cluster encoding cholesterol catabolism. PMID:23024343
Clustering Gene Expression Regulators: New Approach to Disease Subtyping
Pyatnitskiy, Mikhail; Mazo, Ilya; Shkrob, Maria; Schwartz, Elena; Kotelnikova, Ekaterina
2014-01-01
One of the main challenges in modern medicine is to stratify patient groups by underlying molecular disease mechanisms so as to develop more personalized approaches to therapy. Here we propose a novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis (SNEA) algorithm, which identifies gene subnetworks with significant concordant changes in expression between two conditions. Each subnetwork consists of a central regulator and downstream genes connected by relations extracted from a global literature-derived regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary level of understanding of the pathway-level biology behind a disease by identifying significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms) that was not revealed by standard expression profile clustering. In another experiment we were able to suggest clusters of regulators responsible for colorectal carcinoma vs adenoma discrimination and to identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes to dozens of clusters of regulators. The resulting clusters of regulators make it possible to generate valuable biological hypotheses about molecular mechanisms related to the clinical outcome of an individual patient. PMID:24416320
Tumour-associated and non-tumour-associated microbiota in colorectal cancer
Flemer, Burkhardt; Lynch, Denise B; Brown, Jillian M R; Jeffery, Ian B; Ryan, Feargal J; Claesson, Marcus J; O'Riordain, Micheal; Shanahan, Fergus; O'Toole, Paul W
2017-01-01
Objective: A signature that unifies the colorectal cancer (CRC) microbiota across multiple studies has not been identified. In addition to methodological variance, heterogeneity may be caused by both microbial and host response differences, which was addressed in this study. Design: We prospectively studied the colonic microbiota and the expression of specific host response genes using faecal and mucosal samples (‘ON’ and ‘OFF’ the tumour, proximal and distal) from 59 patients undergoing surgery for CRC, 21 individuals with polyps and 56 healthy controls. Microbiota composition was determined by 16S rRNA amplicon sequencing; expression of host genes involved in CRC progression and immune response was quantified by real-time quantitative PCR. Results: The microbiota of patients with CRC differed from that of controls, but alterations were not restricted to the cancerous tissue. Differences between distal and proximal cancers were detected and faecal microbiota only partially reflected mucosal microbiota in CRC. Patients with CRC can be stratified based on higher level structures of mucosal-associated bacterial co-abundance groups (CAGs) that resemble the previously formulated concept of enterotypes. Of these, Bacteroidetes Cluster 1 and Firmicutes Cluster 1 were in decreased abundance in CRC mucosa, whereas Bacteroidetes Cluster 2, Firmicutes Cluster 2, Pathogen Cluster and Prevotella Cluster showed increased abundance in CRC mucosa. CRC-associated CAGs were differentially correlated with the expression of host immunoinflammatory response genes. Conclusions: CRC-associated microbiota profiles differ from those in healthy subjects and are linked with distinct mucosal gene-expression profiles. Compositional alterations in the microbiota are not restricted to cancerous tissue and differ between distal and proximal cancers. PMID:26992426
Matsumi, Rie; Manabe, Kenji; Fukui, Toshiaki; Atomi, Haruyuki; Imanaka, Tadayuki
2007-04-01
We have developed a gene disruption system in the hyperthermophilic archaeon Thermococcus kodakaraensis using the antibiotic simvastatin and a fusion gene designed to overexpress the 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase gene (hmg(Tk)) with the glutamate dehydrogenase promoter. With this system, we disrupted the T. kodakaraensis amylopullulanase gene (apu(Tk)) or a gene cluster which includes apu(Tk) and genes encoding components of a putative sugar transporter. Disruption plasmids were introduced into wild-type T. kodakaraensis KOD1 cells, and transformants exhibiting resistance to 4 µM simvastatin were isolated. The transformants exhibited growth in the presence of 20 µM simvastatin, and we observed a 30-fold increase in intracellular HMG-CoA reductase activity. The expected gene disruption via double-crossover recombination occurred at the target locus, but we also observed recombination events at the hmg(Tk) locus when the endogenous hmg(Tk) gene was used. This could be avoided by using the corresponding gene from Pyrococcus furiosus (hmg(Pf)) or by linearizing the plasmid prior to transformation. While both gene disruption strains displayed normal growth on amino acids or pyruvate, cells without the sugar transporter genes could not grow on maltooligosaccharides or polysaccharides, indicating that the gene cluster encodes the only sugar transporter involved in the uptake of these compounds. The Δapu(Tk) strain could not grow on pullulan and displayed only low levels of growth on amylose, suggesting that Apu(Tk) is a major polysaccharide-degrading enzyme in T. kodakaraensis.
Holland, Peter W H
2013-01-01
Many homeobox genes encode transcription factors with regulatory roles in animal and plant development. Homeobox genes are found in almost all eukaryotes, and have diversified into 11 gene classes and over 100 gene families in animal evolution, and 10 to 14 gene classes in plants. The largest group in animals is the ANTP class which includes the well-known Hox genes, plus other genes implicated in development including ParaHox (Cdx, Xlox, Gsx), Evx, Dlx, En, NK4, NK3, Msx, and Nanog. Genomic data suggest that the ANTP class diversified by extensive tandem duplication to generate a large array of genes, including an NK gene cluster and a hypothetical ProtoHox gene cluster that duplicated to generate Hox and ParaHox genes. Expression and functional data suggest that NK, Hox, and ParaHox gene clusters acquired distinct roles in patterning the mesoderm, nervous system, and gut. The PRD class is also diverse and includes Pax2/5/8, Pax3/7, Pax4/6, Gsc, Hesx, Otx, Otp, and Pitx genes. PRD genes are not generally arranged in ancient genomic clusters, although the Dux, Obox, and Rhox gene clusters arose in mammalian evolution as did several non-clustered PRD genes. Tandem duplication and genome duplication expanded the number of homeobox genes, possibly contributing to the evolution of developmental complexity, but homeobox gene loss must not be ignored. Evolutionary changes to homeobox gene expression have also been documented, including Hox gene expression patterns shifting in concert with segmental diversification in vertebrates and crustaceans, and deletion of a Pitx1 gene enhancer in pelvic-reduced sticklebacks. WIREs Dev Biol 2013, 2:31-45. doi: 10.1002/wdev.78 The author declares that he has no conflicts of interest. Copyright © 2012 Wiley Periodicals, Inc.
Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis
Koh, Esther G. L.; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V.; Brenner, Sydney; Venkatesh, Byrappa
2003-01-01
The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes. PMID:12547909
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Lee, Sunhee; Reth, Alexander; Meletzus, Dietmar; Sevilla, Myrna; Kennedy, Christina
2000-01-01
A major 30.5-kb cluster of nif and associated genes of Acetobacter diazotrophicus (syn. Gluconacetobacter diazotrophicus), a nitrogen-fixing endophyte of sugarcane, was sequenced and analyzed. This cluster represents the largest assembly of contiguous nif-fix and associated genes so far characterized in any diazotrophic bacterial species. Northern blots and promoter sequence analysis indicated that the genes are organized into eight transcriptional units. The overall arrangement of genes is most like that of the nif-fix cluster in Azospirillum brasilense, while the individual gene products are more similar to those in species of Rhizobiaceae or in Rhodobacter capsulatus. PMID:11092875
Athey, Taryn B T; Vaillancourt, Katy; Frenette, Michel; Fittipaldi, Nahuel; Gottschalk, Marcelo; Grenier, Daniel
2016-01-01
Recently, we reported the purification and characterization of three distinct lantibiotics (named suicin 90-1330, suicin 3908, and suicin 65) produced by Streptococcus suis. In this study, we investigated the distribution of the three suicin lantibiotic gene clusters among serotype 2 S. suis strains belonging to sequence type (ST) 25 and ST28, the two dominant STs identified in North America. The genomes of 102 strains were interrogated for the presence of suicin gene clusters encoding suicins 90-1330, 3908, and 65. The gene cluster encoding suicin 65 was the most prevalent and mainly found among ST25 strains. In contrast, none of the genes related to suicin 90-1330 production were identified in 51 ST25 strains nor in 35/51 ST28 strains. However, the complete suicin 90-1330 gene cluster was found in ten ST28 strains, although some genes in the cluster were truncated in three of these isolates. The vast majority (101/102) of S. suis strains did not possess any of the genes encoding suicin 3908. In conclusion, this study indicates heterogeneous distribution of suicin genes in S. suis.
Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George
2013-01-01
Analyzing time-course expression data captured in microarray datasets is a complex undertaking as the vast and complex data space is represented by a relatively low number of samples as compared to thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853
COGNAT: a web server for comparative analysis of genomic neighborhoods.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
2017-11-22
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. Building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), we set out to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionarily related genes, together with a corresponding web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
Polycistronic gene expression in Aspergillus niger.
Schuetze, Tabea; Meyer, Vera
2017-09-25
Genome mining approaches predict dozens of biosynthetic gene clusters in each of the filamentous fungal genomes sequenced so far. However, the majority of these gene clusters still remain cryptic because they are not expressed in their natural host. Simultaneous expression of all genes belonging to a biosynthetic pathway in a heterologous host is one approach to activate biosynthetic gene clusters and to screen the metabolites produced for bioactivities. Polycistronic expression of all pathway genes under control of a single and tunable promoter would be the method of choice, as this not only simplifies cloning procedures but also offers control over the timing and strength of expression. However, polycistronic gene expression is a feature not commonly found in eukaryotic host systems, such as Aspergillus niger. In this study, we tested the suitability of the viral P2A peptide for co-expression of three genes in A. niger. Two genes descend from Fusarium oxysporum and are essential to produce the secondary metabolite enniatin (esyn1, ekivR). The third gene (luc) encodes the reporter luciferase, which was included to study position effects. Expression of the polycistronic gene cassette was put under control of the Tet-On system to ensure tunable gene expression in A. niger. In total, three polycistronic expression cassettes which differed in the position of luc were constructed and targeted to the pyrG locus in A. niger. This allowed direct comparison of the luciferase activity based on the position of the luciferase gene. Doxycycline-mediated induction of the Tet-On expression cassettes resulted in the production of one long polycistronic mRNA, as proven by Northern analyses, and ensured comparable production of enniatin in all three strains. Notably, gene position within the polycistronic expression cassette matters: luciferase activity was lowest at position one and was comparable at positions two and three.
The P2A peptide can be used to express at least three genes polycistronically in A. niger. This approach can now be applied to heterologously express entire secondary metabolite gene clusters polycistronically or to co-express any genes of interest in equimolar amounts.
Schmid, D; Allerberger, F; Huhulescu, S; Pietzka, A; Amar, C; Kleta, S; Prager, R; Preußel, K; Aichinger, E; Mellmann, A
2014-05-01
A cluster of seven human cases of listeriosis occurred in Austria and in Germany between April 2011 and July 2013. The Listeria monocytogenes serovar (SV) 1/2b isolates shared pulsed-field gel electrophoresis (PFGE) and fluorescent amplified fragment length polymorphism (fAFLP) patterns indistinguishable from those from five food producers. The seven human isolates, a control strain with a different PFGE/fAFLP profile and ten food isolates were subjected to whole genome sequencing (WGS) in a blinded fashion. A gene-by-gene comparison (multilocus sequence typing, MLST+) was performed, and the resulting whole genome allelic profiles were compared using SeqSphere+ software version 1.0. On analysis of 2298 genes, the four human outbreak isolates from 2012 to 2013 had different alleles at ≤6 genes, i.e. differed by ≤6 genes from each other; the dendrogram placed these isolates in between five Austrian unaged soft cheese isolates from producer A (≤19-gene difference from the human cluster) and two Austrian ready-to-eat meat isolates from producer B (≤8-gene difference from the human cluster). Both food products appeared on grocery bills prospectively collected by these outbreak cases after hospital discharge. Epidemiological results on food consumption and MLST+ clearly separated the three cases in 2011 from the four 2012-2013 outbreak cases (≥48 different genes). We showed that WGS is capable of discriminating L. monocytogenes SV1/2b clones not distinguishable by PFGE and fAFLP. The listeriosis outbreak described clearly underlines the potential of sequence-based typing methods to offer enhanced resolution and comparability of typing systems for public health applications. © 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.
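The gene-by-gene comparison described above reduces each isolate to an allelic profile (one allele number per locus) and counts the loci at which two isolates differ. A minimal Python sketch of that counting step follows. The isolate names, locus names and allele numbers are invented for illustration, and skipping loci with a missing allele call is an assumption of this sketch rather than documented behaviour of the software used in the study.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci where two allelic profiles carry different alleles.

    Loci with a missing call (None) in either profile are skipped; this
    is an assumption of the sketch, not a rule taken from any typing tool.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def pairwise_distances(profiles):
    """Return {(name1, name2): distance} for every pair of isolates."""
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical profiles: locus -> allele number (None = no call).
profiles = {
    "human_1": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "human_2": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None},
    "food_A": {"locus1": 1, "locus2": 5, "locus3": 3, "locus4": 7},
}

distances = pairwise_distances(profiles)
for pair, distance in distances.items():
    print(pair, distance)
```

In a real analysis the resulting distance matrix would feed a dendrogram or minimum spanning tree; thresholds such as the ≤6-gene difference quoted above are study-specific and are not encoded here.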
Mihali, Troco K.; Carmichael, Wayne W.; Neilan, Brett A.
2011-01-01
Saxitoxin and its analogs cause the paralytic shellfish-poisoning syndrome, adversely affecting human health and coastal shellfish industries worldwide. Here we report the isolation, sequencing, annotation, and predicted pathway of the saxitoxin biosynthetic gene cluster in the cyanobacterium Lyngbya wollei. The gene cluster spans 36 kb and encodes enzymes for the biosynthesis and export of the toxins. The Lyngbya wollei saxitoxin gene cluster differs from previously identified saxitoxin clusters as it contains genes that are unique to this cluster, whereby the carbamoyltransferase is truncated and replaced by an acyltransferase, explaining the unique toxin profile presented by Lyngbya wollei. These findings will enable the creation of toxin probes for water monitoring purposes, as well as proof-of-concept for the combinatorial biosynthesis of these naturally occurring alkaloids for the production of novel, biologically active compounds. PMID:21347365
Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua
2016-08-01
Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene-cluster-based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt. The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR-based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.
The WRKY Transcription Factor Genes in Lotus japonicus.
Song, Hui; Wang, Pengfei; Nan, Zhibiao; Wang, Xingjun
2014-01-01
WRKY transcription factor genes play critical roles in plant growth and development, as well as stress responses. WRKY genes have been examined in various higher plants, but they have not been characterized in Lotus japonicus. The recent release of the L. japonicus whole genome sequence provides an opportunity for a genome wide analysis of WRKY genes in this species. In this study, we identified 61 WRKY genes in the L. japonicus genome. Based on the WRKY protein structure, L. japonicus WRKY (LjWRKY) genes can be classified into three groups (I-III). Investigations of gene copy number and gene clusters indicate that only one gene duplication event occurred on chromosome 4 and no clustered genes were detected on chromosomes 3 or 6. Researchers previously believed that group II and III WRKY domains were derived from the C-terminal WRKY domain of group I. Our results suggest that some WRKY genes in group II originated from the N-terminal domain of group I WRKY genes. Additional evidence to support this hypothesis was obtained by Medicago truncatula WRKY (MtWRKY) protein motif analysis. We found that LjWRKY and MtWRKY group III genes are under purifying selection, suggesting that WRKY genes will become increasingly structured and functionally conserved.
A cross-species bi-clustering approach to identifying conserved co-regulated genes.
Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo
2016-06-15
A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach copes poorly with the noise introduced into the expression data by the complicated procedures used to quantify gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes, which are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) that identifies the same gene cluster and the row vector specifies, for each species, the developmental stages over which the clustered genes are co-regulated. An efficient optimization algorithm has been developed, together with a convergence analysis.
This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real-world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.
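To make the idea of a joint sparse rank-one factorization concrete, the following Python sketch fits each species matrix X_s as the outer product of one shared sparse gene vector u and a species-specific stage vector v_s. It is only an illustration under stated assumptions: the alternating update with soft-thresholding is a generic sparse rank-one scheme, not the optimization algorithm of the mvbc package, and the function names, the `threshold` parameter and the toy matrices are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def joint_sparse_rank_one(matrices, threshold, n_iter=50):
    """Fit X_s ~ u * v_s^T with one sparse gene vector u shared by all species.

    matrices: list of genes x stages matrices (lists of rows), same gene order.
    Returns (u, [v_s]); nonzero entries of u mark the genes in the cluster.
    """
    n_genes = len(matrices[0])
    u = [1.0] * n_genes  # simple start: every gene belongs to the cluster
    vs = []
    for _ in range(n_iter):
        # Update each species' stage vector: v_s = X_s^T u.
        vs = [
            [sum(X[i][j] * u[i] for i in range(n_genes)) for j in range(len(X[0]))]
            for X in matrices
        ]
        # Rescale so the concatenated stage vectors have unit norm.
        norm = math.sqrt(sum(v * v for vec in vs for v in vec))
        if norm == 0.0:
            break
        vs = [[v / norm for v in vec] for vec in vs]
        # Update the shared gene vector; soft-thresholding enforces sparsity.
        u = []
        for i in range(n_genes):
            score = sum(
                X[i][j] * vec[j]
                for X, vec in zip(matrices, vs)
                for j in range(len(vec))
            )
            u.append(soft_threshold(score, threshold))
    return u, vs


# Toy data (invented): genes 0 and 1 are co-regulated in both species,
# genes 2 and 3 carry only small noise.
human = [[2.0, 2.0, 0.0], [2.0, 2.0, 0.0], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
mouse = [[0.0, 3.0], [0.0, 3.0], [0.1, 0.0], [-0.1, 0.1]]

u, (v_human, v_mouse) = joint_sparse_rank_one([human, mouse], threshold=0.5)
cluster = [i for i, weight in enumerate(u) if weight != 0.0]
print("genes in cluster:", cluster)
print("human stage weights:", v_human)
print("mouse stage weights:", v_mouse)
```

Because u is shared, a gene enters the cluster only if its expression supports the same pattern in every species at once, which is the property that the sequential two-step approach lacks.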
Roelen, Bernard A J; de Graaff, Wim; Forlani, Sylvie; Deschamps, Jacqueline
2002-11-01
The molecular mechanism underlying the 3' to 5' polarity of induction of mouse Hox genes is still elusive. While relief from a cluster-encompassing repression was shown to lead to all Hoxd genes being expressed like the 3'-most of them, Hoxd1 (Kondo and Duboule, 1999), the molecular basis of the initial activation of this 3'-most gene is not yet understood. We show that, before primitive streak formation and prior to initial expression of the first Hox gene, a dramatic transcriptional stimulation of the 3'-most genes, Hoxb1 and Hoxb2, is observed upon a short pulse of exogenous retinoic acid (RA), whereas this is not the case for more 5', cluster-internal, RA-responsive Hoxb genes. In contrast, the RA-responding Hoxb1lacZ transgene that faithfully mimics the endogenous gene (Marshall et al., 1994) did not exhibit the sensitivity of Hoxb1 to precocious activation. We conclude that polarity in initial activation of Hoxb genes reflects a greater availability of 3' Hox genes for transcription, suggesting a pre-existing (susceptibility to) opening of the chromatin structure at the 3' extremity of the cluster. We discuss the data in the context of prevailing models involving differential chromatin opening in the directionality of clustered Hox gene transcription, and regarding the importance of the cluster context for correct timing of initial Hox gene expression. Interestingly, Cdx1 manifested the same early transcriptional availability as Hoxb1. Copyright 2002 Elsevier Science Ireland Ltd.
Coral comparative genomics reveal expanded Hox cluster in the cnidarian-bilaterian ancestor.
DuBuc, Timothy Q; Ryan, Joseph F; Shinzato, Chuya; Satoh, Nori; Martindale, Mark Q
2012-12-01
The key developmental role of the Hox cluster of genes was established prior to the last common ancestor of protostomes and deuterostomes and the subsequent evolution of this cluster has played a major role in the morphological diversity exhibited in extant bilaterians. Despite 20 years of research into cnidarian Hox genes, the nature of the cnidarian-bilaterian ancestral Hox cluster remains unclear. In an attempt to further elucidate this critical phylogenetic node, we have characterized the Hox cluster of the recently sequenced Acropora digitifera genome. The A. digitifera genome contains two anterior Hox genes (PG1 and PG2) linked to an Eve homeobox gene and an Anthox1A gene, which is thought to be either a posterior or posterior/central Hox gene. These data show that the Hox cluster of the cnidarian-bilaterian ancestor was more extensive than previously thought. The results are congruent with the existence of an ancient set of constraints on the Hox cluster and reinforce the importance of incorporating a wide range of animal species to reconstruct critical ancestral nodes.
Broad spectrum antibiotic compounds and use thereof
Koglin, Alexander; Strieker, Matthias
2016-07-05
The discovery of a non-ribosomal peptide synthetase (NRPS) gene cluster in the genome of Clostridium thermocellum that produces a secondary metabolite that is assembled outside of the host membrane is described. Also described is the identification of homologous NRPS gene clusters from several additional microorganisms. The secondary metabolites produced by the NRPS gene clusters exhibit broad spectrum antibiotic activity. Thus, antibiotic compounds produced by the NRPS gene clusters, and analogs thereof, their use for inhibiting bacterial growth, and methods of making the antibiotic compounds are described.
Transcriptome and gene expression analysis during flower blooming in Rosa chinensis 'Pallida'.
Yan, Huijun; Zhang, Hao; Chen, Min; Jian, Hongying; Baudino, Sylvie; Caissard, Jean-Claude; Bendahmane, Mohammed; Li, Shubin; Zhang, Ting; Zhou, Ningning; Qiu, Xianqin; Wang, Qigang; Tang, Kaixue
2014-04-25
Rosa chinensis 'Pallida' (Rosa L.) is one of the most important ancient rose cultivars originating from China. It contributed the 'tea scent' trait to modern roses. However, little information is available on the gene regulatory networks involved in scent biosynthesis and metabolism in Rosa. In this study, the transcriptome of R. chinensis 'Pallida' petals at different developmental stages, from flower buds to senescent flowers, was investigated using Illumina sequencing technology. De novo assembly generated 89,614 clusters with an average length of 428bp. Based on sequence similarity search with known proteins, 62.9% of total clusters were annotated. Out of these annotated transcripts, 25,705 and 37,159 sequences were assigned to gene ontology and clusters of orthologous groups, respectively. The dataset provides information on transcripts putatively associated with known scent metabolic pathways. Digital gene expression (DGE) was obtained using RNA samples from flower bud, open flower and senescent flower stages. Comparative DGE and quantitative real time PCR permitted the identification of five transcripts encoding proteins putatively associated with scent biosynthesis in roses. The study provides a foundation for scent-related gene discovery in roses. Copyright © 2014. Published by Elsevier B.V.
Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong
2013-10-18
Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes.
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help face the present and forthcoming challenges of data analysis in functional genomics.
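The first two elements of the methodology above (Fourier representation, then screening on the coefficients) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes equally spaced time points, uses a plain discrete Fourier projection, and screens with a fixed, hypothetical energy threshold instead of the autocorrelation-aware, FDR-controlled test the abstract describes. The function names `fourier_coefficients`, `oscillation_energy`, and `screen` are illustrative only.

```python
import math

def fourier_coefficients(profile, n_harmonics=2):
    """Return (mean, [(a_k, b_k), ...]) for an equally spaced time course.

    a_k and b_k are the cosine and sine coefficients of harmonic k,
    computed by direct projection (a plain discrete Fourier sum).
    Assumes n_harmonics < len(profile) / 2.
    """
    n = len(profile)
    mean = sum(profile) / n
    pairs = []
    for k in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(x * math.cos(2 * math.pi * k * i / n)
                          for i, x in enumerate(profile))
        b = 2.0 / n * sum(x * math.sin(2 * math.pi * k * i / n)
                          for i, x in enumerate(profile))
        pairs.append((a, b))
    return mean, pairs

def oscillation_energy(profile, n_harmonics=2):
    """Sum of squared non-constant coefficients; a crude screening score."""
    _, pairs = fourier_coefficients(profile, n_harmonics)
    return sum(a * a + b * b for a, b in pairs)

def screen(profiles, threshold, n_harmonics=2):
    """Keep the names of profiles whose oscillation energy exceeds threshold.

    The fixed threshold is an assumption of this sketch, standing in for
    the statistical test used in the published method.
    """
    return [name for name, p in profiles.items()
            if oscillation_energy(p, n_harmonics) > threshold]
```

A flat profile has essentially zero energy in the non-constant coefficients and is screened out, whereas a profile that completes one cycle over the sampled window keeps its energy in the first harmonic. The retained coefficient vectors would then be the input to whatever clustering step follows.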
Analysis of Chromobacterium sp. natural isolates from different Brazilian ecosystems
Lima-Bittencourt, Cláudia I; Astolfi-Filho, Spartaco; Chartone-Souza, Edmar; Santos, Fabrício R; Nascimento, Andréa MA
2007-01-01
Background Chromobacterium violaceum is a free-living bacterium able to survive under diverse environmental conditions. In this study we evaluate the genetic and physiological diversity of Chromobacterium sp. isolates from three Brazilian ecosystems: Brazilian Savannah (Cerrado), Atlantic Rain Forest and Amazon Rain Forest. We have analyzed the diversity with molecular approaches (16S rRNA gene sequences and amplified ribosomal DNA restriction analysis) and phenotypic surveys of antibiotic resistance and biochemistry profiles. Results In general, the clusters based on physiological profiles included isolates from two or more geographical locations indicating that they are not restricted to a single ecosystem. The isolates from Brazilian Savannah presented greater physiologic diversity and their biochemical profile was the most variable of all groupings. The isolates recovered from Amazon and Atlantic Rain Forests presented the most similar biochemical characteristics to the Chromobacterium violaceum ATCC 12472 strain. Clusters based on biochemical profiles were congruent with clusters obtained by the 16S rRNA gene tree. According to the phylogenetic analyses, isolates from the Amazon Rain Forest and Savannah displayed a closer relationship to the Chromobacterium violaceum ATCC 12472. Furthermore, 16S rRNA gene tree revealed a good correlation between phylogenetic clustering and geographic origin. Conclusion The physiological analyses clearly demonstrate the high biochemical versatility found in the C. violaceum genome and molecular methods allowed to detect the intra and inter-population diversity of isolates from three Brazilian ecosystems. PMID:17584942
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.
Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa
2011-05-26
Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
Siegel, Nicol; Hoegg, Simone; Salzburger, Walter; Braasch, Ingo; Meyer, Axel
2007-01-01
Background The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes despite their additional genome duplication from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. Results We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. Conclusion There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined.
The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. PMID:17822543
Clustering Algorithms: Their Application to Gene Expression Data
Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel
2016-01-01
Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data proffers a prodigious preference to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to the gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and high degree of accuracy in its analysis procedure. PMID:27932867
Shahdoust, Maryam; Hajizadeh, Ebrahim; Mozdarani, Hossein; Chehrei, Ali
2013-01-01
Cigarette smoking is the major risk factor for development of lung cancer. Identification of effects of tobacco on airway gene expression may provide insight into the causes. This research aimed to compare gene expression of large airway epithelium cells in normal smokers (n=13) and non-smokers (n=9) in order to find genes which discriminate the two groups and assess cigarette smoking effects on large airway epithelium cells. Genes discriminating smokers from non-smokers were identified by applying a neural network clustering method, growing self-organizing maps (GSOM), to microarray data according to class discrimination scores. An index was computed based on differentiation between each mean of gene expression in the two groups. This clustering approach provided the possibility of comparing thousands of genes simultaneously. The applied approach compared the mean of 7,129 genes in smokers and non-smokers simultaneously and classified the genes of large airway epithelium cells which had differently expressed in smokers comparing with non-smokers. Seven genes were identified which had the highest different expression in smokers compared with the non-smokers group: NQO1, H19, ALDH3A1, AKR1C1, ABHD2, GPX2 and ADH7. Most (NQO1, ALDH3A1, AKR1C1, H19 and GPX2) are known to be clinically notable in lung cancer studies. Furthermore, statistical discriminate analysis showed that these genes could classify samples in smokers and non-smokers correctly with 100% accuracy. With the performed GSOM map, other nodes with high average discriminate scores included genes with alterations strongly related to the lung cancer such as AKR1C3, CYP1B1, UCHL1 and AKR1B10. This clustering by comparing expression of thousands of genes at the same time revealed alteration in normal smokers. Most of the identified genes were strongly relevant to lung cancer in the existing literature. The genes may be utilized to identify smokers with increased risk for lung cancer. 
A large sample study is now recommended to determine relations between the genes ABHD2 and ADH7 and smoking.
2014-01-01
Background Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain. A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome. Results A unique 34 kbp long fragment with 27 clustered genes (TvLF) of prokaryote origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR based approach we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative to the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species, suggests that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation. Conclusions We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes. PMID:24898731
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
Genomic analyses of bacterial porin-cytochrome gene clusters
Shi, Liang; Fredrickson, James K.; Zachara, John M.
2014-11-26
In this study, the porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in the microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.
Evolutionary conservation of sequence and secondary structures in CRISPR repeats
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip
Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.
Ochirkhuu, Nyamsuren; Konnai, Satoru; Odbileg, Raadan; Murata, Shiro; Ohashi, Kazuhiko
2017-08-01
Anaplasma species are obligate intracellular rickettsial pathogens that cause great economic loss to the animal industry. Few studies on Anaplasma infections in Mongolian livestock have been conducted. This study examined the prevalence of Anaplasma marginale, Anaplasma ovis, Anaplasma phagocytophilum, and Anaplasma bovis by polymerase chain reaction assay in 928 blood samples collected from native cattle and dairy cattle (Bos taurus), yaks (Bos grunniens), sheep (Ovis aries), and goats (Capra aegagrus hircus) in four provinces of Ulaanbaatar city in Mongolia. We genetically characterized positive samples through sequencing analysis based on the heat-shock protein groEL, major surface protein 4 (msp4), and 16S rRNA genes. Only A. ovis was detected in Mongolian livestock (cattle, yaks, sheep, and goats), with 413 animals (44.5%) positive for groEL and 308 animals (33.2%) positive for msp4 genes. In the phylogenetic tree, we separated A. ovis sequences into two distinct clusters based on the groEL gene. One cluster comprised sequences derived mainly from sheep and goats, which was similar to that in A. ovis isolates from other countries. The other divergent cluster comprised sequences derived from cattle and yaks and appeared to be newly branched from that in previously published single isolates in Mongolian cattle. In addition, the msp4 gene of A. ovis using same and different samples with groEL gene of the pathogen demonstrated that all sequences derived from all animal species, except for three sequences derived from cattle and yak, were clustered together, and were identical or similar to those in isolates from other countries. We used 16S rRNA gene sequences to investigate the genetically divergent A. ovis and identified high homology of 99.3-100%. However, the sequences derived from cattle did not match those derived from sheep and goats. The results of this study on the prevalence and molecular characterization of A. ovis in Mongolian livestock can facilitate the control of infectious diseases in livestock.
Uhong Lü, Yuhong; Liu, Xiaoli; Wang, Miao; Li, Yuanyuan; Liu, Ning; Bao, Yuxin; Liu, Minghao; Li, Xiaoqian; Wang, Yinyin; Qian, Shenyan; Yue, Changwu; Huang, Ying
2016-09-01
This study aimed to obtain the natural products synthesized by the three putative xiamycin biosynthesis gene clusters that were predicted via antiSMASH during the genome mining of marine Streptomyces sp. FXJ 7.388, Streptomyces sp. FXJ 8.012, and Streptomyces olivaceus FXJ 7.023. Sixteen genes involved in xiamycin assembly, modification, and regulation with higher identity than the most recently reported xiamycin biosynthetic gene cluster from marine Streptomyces sp. SCSIO 02999, Streptomyces sp. HKI0576, and Streptomyces sp. FXJ 7.388 were discovered via gene cluster comparative analysis. A ribosome engineering strategy was adopted to activate such cryptic gene clusters with different final concentrations of antibiotics that act on the ribosome, and two indolosesquiterpenes were isolated from idlethaldose streptomycin-resistant Streptomyces sp. FXJ 7.388 strains. However, no such product was detected in Streptomyces sp. FXJ 8.012 and Streptomyces olivaceus FXJ 7.023 under the same treatment. This result suggested that these genes might hold the least gene content for xiamycin biosynthesis.
Genes encoding cuticular proteins are components of the Nimrod gene cluster in Drosophila.
Cinege, Gyöngyi; Zsámboki, János; Vidal-Quadras, Maite; Uv, Anne; Csordás, Gábor; Honti, Viktor; Gábor, Erika; Hegedűs, Zoltán; Varga, Gergely I B; Kovács, Attila L; Juhász, Gábor; Williams, Michael J; Andó, István; Kurucz, Éva
2017-08-01
The Nimrod gene cluster, located on the second chromosome of Drosophila melanogaster, is the largest syntenic unit of the Drosophila genome. Nimrod genes show blood cell specific expression and code for phagocytosis receptors that play a major role in fruit fly innate immune functions. We previously identified three homologous genes (vajk-1, vajk-2 and vajk-3) located within the Nimrod cluster, which are unrelated to the Nimrod genes, but are homologous to a fourth gene (vajk-4) located outside the cluster. Here we show that, unlike the Nimrod candidates, the Vajk proteins are expressed in cuticular structures of the late embryo and the late pupa, indicating that they contribute to cuticular barrier functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
2013-10-01
correct group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages ( UPGMA ) based on...centering of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity met...A) The 108 differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with
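The UPGMA step named in the excerpt above can be sketched as follows. This is a minimal illustration, not the report's pipeline: it assumes non-zero expression vectors, takes 1 minus cosine similarity as the distance (one reading of "Cosine correlation as the similarity metric"), and recomputes average distances naively, which is fine only for small inputs. The names `cosine_distance` and `upgma` are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def upgma(vectors):
    """Agglomerative average-linkage (UPGMA) clustering.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of row indices into `vectors`.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []

    def avg_dist(a, b):
        # UPGMA: unweighted mean over all cross-cluster pairs.
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        d, a, b = min(
            ((avg_dist(a, b), a, b)
             for idx, a in enumerate(clusters)
             for b in clusters[idx + 1:]),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges
```

For real data sets a library routine is the more practical choice; for example, SciPy's `scipy.cluster.hierarchy.linkage` accepts `method="average"` and `metric="cosine"`.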
Cluster and propensity based approximation of a network
2013-01-01
Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). 
Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424
Klett, Hagen; Fuellgraf, Hannah; Levit-Zerdoun, Ella; Hussung, Saskia; Kowar, Silke; Küsters, Simon; Bronsert, Peter; Werner, Martin; Wittel, Uwe; Fritsch, Ralph; Busch, Hauke; Boerries, Melanie
2018-01-01
Late diagnosis and systemic dissemination essentially contribute to the invariably poor prognosis of pancreatic ductal adenocarcinoma (PDAC). Therefore, the development of diagnostic biomarkers for PDAC is urgently needed to improve patient stratification and outcome in the clinic. By studying the transcriptomes of independent PDAC patient cohorts of tumor and non-tumor tissues, we identified 81 robustly regulated genes, through a novel, generally applicable meta-analysis. Consensus clustering on co-expression values revealed four distinct clusters with genes originating from exocrine/endocrine pancreas, stromal and tumor cells. Three clusters were strongly associated with survival of PDAC patients based on TCGA database underlining the prognostic potential of the identified genes. With the added information of impact of survival and the robustness within the meta-analysis, we extracted a 17-gene subset for further validation. We show that it did not only discriminate PDAC from non-tumor tissue and stroma in fresh-frozen as well as formalin-fixed paraffin embedded samples, but also detected pancreatic precursor lesions and singled out pancreatitis samples. Moreover, the classifier discriminated PDAC from other cancers in the TCGA database. In addition, we experimentally validated the classifier in PDAC patients on transcript level using qPCR and exemplify the usage on protein level for three proteins (AHNAK2, LAMC2, TFF1) using immunohistochemistry and for two secreted proteins (TFF1, SERPINB5) using ELISA-based protein detection in blood-plasma. In conclusion, we present a novel robust diagnostic and prognostic gene signature for PDAC with future potential applicability in the clinic. PMID:29675033
Shark IgW C region diversification through RNA processing and isotype switching.
Zhang, Cecilia; Du Pasquier, Louis; Hsu, Ellen
2013-09-15
Sharks and skates represent the earliest vertebrates with an adaptive immune system based on lymphocyte Ag receptors generated by V(D)J recombination. Shark B cells express two classical Igs, IgM and IgW, encoded by an early, alternative gene organization consisting of numerous autonomous miniloci, where the individual gene cluster carries a few rearranging gene segments and one C region, μ or ω. We have characterized eight distinct Ig miniloci encoding the nurse shark ω H chain. Each cluster consists of VH, D, and JH segments and six to eight C domain exons. Two interspersed secretory exons, in addition to the 3'-most C exon with tailpiece, provide the gene cluster with the ability to generate at least six secreted isoforms that differ as to polypeptide length and C domain combination. All clusters appear to be functional, as judged by the capability for rearrangement and absence of defects in the deduced amino acid sequence. We previously showed that IgW VDJ can perform isotype switching to μ C regions; in this study, we found that switching also occurs between ω clusters. Thus, C region diversification for any IgW VDJ can take place at the DNA level by switching to other ω or μ C regions, as well as by RNA processing to generate different C isoforms. The wide array of pathogens recognized by Abs requires different disposal pathways, and our findings demonstrate complex and unique pathways for C effector function diversity that evolved independently in cartilaginous fishes.
Phylum-wide comparative genomics unravel the diversity of secondary metabolism in Cyanobacteria
Calteau, Alexandra; Fewer, David P.; Latifi, Amel; ...
2014-11-18
Cyanobacteria are an ancient lineage of photosynthetic bacteria from which hundreds of natural products have been described, including many notorious toxins but also potent natural products of interest to the pharmaceutical and biotechnological industries. Many of these compounds are the products of non-ribosomal peptide synthetase (NRPS) or polyketide synthase (PKS) pathways. However, current understanding of the diversification of these pathways is largely based on the chemical structure of the bioactive compounds, while the evolutionary forces driving their remarkable chemical diversity are poorly understood. We carried out a phylum-wide investigation of genetic diversification of the cyanobacterial NRPS and PKS pathways for the production of bioactive compounds. 452 NRPS and PKS gene clusters were identified from 89 cyanobacterial genomes, revealing a clear burst in late-branching lineages. Our genomic analysis further grouped the clusters into 286 highly diversified cluster families (CF) of pathways. Some CFs appeared vertically inherited, while others presented a more complex evolutionary history. Only a few horizontal gene transfers were evidenced amongst strongly conserved CFs in the phylum, while several others have undergone drastic gene shuffling events, which could result in the observed diversification of the pathways. In addition to toxin production, several NRPS and PKS gene clusters are devoted to important cellular processes of these bacteria such as nitrogen fixation and iron uptake. The majority of the biosynthetic clusters identified here have unknown end products, highlighting the power of genome mining for the discovery of new natural products.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy
Phylogenetic analyses were done for Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and other sources (16 strains), with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and the gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the whole-genome-sequenced Shewanella strains based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences and on DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains share the same sub-clusters with very few exceptions, and the strains were essentially grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and that it is superior to phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.
High-throughput platform for the discovery of elicitors of silent bacterial gene clusters.
Seyedsayamdost, Mohammad R
2014-05-20
Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons prompting terms such as "cryptic" or "silent" to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate a prominent activation of these two clusters, but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria.
Schyth, Brian Dall; Bela-ong, Dennis Berbulla; Jalali, Seyed Amir Hossein; Kristensen, Lasse Bøgelund Juel; Einer-Jensen, Katja; Pedersen, Finn Skou; Lorenzen, Niels
2015-01-01
MicroRNAs (miRNAs) are ~22 base pair-long non-coding RNAs which regulate gene expression in the cytoplasm of eukaryotic cells by binding to specific target regions in mRNAs to mediate transcriptional blocking or mRNA cleavage. Through their fundamental roles in cellular pathways, gene regulation mediated by miRNAs has been shown to be involved in almost all biological phenomena, including development, metabolism, cell cycle, tumor formation, and host-pathogen interactions. To address the latter in a primitive vertebrate host, we here used an array platform to analyze the miRNA response in rainbow trout (Oncorhynchus mykiss) following inoculation with the virulent fish rhabdovirus Viral hemorrhagic septicaemia virus. Two clustered miRNAs, miR-462 and miR-731 (herein referred to as miR-462 cluster), described only in teleost fishes, were found to be strongly upregulated, indicating their involvement in fish-virus interactions. We searched for homologues of the two teleost miRNAs in other vertebrate species and investigated whether findings related to ours have been reported for these homologues. Gene synteny analysis along with gene sequence conservation suggested that the teleost fish miR-462 and miR-731 had evolved from the ancestral miR-191 and miR-425 (herein called miR-191 cluster), respectively. Whereas the miR-462 cluster locus is found between two protein-coding genes (intergenic) in teleost fish genomes, the miR-191 cluster locus is found within an intron of a protein-coding gene (intragenic) in the human genome. Interferon (IFN)-inducible and immune-related promoter elements found upstream of the teleost miR-462 cluster locus suggested roles in immune responses to viral pathogens in fish, while in humans, the miR-191 cluster functionally associated with cell cycle regulation. 
Stimulation of fish cell cultures with the IFN inducer poly I:C accordingly upregulated the expression of miR-462 and miR-731, while no stimulatory effect on miR-191 and miR-425 expression was observed in human cell lines. Despite high sequence conservation, evolution has thus resulted in different regulation and presumably also different functional roles of these orthologous miRNA clusters in different vertebrate lineages. PMID:26207374
A conserved gene cluster as a putative functional unit in insect innate immunity.
Somogyi, Kálmán; Sipos, Botond; Pénzes, Zsolt; Andó, István
2010-11-05
The Nimrod gene superfamily is an important component of the innate immune response. The majority of its member genes are located in close proximity within the Drosophila melanogaster genome and they lie in a larger conserved cluster ("Nimrod cluster"), made up of non-related groups (families, superfamilies) of genes. This cluster has been a part of the Arthropod genomes for about 300-350 million years. The available data suggest that the Nimrod cluster is a functional module of the insect innate immune response. Copyright © 2010 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Zhang, Xiujun; Parry, Ronald J.
2007-01-01
The pyrrolomycins are a family of polyketide antibiotics, some of which contain a nitro group. To gain insight into the nitration mechanism associated with the formation of these antibiotics, the pyrrolomycin biosynthetic gene cluster from Actinosporangium vitaminophilum was cloned. Sequencing of ca. 56 kb of A. vitaminophilum DNA revealed 35 open reading frames (ORFs). Sequence analysis revealed a clear relationship between some of these ORFs and the biosynthetic gene cluster for pyoluteorin, a structurally related antibiotic. Since a gene transfer system could not be devised for A. vitaminophilum, additional proof for the identity of the cloned gene cluster was sought by cloning the pyrrolomycin gene cluster from Streptomyces sp. strain UC 11065, a transformable pyrrolomycin producer. Sequencing of ca. 26 kb of UC 11065 DNA revealed the presence of 17 ORFs, 15 of which exhibit strong similarity to ORFs in the A. vitaminophilum cluster as well as a nearly identical organization. Single-crossover disruption of two genes in the UC 11065 cluster abolished pyrrolomycin production in both cases. These results confirm that the genetic locus cloned from UC 11065 is essential for pyrrolomycin production, and they also confirm that the highly similar locus in A. vitaminophilum encodes pyrrolomycin biosynthetic genes. Sequence analysis revealed that both clusters contain genes encoding the two components of an assimilatory nitrate reductase. This finding suggests that nitrite is required for the formation of the nitrated pyrrolomycins. However, sequence analysis did not provide additional insights into the nitration process, suggesting the operation of a novel nitration mechanism. PMID:17158935
Evolution of Chemical Diversity in Echinocandin Lipopeptide Antifungal Metabolites
Yue, Qun; Chen, Li; Zhang, Xiaoling; Li, Kuan; Sun, Jingzu; Liu, Xingzhong
2015-01-01
The echinocandins are a class of antifungal drugs that includes caspofungin, micafungin, and anidulafungin. Gene clusters encoding most of the structural complexity of the echinocandins provided a framework for hypotheses about the evolutionary history and chemical logic of echinocandin biosynthesis. Gene orthologs among echinocandin-producing fungi were identified. Pathway genes, including the nonribosomal peptide synthetases (NRPSs), were analyzed phylogenetically to address the hypothesis that these pathways represent descent from a common ancestor. The clusters share cooperative gene contents and linkages among the different strains. Individual pathway genes analyzed in the context of similar genes formed unique echinocandin-exclusive phylogenetic lineages. The echinocandin NRPSs, along with the NRPS from the inp gene cluster in Aspergillus nidulans and its orthologs, comprise a novel lineage among fungal NRPSs. NRPS adenylation domains from different species exhibited a one-to-one correspondence between modules and amino acid specificity that is consistent with models of tandem duplication and subfunctionalization. Pathway gene trees and Ascomycota phylogenies are congruent and consistent with the hypothesis that the echinocandin gene clusters have a common origin. The disjunct Eurotiomycete-Leotiomycete distribution appears to be consistent with a scenario of vertical descent accompanied by incomplete lineage sorting and loss of the clusters from most lineages of the Ascomycota. We present evidence for a single evolutionary origin of the echinocandin family of gene clusters and a progression of structural diversification in two fungal classes that diverged approximately 290 to 390 million years ago. Lineage-specific gene cluster evolution driven by selection of new chemotypes contributed to diversification of the molecular functionalities. PMID:26024901
Clusters of antibiotic resistance genes enriched together stay together in swine agriculture
Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; ...
2016-04-12
Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. 
The use of a single antibiotic could select for an entire suite of resistance genes if they are genetically linked. No links to bacterial membership were observed for these clusters of resistance genes. These findings urge deeper understanding of colocalization of resistance genes and mobile genetic elements in resistance islands and their distribution throughout antibiotic-exposed microbiomes. In addition, as governments seek to combat the rise in antibiotic resistance, a balance is sought between ensuring proper animal health and welfare and preserving medically important antibiotics for therapeutic use. Metagenomic and genomic monitoring will be critical to determine if resistance genes can be reduced in animal microbiomes, or if these gene clusters will continue to be coselected by antibiotics not deemed medically important for human health but used for growth promotion or by medically important antibiotics used therapeutically.
Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.
Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M
2016-04-12
Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. 
The use of a single antibiotic could select for an entire suite of resistance genes if they are genetically linked. No links to bacterial membership were observed for these clusters of resistance genes. These findings urge deeper understanding of colocalization of resistance genes and mobile genetic elements in resistance islands and their distribution throughout antibiotic-exposed microbiomes. As governments seek to combat the rise in antibiotic resistance, a balance is sought between ensuring proper animal health and welfare and preserving medically important antibiotics for therapeutic use. Metagenomic and genomic monitoring will be critical to determine if resistance genes can be reduced in animal microbiomes, or if these gene clusters will continue to be coselected by antibiotics not deemed medically important for human health but used for growth promotion or by medically important antibiotics used therapeutically. Copyright © 2016 Johnson et al.
Yamaguchi-Kabata, Yumi; Nakazono, Kazuyuki; Takahashi, Atsushi; Saito, Susumu; Hosono, Naoya; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki
2008-10-01
Because population stratification can cause spurious associations in case-control studies, understanding the population structure is important. Here, we examined Japanese population structure by "Eigenanalysis," using the genotypes for 140,387 SNPs in 7003 Japanese individuals, along with 60 European, 60 African, and 90 East-Asian individuals, in the HapMap project. Most Japanese individuals fell into two main clusters, Hondo and Ryukyu; the Hondo cluster includes most of the individuals from the main islands in Japan, and the Ryukyu cluster includes most of the individuals from Okinawa. The SNPs with the greatest frequency differences between the Hondo and Ryukyu clusters were found in the HLA region in chromosome 6. The nonsynonymous SNPs with the greatest frequency differences between the Hondo and Ryukyu clusters were the Val/Ala polymorphism (rs3827760) in the EDAR gene, associated with hair thickness, and the Gly/Ala polymorphism (rs17822931) in the ABCC11 gene, associated with ear-wax type. Genetic differentiation was observed, even among different regions in Honshu Island, the largest island of Japan. Simulation studies showed that the inclusion of different proportions of individuals from different regions of Japan in case and control groups can lead to an inflated rate of false-positive results when the sample sizes are large.
Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun
2017-12-01
Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. 
Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Gene Discovery in Bladder Cancer Progression using cDNA Microarrays
Sanchez-Carbayo, Marta; Socci, Nicholas D.; Lozano, Juan Jose; Li, Wentian; Charytonowicz, Elizabeth; Belbin, Thomas J.; Prystowsky, Michael B.; Ortiz, Angel R.; Childs, Geoffrey; Cordon-Cardo, Carlos
2003-01-01
To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more importantly, it separated carcinoma in situ from papillary superficial lesions and subgroups within early-stage and invasive tumors displaying different overall survival. Additionally, it recognized early-stage tumors showing gene profiles similar to invasive disease. Different techniques including standard t-test, single-gene logistic regression, and support vector machine algorithms were applied to identify relevant genes involved in bladder cancer progression. Cytokeratin 20, neuropilin-2, p21, and p33ING1 were selected among the top ranked molecular targets differentially expressed and validated by immunohistochemistry using tissue microarrays (n = 173). Their expression patterns were significantly associated with pathological stage, tumor grade, and altered retinoblastoma (RB) expression. Moreover, p33ING1 expression levels were significantly associated with overall survival. Analysis of the annotation of the most significant genes revealed the relevance of critical genes and pathways during bladder cancer progression, including the overexpression of oncogenic genes such as DEK in superficial tumors or immune response genes such as Cd86 antigen in invasive disease. Gene profiling successfully classified bladder tumors based on their progression and clinical outcome. The present study has identified molecular biomarkers of potential clinical significance and critical molecular targets associated with bladder cancer progression. PMID:12875971
The structure of a gene co-expression network reveals biological functions underlying eQTLs.
Villa-Vialaneix, Nathalie; Liaubet, Laurence; Laurent, Thibault; Cherel, Pierre; Gamot, Adrien; SanCristobal, Magali
2013-01-01
What are the commonalities between genes whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations and for inferring knowledge about yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this led to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found to be correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
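As a rough illustration of what "co-expression based on partial correlation" means in the abstract above, the following sketch computes a first-order partial correlation between two expression profiles while controlling for a third. It is not the authors' pipeline: the function names and the toy profiles are hypothetical, and a real network would condition on many genes at once rather than one.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given z.

    Uses the standard formula
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy profiles: genes x and y are both driven by regulator z,
# plus independent variation, so they correlate only through z.
z = [1.0, 1.0, -1.0, -1.0]
x = [2.0, 0.0, 0.0, -2.0]
y = [2.0, 0.0, -2.0, 0.0]
print(pearson(x, y))
print(partial_correlation(x, y, z))
```

In this toy case the plain correlation between x and y is clearly positive, while the partial correlation given z is essentially zero, which is why partial correlation tends to remove edges that only reflect a shared regulator.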
Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin
2018-01-01
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
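For readers unfamiliar with the underlying algorithm, the core MCL loop (alternating expansion and inflation of a column-stochastic matrix) can be sketched in a few lines. This is a dense, single-process toy in pure Python, not HipMCL and not the reference MCL implementation: it has no pruning, no sparse matrices and no parallelism. The function names, the inflation value of 2.0, the fixed iteration count and the six-node example graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[v ** r for v in row] for row in m])

def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering on a dense, symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, as is customary, then make the matrix column-stochastic.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep mass are attractors; the columns they attract form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0

clusters = mcl(adjacency)
print(clusters)
```

The expansion step is a matrix product, which is where the running time and memory demands mentioned in the abstract come from once the network has millions of nodes.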
Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...
2018-01-05
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
ORGANIZATION OF THE nif GENES OF THE NONHETEROCYSTOUS CYANOBACTERIUM TRICHODESMIUM SP. IMS101.
Dominic, Benny; Zani, Sabino; Chen, Yi-Bu; Mellon, Mark T; Zehr, Jonathan P
2000-08-26
An approximately 16-kb fragment of the Trichodesmium sp. IMS101 (a nonheterocystous filamentous cyanobacterium) "conventional" nif gene cluster was cloned and sequenced. The gene organization of the Trichodesmium and Anabaena variabilis vegetative (nif 2) nitrogenase gene clusters spanning the region from nif B to nif W is similar except for the absence of two open reading frames (ORF3 and ORF1) in Trichodesmium. The Trichodesmium nif EN genes encode a fused Nif EN polypeptide that does not appear to be processed into individual Nif E and Nif N polypeptides. Fused nif EN genes were previously found in the A. variabilis nif 2 genes, but we have found that fused nif EN genes are widespread in the nonheterocystous cyanobacteria. Although the gene organization of the nonheterocystous filamentous Trichodesmium nif gene cluster is very similar to that of the A. variabilis vegetative nif 2 gene cluster, phylogenetic analysis of nif sequences does not support close relatedness of Trichodesmium and A. variabilis vegetative (nif 2) nitrogenase genes.
Integrating Gene Transcription-Based Biomarkers to Understand Desert Tortoise and Ecosystem Health.
Bowen, Lizabeth; Miles, A Keith; Drake, K Kristina; Waters, Shannon C; Esque, Todd C; Nussear, Kenneth E
2015-09-01
Tortoises are susceptible to a wide variety of environmental stressors, and the influence of human disturbances on health and survival of tortoises is difficult to detect. As an addition to current diagnostic methods for desert tortoises, we have developed the first leukocyte gene transcription biomarker panel for the desert tortoise (Gopherus agassizii), enhancing the ability to identify specific environmental conditions potentially linked to declining animal health. Blood leukocyte transcript profiles have the potential to identify physiologically stressed animals in lieu of clinical signs. For desert tortoises, the gene transcript profile included a combination of immune or detoxification response genes with the potential to be modified by biological or physical injury and consequently provide information on the type and magnitude of stressors present in the animal's habitat. Blood from 64 wild adult tortoises at three sites in Clark County, NV, and San Bernardino, CA, and from 19 captive tortoises in Clark County, NV, was collected and evaluated for genes indicative of physiological status. Statistical analysis using a priori groupings indicated significant differences among groups for several genes, while multidimensional scaling and cluster analyses of transcription CT values indicated strong differentiation of a large cluster and multiple outlying individual tortoises or small clusters in multidimensional space. These analyses highlight the effectiveness of the gene panel at detecting environmental perturbations as well as providing guidance in determining the health of the desert tortoise.
Integrating gene transcription-based biomarkers to understand desert tortoise and ecosystem health
Bowen, Lizabeth; Miles, A. Keith; Drake, Karla K.; Waters, Shannon C.; Esque, Todd C.; Nussear, Kenneth E.
2015-01-01
Tortoises are susceptible to a wide variety of environmental stressors, and the influence of human disturbances on health and survival of tortoises is difficult to detect. As an addition to current diagnostic methods for desert tortoises, we have developed the first leukocyte gene transcription biomarker panel for the desert tortoise (Gopherus agassizii), enhancing the ability to identify specific environmental conditions potentially linked to declining animal health. Blood leukocyte transcript profiles have the potential to identify physiologically stressed animals in lieu of clinical signs. For desert tortoises, the gene transcript profile included a combination of immune or detoxification response genes with the potential to be modified by biological or physical injury and consequently provide information on the type and magnitude of stressors present in the animal’s habitat. Blood from 64 wild adult tortoises at three sites in Clark County, NV, and San Bernardino, CA, and from 19 captive tortoises in Clark County, NV, was collected and evaluated for genes indicative of physiological status. Statistical analysis using a priori groupings indicated significant differences among groups for several genes, while multidimensional scaling and cluster analyses of transcription CT values indicated strong differentiation of a large cluster and multiple outlying individual tortoises or small clusters in multidimensional space. These analyses highlight the effectiveness of the gene panel at detecting environmental perturbations as well as providing guidance in determining the health of the desert tortoise.
Vikram, Surendra; Pandey, Janmejay; Kumar, Shailesh; Raghava, Gajendra Pal Singh
2013-01-01
Biodegradation of para-Nitrophenol (PNP) proceeds via two distinct pathways, having 1,2,3-benzenetriol (BT) and hydroquinone (HQ) as their respective terminal aromatic intermediates. Genes involved in these pathways have already been studied in different PNP degrading bacteria. Burkholderia sp. strain SJ98 degrades PNP via both the pathways. Earlier, we have sequenced and analyzed a ~41 kb fragment from the genomic library of strain SJ98. This DNA fragment was found to harbor all the lower pathway genes; however, genes responsible for the initial transformation of PNP could not be identified within this fragment. Now, we have sequenced and annotated the whole genome of strain SJ98 and found two ORFs (viz., pnpA and pnpB) showing maximum identity at amino acid level with p-nitrophenol 4-monooxygenase (PnpM) and p-benzoquinone reductase (BqR). Unlike the other PNP gene clusters reported earlier in different bacteria, these two ORFs in SJ98 genome are physically separated from the other genes of PNP degradation pathway. In order to ascertain the identity of ORFs pnpA and pnpB, we have performed in-vitro assays using recombinant proteins heterologously expressed and purified to homogeneity. Purified PnpA was found to be a functional PnpM and transformed PNP into benzoquinone (BQ), while PnpB was found to be a functional BqR which catalyzed the transformation of BQ into hydroquinone (HQ). Noticeably, PnpM from strain SJ98 could also transform a number of PNP analogues. Based on the above observations, we propose that the genes for PNP degradation in strain SJ98 are arranged differentially in form of non-contiguous gene clusters. This is the first report for such arrangement for gene clusters involved in PNP degradation. Therefore, we propose that PNP degradation in strain SJ98 could be an important model system for further studies on differential evolution of PNP degradation functions. PMID:24376843
Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters.
Zhang, Huixian; Ravi, Vydianathan; Tay, Boon-Hui; Tohari, Sumanty; Pillai, Nisha E; Prasad, Aravind; Lin, Qiang; Brenner, Sydney; Venkatesh, Byrappa
2017-08-22
ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of developmental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, contains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lampreys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homologs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was acquired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas specification and insulin production similar to the Pdx of jawed vertebrates.
Gao, Haiyan; Yang, Mei; Zhang, Xiaolan
2018-04-01
The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancer patients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using the Limma package, and hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was constructed based on the statistically significant pathways, and its prognostic function was tested using a training set and verified using the gene expression data and survival data of Luminal A breast cancer patients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. The DEGs may be used to effectively distinguish Luminal A samples with different prognoses, as verified by hierarchical clustering analysis. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity for both the training and test samples. In conclusion, the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.
Saavedra, Lucila; Minahk, Carlos; de Ruiz Holgado, Aída P.; Sesma, Fernando
2004-01-01
The enterocin CRL35 biosynthetic gene cluster was cloned and sequenced. The sequence was revealed to be highly identical to that of the mundticin KS gene cluster (S. Kawamoto, J. Shima, R. Sato, T. Eguchi, S. Ohmomo, J. Shibato, N. Horikoshi, K. Takeshita, and T. Sameshima, Appl. Environ. Microbiol. 68:3830-3840, 2002). Short synthetic peptides were designed based on the bacteriocin sequence and were evaluated in antimicrobial competitive assays. The peptide KYYGNGVSCNKKGCS produced an enhancement of enterocin CRL35 antimicrobial activity in a buffer system. PMID:15215149
Fragmentation of an aflatoxin-like gene cluster in a forest pathogen
USDA-ARS's Scientific Manuscript database
Secondary metabolic pathway genes are typically clustered in fungi. An exception to this paradigm is seen for genes required for the production of dothistromin, an aflatoxin-like virulence factor produced by the pine needle pathogen Dothistroma septosporum. In contrast to the tight clustering of gen...
Zhou, Zhenxing; Xu, Qingqing; Bu, Qingting; Guo, Yuanyang; Liu, Shuiping; Liu, Yu; Du, Yiling; Li, Yongquan
2015-02-09
Genomic sequencing of actinomycetes has revealed the presence of numerous gene clusters seemingly capable of natural product biosynthesis, yet most clusters are cryptic under laboratory conditions. Bioinformatics analysis of the completely sequenced genome of Streptomyces chattanoogensis L10 (CGMCC 2644) revealed a silent angucycline biosynthetic gene cluster. The overexpression of a pathway-specific activator gene under the constitutive ermE* promoter successfully triggered the expression of the angucycline biosynthetic genes. Two novel members of the angucycline antibiotic family, chattamycins A and B, were further isolated and elucidated. Biological activity assays demonstrated that chattamycin B possesses good antitumor activities against human cancer cell lines and moderate antibacterial activities. The results presented here provide a feasible method to activate silent angucycline biosynthetic gene clusters to discover potential new drug leads. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The ergot alkaloid gene cluster: functional analyses and evolutionary aspects.
Lorenz, Nicole; Haarmann, Thomas; Pazoutová, Sylvie; Jung, Manfred; Tudzynski, Paul
2009-01-01
Ergot alkaloids and their derivatives have been traditionally used as therapeutic agents in migraine, blood pressure regulation and help in childbirth and abortion. Their production in submerged culture is a long established biotechnological process. Ergot alkaloids are produced mainly by members of the genus Claviceps, with Claviceps purpurea as the best investigated species concerning the biochemistry of ergot alkaloid synthesis (EAS). Genes encoding enzymes involved in EAS have been shown to be clustered; functional analyses of EAS cluster genes have allowed specific functions to be assigned to several gene products. Various Claviceps species differ with respect to their host specificity and their alkaloid content; comparison of the ergot alkaloid clusters in these species (and of clavine alkaloid clusters in other genera) yields interesting insights into the evolution of cluster structure. This review focuses on recently published and also yet unpublished data on the structure and evolution of the EAS gene cluster and on the function and regulation of cluster genes. These analyses also have significant biotechnological implications: the characterization of non-ribosomal peptide synthetases (NRPS) involved in the synthesis of the peptide moiety of ergopeptines opened interesting perspectives for the synthesis of ergot alkaloids; on the other hand, defined mutants could be generated producing interesting intermediates or only single peptide alkaloids (instead of the alkaloid mixtures usually produced by industrial strains).
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis
Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-01-01
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Transcriptome database resource and gene expression atlas for the rose
2012-01-01
Background For centuries roses have been selected based on a number of traits. Little information exists on the genetic and molecular basis that contributes to these traits, mainly because information on expressed genes for this economically important ornamental plant is scarce. Results Here, we used a combination of Illumina and 454 sequencing technologies to generate information on Rosa sp. transcripts using RNA from various tissues and in response to biotic and abiotic stresses. A total of 80714 transcript clusters were identified and 76611 peptides have been predicted among which 20997 have been clustered into 13900 protein families. BLASTp hits in closely related Rosaceae species revealed that about half of the predicted peptides in the strawberry and peach genomes have orthologs in Rosa dataset. Digital expression was obtained using RNA samples from organs at different development stages and under different stress conditions. qPCR validated the digital expression data for a selection of 23 genes with high or low expression levels. Comparative gene expression analyses between the different tissues and organs allowed the identification of clusters that are highly enriched in given tissues or under particular conditions, demonstrating the usefulness of the digital gene expression analysis. A web interface ROSAseq was created that allows data interrogation by BLAST, subsequent analysis of DNA clusters and access to thorough transcript annotation including best BLAST matches on Fragaria vesca, Prunus persica and Arabidopsis. The rose peptides dataset was used to create the ROSAcyc resource pathway database that allows access to the putative genes and enzymatic pathways. Conclusions The study provides useful information on Rosa expressed genes, with thorough annotation and an overview of expression patterns for transcripts with good accuracy. PMID:23164410
Harvala, Heli; Wiman, Åsa; Wallensten, Anders; Zakikhany, Katherina; Englund, Hélène; Brytting, Maria
2016-02-15
It is increasingly difficult to differentiate measles viruses (MeVs) relating to certain outbreaks on the basis of the nucleoprotein (N) gene sequence only, as the diversity of circulating MeV strains has decreased. We studied genomic regions that could provide better molecular discrimination between epidemiologically linked and unlinked MeV variants identified in Sweden during 2013-2014. The hemagglutinin (H) gene and hypervariable region between the fusion and matrix genes (MF-HVR) from 53 MeV-positive samples were amplified and sequenced. Data on phylogenetic clustering of MeVs on the basis of N, H, and MF-HVR sequences were compared to epidemiological data. MeVs were genotyped: 27 were B3, and 26 were D8. One genotype B3 cluster based on the N gene sequence contained epidemiologically unrelated viruses from 4 outbreaks, whereas analysis of H and MF-HVR sequences separated them into phylogenetic clusters consistent with the epidemiological data. Similarly, the single cluster of viruses with a genotype D8 N gene could be divided into the 5 outbreak groups on the basis of the phylogeny of MF-HVR sequences. A detailed picture of MeV circulation with more-defined links between outbreaks was obtained by sequencing the H gene and MF-HVR. Further identification and better genetic characterization of MeVs internationally is essential in identifying sources and routes of MeV spread within and beyond Europe in the elimination end game. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Ehler, Martin; Rajapakse, Vinodh; Zeeberg, Barry; Brooks, Brian; Brown, Jacob; Czaja, Wojciech; Bonner, Robert F.
The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. We used a novel clustering method based on Laplacian Eigenmaps, a nonlinear dimension reduction method, to analyze microarray data from laser capture microdissected (LCM) cells at the site and developmental stages (days 10.5 to 12.5) of optic fissure closure. Our new method provided greater biological specificity than classical clustering algorithms in terms of identifying more biological processes and functions related to eye development as defined by Gene Ontology at lower false discovery rates. This new methodology builds on the advantages of LCM to isolate pure phenotypic populations within complex tissues and allows improved ability to identify critical gene products expressed at lower copy number. The combination of LCM of embryonic organs, gene expression microarrays, and extracting spatial and temporal co-variations appears to be a powerful approach to understanding the gene regulatory networks that specify mammalian organogenesis.
Rzehak, Peter; Thijs, Carel; Standl, Marie; Mommers, Monique; Glaser, Claudia; Jansen, Eugène; Klopp, Norman; Koppelman, Gerard H.; Singmann, Paula; Postma, Dirkje S.; Sausenthaler, Stefanie; Dagnelie, Pieter C.; van den Brandt, Piet A.; Koletzko, Berthold; Heinrich, Joachim
2010-01-01
Background Association of genetic-variants in the FADS1-FADS2-gene-cluster with fatty-acid-composition in blood of adult-populations is well established. We analyze this genetic-association in two children-cohort-studies. In addition, the association between variants in the FADS-gene-cluster and blood-fatty-acid-composition with eczema was studied. Methods and Principal Findings Data of two population-based-birth-cohorts in the Netherlands and Germany (KOALA, LISA) were pooled (n = 879) and analyzed by (logistic) regression regarding the mutual influence of single-nucleotide-polymorphisms (SNPs) in the FADS-gene-cluster (rs174545, rs174546, rs174556, rs174561, rs3834458), on polyunsaturated fatty acids (PUFA) in blood and parent-reported eczema until the age of 2 years. All SNPs were highly significantly associated with all PUFAs except for alpha-linolenic-acid and eicosapentaenoic-acid, also after correction for multiple-testing. All tested SNPs showed associations with eczema in the LISA-study, but not in the KOALA-study. None of the PUFAs was significantly associated with eczema neither in the pooled nor in the analyses stratified by study-cohort. Conclusions and Significance PUFA-composition in young children's blood is under strong control of the FADS-gene-cluster. Inconsistent results were found for a link between these genetic-variants with eczema. PUFA in blood was not associated with eczema. Thus the hypothesis of an inflammatory-link between PUFA and eczema by the metabolic-pathway of LC-PUFAs as precursors for inflammatory prostaglandins and leukotrienes could not be confirmed by these data. PMID:20948998
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.
Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V
2017-09-30
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species
Andersen, Ethan J.; Neupane, Surendra; Benson, Benjamin V.
2017-01-01
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence. PMID:28973974
Deciphering the Anti-Aflatoxinogenic Properties of Eugenol Using a Large-Scale q-PCR Approach
Caceres, Isaura; El Khoury, Rhoda; Medina, Ángel; Lippi, Yannick; Naylies, Claire; Atoui, Ali; El Khoury, André; Oswald, Isabelle P.; Bailly, Jean-Denis; Puel, Olivier
2016-01-01
Produced by several species of Aspergillus, Aflatoxin B1 (AFB1) is a carcinogenic mycotoxin contaminating many crops worldwide. The utilization of fungicides is currently one of the most common control methods; nevertheless, their use is not environmentally or economically sound. Thus, the use of natural compounds able to block aflatoxinogenesis could represent an alternative strategy to limit food and feed contamination. For instance, eugenol, a 4-allyl-2-methoxyphenol present in many essential oils, has been identified as an anti-aflatoxin molecule. However, its precise mechanism of action has yet to be clarified. The production of AFB1 is associated with the expression of a 70 kb cluster, and no fewer than 21 enzymatic reactions are necessary for its production. Based on former empirical data, a molecular tool was developed that targets 60 genes: 27 genes of the aflatoxin B1 cluster and 33 genes encoding the main regulatory factors potentially involved in its production. We showed that AFB1 inhibition in Aspergillus flavus following eugenol addition at 0.5 mM in a Malt Extract Agar (MEA) medium resulted in a complete inhibition of the expression of all but one gene of the AFB1 biosynthesis cluster. This transcriptomic effect followed a down-regulation of the complex composed of the two internal regulatory factors, AflR and AflS. This phenomenon was also influenced by an over-expression of veA and mtfA, two genes that are directly linked to AFB1 cluster regulation. PMID:27128940
Bushel, Pierre R; Wolfinger, Russell D; Gibson, Greg
2007-01-01
Background Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver that causes centrilobular necrosis.
Conclusion The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable. PMID:17408499
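The dissimilarity measure and cluster prototype described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the domain names, the dictionary layout of a sample, and the function names `dissimilarity` and `cluster_prototype` are assumptions made for illustration, and the weights are free parameters chosen by the user.

```python
from statistics import mean, mode

# Hypothetical domain names; the abstract only says that microarray and
# clinical chemistry data are numeric and histopathology is categorical.
NUMERIC_DOMAINS = ("expression", "chemistry")
CATEGORICAL_DOMAINS = ("histopathology",)


def dissimilarity(sample, prototype, weights):
    """Weighted dissimilarity between one sample and a cluster prototype.

    Numeric domains contribute a sum of squared differences (squared
    Euclidean distance); categorical domains contribute a simple-matching
    count of mismatched values. Each domain has its own weight.
    """
    total = 0.0
    for domain in NUMERIC_DOMAINS:
        squared = sum((x - p) ** 2
                      for x, p in zip(sample[domain], prototype[domain]))
        total += weights[domain] * squared
    for domain in CATEGORICAL_DOMAINS:
        mismatches = sum(x != p
                         for x, p in zip(sample[domain], prototype[domain]))
        total += weights[domain] * mismatches
    return total


def cluster_prototype(samples):
    """Prototype of a cluster: mean of numeric features, mode of categorical ones."""
    proto = {}
    for domain in NUMERIC_DOMAINS:
        proto[domain] = [mean(col) for col in zip(*(s[domain] for s in samples))]
    for domain in CATEGORICAL_DOMAINS:
        proto[domain] = [mode(col) for col in zip(*(s[domain] for s in samples))]
    return proto
```

Raising the weight of one domain makes mismatches in that domain dominate the assignment of samples to clusters, which is the role the abstract gives to the separate weighting terms.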
NASA Astrophysics Data System (ADS)
Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan
2015-03-01
Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.
Peña, Alejandro; Del Carratore, Francesco; Cummings, Matthew; Takano, Eriko; Breitling, Rainer
2017-12-18
The rapid increase of publicly available microbial genome sequences has highlighted the presence of hundreds of thousands of biosynthetic gene clusters (BGCs) encoding valuable secondary metabolites. The experimental characterization of new BGCs is extremely laborious and struggles to keep pace with the in silico identification of potential BGCs. Therefore, the prioritisation of promising candidates among computationally predicted BGCs represents a pressing need. Here, we propose an output ordering and prioritisation system (OOPS) which helps sorting identified BGCs by a wide variety of custom-weighted biological and biochemical criteria in a flexible and user-friendly interface. OOPS facilitates a judicious prioritisation of BGCs using G+C content, coding sequence length, gene number, cluster self-similarity and codon bias parameters, as well as enabling the user to rank BGCs based upon BGC type, novelty, and taxonomic distribution. Effective prioritisation of BGCs will help to reduce experimental attrition rates and improve the breadth of bioactive metabolites characterized.
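As a rough illustration of custom-weighted prioritisation, the Python sketch below ranks clusters by a weighted sum of min-max normalised criteria. This is an assumed scoring scheme, not the one OOPS uses: the function name `rank_clusters`, the criterion names, and the normalisation step are all hypothetical.

```python
def rank_clusters(clusters, weights):
    """Rank clusters by a weighted sum of min-max normalised criteria.

    clusters: dict mapping a cluster name to a dict of numeric criteria.
    weights:  dict mapping a criterion name to its user-chosen weight;
              a negative weight penalises high values of that criterion.
    Returns a list of (name, score) pairs, highest score first.
    """
    scores = {name: 0.0 for name in clusters}
    for criterion, weight in weights.items():
        values = [features[criterion] for features in clusters.values()]
        low, high = min(values), max(values)
        span = high - low
        for name, features in clusters.items():
            # Scale each criterion to [0, 1] so that weights are comparable;
            # a criterion with no spread contributes nothing.
            normalised = (features[criterion] - low) / span if span else 0.0
            scores[name] += weight * normalised
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Normalising before weighting keeps a criterion measured in large units (such as sequence length) from swamping one measured in small units (such as gene number).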
Analysis of gene expression levels in individual bacterial cells without image segmentation.
Kwak, In Hae; Son, Minjun; Hagen, Stephen J
2012-05-11
Studies of stochasticity in gene expression typically make use of fluorescent protein reporters, which permit the measurement of expression levels within individual cells by fluorescence microscopy. Analysis of such microscopy images is almost invariably based on a segmentation algorithm, where the image of a cell or cluster is analyzed mathematically to delineate individual cell boundaries. However, segmentation can be ineffective for studying bacterial cells or clusters, especially at lower magnification, where outlines of individual cells are poorly resolved. Here we demonstrate an alternative method for analyzing such images without segmentation. The method employs a comparison between the pixel brightness in phase contrast vs fluorescence microscopy images. By fitting the correlation between phase contrast and fluorescence intensity to a physical model, we obtain well-defined estimates for the different levels of gene expression that are present in the cell or cluster. The method reveals the boundaries of the individual cells, even if the source images lack the resolution to show these boundaries clearly. Copyright © 2012 Elsevier Inc. All rights reserved.
Scott, Barry; Young, Carolyn A.; Saikia, Sanjay; McMillan, Lisa K.; Monahan, Brendon J.; Koulman, Albert; Astin, Jonathan; Eaton, Carla J.; Bryant, Andrea; Wrenn, Ruth E.; Finch, Sarah C.; Tapper, Brian A.; Parker, Emily J.; Jameson, Geoffrey B.
2013-01-01
The indole-diterpene paxilline is an abundant secondary metabolite synthesized by Penicillium paxilli. In total, 21 genes have been identified at the PAX locus, of which six have been previously confirmed to have a functional role in paxilline biosynthesis. A combination of bioinformatics, gene expression and targeted gene replacement analyses was used to define the boundaries of the PAX gene cluster. Targeted gene replacement identified seven genes, paxG, paxA, paxM, paxB, paxC, paxP and paxQ, that were all required for paxilline production, with one additional gene, paxD, required for regular prenylation of the indole ring post paxilline synthesis. The two putative transcription factors, PP104 and PP105, were not co-regulated with the pax genes and, based on targeted gene replacement, including the double knockout, did not have a role in paxilline production. The indole dimethylallyl transferases involved in prenylation of indole-diterpenes such as paxilline or lolitrem B fall into two disparate clades that are not supported by prenylation type (e.g., regular or reverse). This paper provides insight into the P. paxilli indole-diterpene locus and reviews the recent advances identified in paxilline biosynthesis. PMID:23949005
Discovery of a Phosphonoacetic Acid Derived Natural Product by Pathway Refactoring.
Freestone, Todd S; Ju, Kou-San; Wang, Bin; Zhao, Huimin
2017-02-17
The activation of silent natural product gene clusters is a synthetic biology problem of great interest. As the rate at which gene clusters are identified outpaces the discovery rate of new molecules, this unknown chemical space is rapidly growing, as too are the rewards for developing technologies to exploit it. One class of natural products that has been underrepresented is phosphonic acids, which have important medical and agricultural uses. Hundreds of phosphonic acid biosynthetic gene clusters have been identified encoding for unknown molecules. Although methods exist to elicit secondary metabolite gene clusters in native hosts, they require the strain to be amenable to genetic manipulation. One method to circumvent this is pathway refactoring, which we implemented in an effort to discover new phosphonic acids from a gene cluster from Streptomyces sp. strain NRRL F-525. By reengineering this cluster for expression in the production host Streptomyces lividans, utility of refactoring is demonstrated with the isolation of a novel phosphonic acid, O-phosphonoacetic acid serine, and the characterization of its biosynthesis. In addition, a new biosynthetic branch point is identified with a phosphonoacetaldehyde dehydrogenase, which was used to identify additional phosphonic acid gene clusters that share phosphonoacetic acid as an intermediate.
The intact dupA cluster is a more reliable Helicobacter pylori virulence marker than dupA alone.
Jung, Sung Woo; Sugimoto, Mitsushige; Shiota, Seiji; Graham, David Y; Yamaoka, Yoshio
2012-01-01
The duodenal ulcer promoting (dupA) gene, located in the plasticity region of Helicobacter pylori, is associated with duodenal ulcer development. dupA was predicted to form a type IV secretory system (T4SS) with vir genes around dupA (dupA cluster). We investigated the prevalence of dupA and dupA clusters and clarified associations between the dupA cluster status and clinical outcomes in the U.S. population. In all, 245 H. pylori strains were examined using PCR to evaluate the status of dupA and the adjacent vir genes predicted to form T4SS, in addition to the status of cag pathogenicity island (PAI). The associations between dupA cluster status and interleukin-8 (IL-8) and IL-12 production were also examined. The presence of dupA and all adjacent vir genes was defined as a complete dupA cluster. Many variations related to the status of dupA and dupA cluster genes were identified. Concurrent H. pylori infection and the presence of a complete dupA cluster increases duodenal ulcer risk compared to H. pylori infection with incomplete dupA cluster or without the dupA gene, independent of the cag PAI status (adjusted odds ratio, 2.13; 95% confidence interval, 1.13 to 4.03). Gastric mucosal IL-8 levels were also significantly higher in the complete dupA cluster group than in other groups (P = 0.01). In conclusion, although the causal relationship between the dupA cluster and duodenal ulcer development is not proved, the presence of a complete dupA cluster, but not dupA alone, is associated with duodenal ulcer development. PMID:22038914
Campbell, Elsie L; Hagen, Kari D; Chen, Rui; Risser, Douglas D; Ferreira, Daniela P; Meeks, John C
2015-02-15
In cyanobacterial Nostoc species, substratum-dependent gliding motility is confined to specialized nongrowing filaments called hormogonia, which differentiate from vegetative filaments as part of a conditional life cycle and function as dispersal units. Here we confirm that Nostoc punctiforme hormogonia are positively phototactic to white light over a wide range of intensities. N. punctiforme contains two gene clusters (clusters 2 and 2i), each of which encodes modular cyanobacteriochrome-methyl-accepting chemotaxis proteins (MCPs) and other proteins that putatively constitute a basic chemotaxis-like signal transduction complex. Transcriptional analysis established that all genes in clusters 2 and 2i, plus two additional clusters (clusters 1 and 3) with genes encoding MCPs lacking cyanobacteriochrome sensory domains, are upregulated during the differentiation of hormogonia. Mutational analysis determined that only genes in cluster 2i are essential for positive phototaxis in N. punctiforme hormogonia; here these genes are designated ptx (for phototaxis) genes. The cluster is unusual in containing complete or partial duplicates of genes encoding proteins homologous to the well-described chemotaxis elements CheY, CheW, MCP, and CheA. The cyanobacteriochrome-MCP gene (ptxD) lacks transmembrane domains and has 7 potential binding sites for bilins. The transcriptional start site of the ptx genes does not resemble a sigma 70 consensus recognition sequence; moreover, it is upstream of two genes encoding gas vesicle proteins (gvpA and gvpC), which also are expressed only in the hormogonium filaments of N. punctiforme. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Identifying a gene expression signature of cluster headache in blood
Eising, Else; Pelzer, Nadine; Vijfhuizen, Lisanne S.; Vries, Boukje de; Ferrari, Michel D.; ‘t Hoen, Peter A. C.; Terwindt, Gisela M.; van den Maagdenberg, Arn M. J. M.
2017-01-01
Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Gene expression data were analysed by gene and by module of co-expressed genes with particular attention to previously implicated disease pathways including hypocretin dysregulation. Only moderate gene expression differences were identified and no associations were found with previously reported pathogenic mechanisms. At the level of functional gene sets, associations were observed for genes involved in several brain-related mechanisms such as GABA receptor function and voltage-gated channels. In addition, genes and modules of co-expressed genes showed a role for intracellular signalling cascades, mitochondria and inflammation. Although larger study samples may be required to identify the full range of involved pathways, these results indicate a role for mitochondria, intracellular signalling and inflammation in cluster headache. PMID:28074859
Neubauer, Lisa; Dopstadt, Julian; Humpf, Hans-Ulrich; Tudzynski, Paul
2016-01-01
Claviceps purpurea is a phytopathogenic fungus infecting a broad range of grasses including economically important cereal crop plants. The infection cycle ends with the formation of the typical purple-black pigmented sclerotia containing the toxic ergot alkaloids. Besides these ergot alkaloids, little is known about the secondary metabolism of the fungus. Red anthraquinone derivatives and yellow xanthone dimers (ergochromes) have been isolated from sclerotia and described as ergot pigments, but the corresponding gene cluster has remained unknown. Fungal pigments are gaining increasing interest, for example as environmentally friendly alternatives to existing dyes. Furthermore, several pigments show biological activities and may have some pharmaceutical value. This study identified the gene cluster responsible for the synthesis of the ergot pigments. Overexpression of the cluster-specific transcription factor led to activation of the gene cluster and to the production of several known ergot pigments. Knockout of the cluster key enzyme, a nonreducing polyketide synthase, clearly showed that this cluster is responsible for the production of red anthraquinones as well as yellow ergochromes. Furthermore, a tentative biosynthetic pathway for the ergot pigments is proposed. By changing the culture conditions, pigment production was activated in axenic culture: a high concentration of phosphate and a low concentration of sucrose induced pigment synthesis. This is the first functional analysis of a secondary metabolite gene cluster in the ergot fungus besides that for the classical ergot alkaloids. We demonstrated that this gene cluster is responsible for the typical purple-black color of the ergot sclerotia and showed that the red and yellow ergot pigments are products of the same biosynthetic pathway. Activation of the gene cluster in axenic culture opened up new possibilities for biotechnological applications such as dye production or the development of new pharmaceuticals.
2013-01-01
Background Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Enhancing existing qualitative methods with quantitative analysis, based on the computational tools we present in this paper, offers a promising way to address key scientific questions. Results We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. PMID:24373308
2012-01-01
Background Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found inadequate to model the complexity of the time-course data, partly because they ignore the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data. PMID:23151154
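The idea of a first-order autoregressive variance structure can be made concrete with a short Python sketch. It builds the generic AR(1) covariance matrix, in which the covariance between time points s and t is sigma2 * rho^|s - t|. The function name `ar1_covariance` and this particular parameterisation are assumptions for illustration; the abstract does not give the exact form used in the EMMIX-WIRE extension.

```python
def ar1_covariance(n_timepoints, rho, sigma2=1.0):
    """Covariance matrix of a stationary first-order autoregressive effect.

    Entry (s, t) equals sigma2 * rho ** |s - t|, so observations that are
    close in time are more strongly correlated than distant ones.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [
        [sigma2 * rho ** abs(s - t) for t in range(n_timepoints)]
        for s in range(n_timepoints)
    ]
```

Setting rho to 0 recovers independent measurements over time, which corresponds to the assumption the abstract criticises in earlier models.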
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. 
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.
Saeed, Mohammad
2017-05-01
Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.
Origin of the Allyl Group in FK506 Biosynthesis
Goranovič, Dušan; Kosec, Gregor; Mrak, Peter; Fujs, Štefan; Horvat, Jaka; Kuščer, Enej; Kopitar, Gregor; Petković, Hrvoje
2010-01-01
FK506 (tacrolimus) is a secondary metabolite with a potent immunosuppressive activity, currently registered for use as immunosuppressant after organ transplantation. FK506 and FK520 are biogenetically related natural products that are synthesized by combined polyketide synthase/nonribosomal peptide synthetase systems. The entire gene cluster for biosynthesis of FK520 from Streptomyces hygroscopicus var. ascomyceticus has been cloned and sequenced. On the other hand, the FK506 gene cluster from Streptomyces sp. MA6548 (ATCC55098) was sequenced only partially, and it was reasonable to expect that additional genes would be required for the provision of substrate supply. Here we report the identification of a previously unknown region of the FK506 gene cluster from Streptomyces tsukubaensis NRRL 18488 containing genes encoding the provision of unusual building blocks for FK506 biosynthesis as well as a regulatory gene. Among others, we identified a group of genes encoding biosynthesis of the extender unit that forms the allyl group at carbon 21 of FK506. Interestingly, we have identified a small independent diketide synthase system involved in the biosynthesis of the allyl group. Inactivation of one of these genes, encoding an unusual ketosynthase domain, resulted in an FK506 nonproducing strain, and the production was restored when a synthetic analog of the allylmalonyl-CoA extender unit was added to the cultivation medium. Based on our results, we propose a biosynthetic pathway for the provision of an unusual five-carbon extender unit, which is carried out by a novel diketide synthase complex. PMID:20194504
Chiang-Ni, Chuan; Zheng, Po-Xing; Wang, Shu-Ying; Tsai, Pei-Jane; Chuang, Woei-Jer; Lin, Yee-Shin; Liu, Ching-Chuan; Wu, Jiunn-Jong
2016-01-01
emm typing is the most widely used molecular typing method for the human pathogen Streptococcus pyogenes (group A streptococcus [GAS]). emm typing is based on a small variable region of the emm gene; however, the emm cluster typing system defines GAS types according to the nearly complete sequence of the emm gene. Therefore, emm cluster typing is considered to provide more information regarding the functional and structural properties of M proteins in different emm types of GAS. In the present study, 677 isolates collected between 1994 and 2008 in a hospital in southern Taiwan were analyzed by the emm cluster typing system. emm clusters A-C4, E1, E6, and A-C3 were the most prevalent emm cluster types and accounted for 67.4% of total isolates. emm clusters A-C4 and E1 were associated with noninvasive diseases, whereas E6 was significantly associated with both invasive and noninvasive manifestations. In addition, emm clusters D4, E2, and E3 were significantly associated with invasive manifestations. Furthermore, we found that the functional properties of M protein, including low fibrinogen-binding and high IgG-binding activities, were correlated significantly with invasive manifestations. In summary, the present study provides updated epidemiological information on GAS emm cluster types in southern Taiwan. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Kato, Hiroki; Tsunematsu, Yuta; Yamamoto, Tsuyoshi; Namiki, Takuya; Kishimoto, Shinji; Noguchi, Hiroshi; Watanabe, Kenji
2016-07-01
To rapidly identify novel natural products and their associated biosynthetic genes from underutilized and genetically difficult-to-manipulate microbes, we developed a method that uses (1) chemical screening to isolate novel microbial secondary metabolites, (2) bioinformatic analyses to identify a potential biosynthetic gene cluster and (3) heterologous expression of the genes in a convenient host to confirm the identity of the gene cluster and the proposed biosynthetic mechanism. The chemical screen was achieved by searching known natural product databases with data from liquid chromatographic and high-resolution mass spectrometric analyses collected on the extract from a target microbe culture. Using this method, we were able to isolate two new meroterpenes, subglutinols C (1) and D (2), from an entomopathogenic filamentous fungus Metarhizium robertsii ARSEF 23. Bioinformatics analysis of the genome allowed us to identify a gene cluster likely to be responsible for the formation of subglutinols. Heterologous expression of three genes from the gene cluster encoding a polyketide synthase, a prenyltransferase and a geranylgeranyl pyrophosphate synthase in Aspergillus nidulans A1145 afforded an α-pyrone-fused uncyclized diterpene, the expected intermediate of the subglutinol biosynthesis, thereby confirming the gene cluster to be responsible for the subglutinol biosynthesis. These results indicate the usefulness of our methodology in isolating new natural products and identifying their associated biosynthetic gene cluster from microbes that are not amenable to genetic manipulation. Our method should facilitate the natural product discovery efforts by expediting the identification of new secondary metabolites and their associated biosynthetic genes from a wider source of microbes.
Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.
2013-01-01
The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role of secondary metabolite gene clusters and their metabolites in fungal biology. PMID:23818858
Maurer, Lisa M; Yohannes, Elizabeth; Bondurant, Sandra S; Radmacher, Michael; Slonczewski, Joan L
2005-01-01
Gene expression profiles of Escherichia coli K-12 W3110 were compared as a function of steady-state external pH. Cultures were grown to an optical density at 600 nm of 0.3 in potassium-modified Luria-Bertani medium buffered at pH 5.0, 7.0, and 8.7. For each of the three pH conditions, cDNA from RNA of five independent cultures was hybridized to Affymetrix E. coli arrays. Analysis of variance with an alpha level of 0.001 resulted in 98% power to detect genes showing a twofold difference in expression. Normalized expression indices were calculated for each gene and intergenic region (IG). Differential expression among the three pH classes was observed for 763 genes and 353 IGs. Hierarchical clustering yielded six well-defined clusters of pH profiles, designated Acid High (highest expression at pH 5.0), Acid Low (lowest expression at pH 5.0), Base High (highest at pH 8.7), Base Low (lowest at pH 8.7), Neutral High (highest at pH 7.0, lower in acid or base), and Neutral Low (lowest at pH 7.0, higher at both pH extremes). Flagellar and chemotaxis genes were repressed at pH 8.7 (Base Low cluster), where the cell's transmembrane proton potential is diminished by the maintenance of an inverted pH gradient. High pH also repressed the proton pumps cytochrome o (cyo) and NADH dehydrogenases I and II. By contrast, the proton-importing ATP synthase F1Fo and the microaerophilic cytochrome d (cyd), which minimizes proton export, were induced at pH 8.7. These observations are consistent with a model in which high pH represses synthesis of flagella, which expend proton motive force, while stepping up electron transport and ATPase components that keep protons inside the cell. Acid-induced genes, on the other hand, were coinduced by conditions associated with increased metabolic rate, such as oxidative stress. All six pH-dependent clusters included envelope and periplasmic proteins, which directly experience external pH. 
Overall, this study showed that (i) low pH accelerates acid consumption and proton export, while coinducing oxidative stress and heat shock regulons; (ii) high pH accelerates proton import, while repressing the energy-expensive flagellar and chemotaxis regulons; and (iii) pH differentially regulates a large number of periplasmic and envelope proteins.
Inglin, Raffael C; Meile, Leo; Stevens, Marc J A
2018-04-24
Bacterial taxonomy aims to classify bacteria based on true evolutionary events and relies on a polyphasic approach that includes phenotypic, genotypic and chemotaxonomic analyses. Until now, complete genomes have been largely ignored in taxonomy. The genus Lactobacillus consists of 173 species and many genomes are available to study taxonomy and evolutionary events. We analyzed and clustered 98 completely sequenced genomes of the genus Lactobacillus and 234 draft genomes of 5 different Lactobacillus species, i.e. L. reuteri, L. delbrueckii, L. plantarum, L. rhamnosus and L. helveticus. The core-genome of the genus Lactobacillus contains 266 genes and the pan-genome 20,800 genes. Clustering of the Lactobacillus pan- and core-genome resulted in two highly similar trees. This shows that evolutionary history is traceable in the core-genome and that clustering of the core-genome is sufficient to explore relationships. Clustering of core- and pan-genomes at the species level resulted in similar trees as well. Detailed analyses of the core-genomes showed that the functional class "genetic information processing" is conserved in the core-genome but that "signaling and cellular processes" is not. The latter class encodes functions that are involved in environmental interactions. Evolution of lactobacilli therefore seems to be directed by the environment. The type species L. delbrueckii was analyzed in detail and its pan-genome based tree contained two major clades whose members contained different genes yet identical functions. In addition, evidence for horizontal gene transfer between strains of L. delbrueckii, L. plantarum, and L. rhamnosus, and between species of the genus Lactobacillus is presented. Our data provide evidence for evolution of some lactobacilli according to a parapatric-like model for species differentiation. Core-genome trees are useful to detect evolutionary relationships in lactobacilli and might be useful in taxonomic analyses. 
Lactobacillus' evolution is directed by the environment and HGT.
Cancer Detection in Microarray Data Using a Modified Cat Swarm Optimization Clustering Approach
M, Pandi; R, Balamurugan; N, Sadhasivam
2017-12-29
Objective: A better understanding of functional genomics can be obtained by extracting patterns hidden in gene expression data. This could have paramount implications for cancer diagnosis, gene treatments and other domains. Clustering may reveal natural structures and identify interesting patterns in underlying data. The main objective of this research was to derive a heuristic approach for detecting highly co-expressed genes related to cancer from gene expression data with minimum Mean Squared Error (MSE). Methods: A modified Cat Swarm Optimization (CSO) algorithm using Harmony Search (MCSO-HS) was applied to cluster cancer gene expression data. Experimental results were analyzed using two cancer gene expression benchmark datasets, namely for leukaemia and for breast cancer. Result: In terms of MSE, MCSO-HS improved on HS and CSO by 13% and 9%, respectively, with the leukaemia dataset, and by 22% and 17%, respectively, with the breast cancer dataset. Conclusion: The results showed MCSO-HS to outperform HS and CSO with both benchmark datasets. To validate the clustering results, this work was tested with internal and external cluster validation indices. This work also points to biological validation of clusters with gene ontology in terms of function, process and component.
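The abstract above ranks clustering methods by Mean Squared Error but does not spell out the formula. A minimal sketch of one common reading follows, assuming MSE means the average squared Euclidean distance between each expression profile and the centroid of its assigned cluster. The function name `clustering_mse` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def clustering_mse(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average squared Euclidean distance from each point to its cluster centroid.

    Assumption: this is one common definition of clustering MSE; the paper
    may normalise differently.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and of equal length")

    # Group the points by their cluster label.
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    # Centroid of a cluster = per-dimension mean of its members.
    centroids = {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }

    # Sum squared distances to the assigned centroid, then average.
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
    return total / len(points)


if __name__ == "__main__":
    profiles = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
    assignment = [0, 0, 1, 1]
    print(clustering_mse(profiles, assignment))
```

Under this definition a lower value means tighter clusters, which is the sense in which one optimizer can be said to improve on another.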
Fox, Ellen M.; Gardiner, Donald M.; Keller, Nancy P.; Howlett, Barbara J.
2008-01-01
A gene, sirZ, encoding a Zn(II)2Cys6 DNA binding protein is present in a cluster of genes responsible for the biosynthesis of the epipolythiodioxopiperazine (ETP) toxin, sirodesmin PL in the ascomycete plant pathogen, Leptosphaeria maculans. RNA-mediated silencing of sirZ gives rise to transformants that produce only residual amounts of sirodesmin PL and display a decrease in the transcription of several sirodesmin PL biosynthetic genes. This indicates that SirZ is a major regulator of this gene cluster. Proteins similar to SirZ are encoded in the gliotoxin biosynthetic gene cluster of Aspergillus fumigatus (gliZ) and in an ETP-like cluster in Penicillium lilacinoechinulatum (PlgliZ). Despite its high level of sequence similarity to gliZ, PlgliZ is unable to complement the gliotoxin-deficiency of a mutant of gliZ in A. fumigatus. Putative binding sites for these regulatory proteins in the promoters of genes in these clusters were predicted using bioinformatic analysis. These sites are similar to those commonly bound by other proteins with Zn(II)2Cys6 DNA binding domains. PMID:18023597
Evidence against the selfish operon theory.
Pál, Csaba; Hurst, Laurence D
2004-06-01
According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.
USDA-ARS's Scientific Manuscript database
In this study, 30 hard red spring (HRS) wheat cultivars released between 1910 and 2013 were analyzed to determine how they cluster in terms of parentage and protein data, analyzed by reverse-phase HPLC (RP-HPLC) of gliadins, and size-exclusion HPLC (SE-HPLC) of unreduced proteins. Dwarfing genes in...
The bacterial species definition in the genomic era
Konstantinidis, Konstantinos T; Ramette, Alban; Tiedje, James M
2006-01-01
The bacterial species definition, despite its eminent practical significance for identification, diagnosis, quarantine and diversity surveys, remains a very difficult issue to advance. Genomics now offers novel insights into intra-species diversity and the potential for emergence of a more soundly based system. Although we share the excitement, we argue that it is premature for a universal change to the definition because current knowledge is based on too few phylogenetic groups and too few samples of natural populations. Our analysis of five important bacterial groups suggests, however, that more stringent standards for species may be justifiable when a solid understanding of gene content and ecological distinctiveness becomes available. Our analysis also reveals what is actually encompassed in a species according to the current standards, in terms of whole-genome sequence and gene-content diversity, and shows that this does not correspond to coherent clusters for the environmental Burkholderia and Shewanella genera examined. In contrast, the obligatory pathogens, which have a very restricted ecological niche, do exhibit clusters. Therefore, the idea of biologically meaningful clusters of diversity that applies to most eukaryotes may not be universally applicable in the microbial world, or if such clusters exist, they may be found at different levels of distinction. PMID:17062412
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel
2017-09-18
The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to home in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). The 18 clusters contained keys that were correctly identified as related to their cluster, along with 13 keys that were not related to it. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63). 
Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.
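The idea of grouping similar metadata keys can be illustrated with a small sketch. It covers only a name-based similarity and uses the standard-library `difflib.SequenceMatcher` as a stand-in measure; the paper's own name, core-concept and value similarities and its scalable algorithm are not reproduced here. The function names, the normalisation step and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after light normalisation (hypothetical measure)."""
    def norm(s: str) -> str:
        return s.lower().replace("_", " ").strip()
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def cluster_keys(keys: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Naive single-linkage agglomerative clustering of metadata keys.

    Two clusters are merged whenever any pair of their members reaches the
    similarity threshold; merging repeats until no such pair remains.
    """
    clusters = [[key] for key in keys]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(name_similarity(a, b) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    print(cluster_keys(["age", "Age", "cell line", "cell_line", "tissue"]))
```

This quadratic loop is only suitable for small key sets; the scalability the authors describe would need a different strategy than the one sketched here.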
Reyes-Dominguez, Yazmid; Boedi, Stefan; Sulyok, Michael; Wiesenberger, Gerlinde; Stoppacher, Norbert; Krska, Rudolf; Strauss, Joseph
2012-01-01
Chromatin modifications and heterochromatic marks have been shown to be involved in the regulation of secondary metabolism gene clusters in the fungal model system Aspergillus nidulans. We examine here the role of HEP1, the heterochromatin protein homolog of Fusarium graminearum, for the production of secondary metabolites. Deletion of Hep1 in a PH-1 background strongly influences expression of genes required for the production of aurofusarin and the main trichothecene metabolite DON. In the Hep1 deletion strains AUR genes are highly up-regulated and aurofusarin production is greatly enhanced suggesting a repressive role for heterochromatin on gene expression of this cluster. Unexpectedly, gene expression and metabolites are lower for the trichothecene cluster suggesting a positive function of Hep1 for DON biosynthesis. However, analysis of histone modifications in chromatin of AUR and DON gene promoters reveals that in both gene clusters the H3K9me3 heterochromatic mark is strongly reduced in the Hep1 deletion strain. This, and the finding that a DON-cluster flanking gene is up-regulated, suggests that the DON biosynthetic cluster is repressed by HEP1 directly and indirectly. Results from this study point to a conserved mode of secondary metabolite (SM) biosynthesis regulation in fungi by chromatin modifications and the formation of facultative heterochromatin. PMID:22100541
Wolf, Timo; Droste, Julian; Gren, Tetiana; Ortseifen, Vera; Schneiker-Bekel, Susanne; Zemke, Till; Pühler, Alfred; Kalinowski, Jörn
2017-07-25
Acarbose is used in the treatment of diabetes mellitus type II and is produced by Actinoplanes sp. SE50/110. Although the biosynthesis of acarbose has been intensively studied, profound knowledge about transcription factors involved in acarbose biosynthesis and their binding sites has been missing until now. In contrast to acarbose biosynthetic gene clusters in Streptomyces spp., the corresponding gene cluster of Actinoplanes sp. SE50/110 lacks genes for transcriptional regulators. The acarbose regulator C (AcrC) was identified through an in silico approach by aligning the LacI family regulators of acarbose biosynthetic gene clusters in Streptomyces spp. with the Actinoplanes sp. SE50/110 genome. The gene for acrC, located in a head-to-head arrangement with the maltose/maltodextrin ABC transporter malEFG operon, was deleted by introducing PCR targeting for Actinoplanes sp. SE50/110. Characterization was carried out through cultivation experiments, genome-wide microarray hybridizations, and RT-qPCR as well as electrophoretic mobility shift assays for the elucidation of binding motifs. The results show that AcrC binds to the intergenic region between acbE and acbD in Actinoplanes sp. SE50/110 and acts as a transcriptional repressor on these genes. The transcriptomic profile of the wild type was reconstituted through a complementation of the deleted acrC gene. Additionally, regulatory sequence motifs for the binding of AcrC were identified in the intergenic region of acbE and acbD. It was shown that AcrC expression influences acarbose formation in the early growth phase. Interestingly, AcrC does not regulate the malEFG operon. This study characterizes the first known transcription factor of the acarbose biosynthetic gene cluster in Actinoplanes sp. SE50/110. It therefore represents an important step for understanding the regulatory network of this organism. 
Based on this work, rational strain design for improving the biotechnological production of acarbose can now be implemented.
2011-01-01
Background Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70 and 30% of commercial production, respectively. C. arabica is an allotetraploid from a recent hybridization of the diploid species, C. canephora and C. eugenioides. C. arabica has lower genetic diversity and results in a higher quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency. Results Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using KA/KS analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories. 
Conclusion We present the first comprehensive genome-wide transcript profile study of C. arabica and C. canephora, which can be freely accessed by the scientific community at http://www.lge.ibi.unicamp.br/coffea. Our data reveal the presence of species-specific/prevalent genes in coffee that may help to explain particular characteristics of these two crops. The identification of differentially expressed transcripts offers a starting point for the correlation between gene expression profiles and Coffea spp. developmental traits, providing valuable insights for coffee breeding and biotechnology, especially concerning sugar metabolism and stress tolerance. PMID:21303543
López, Camilo E; Acosta, Iván F; Jara, Carlos; Pedraza, Fabio; Gaitán-Solís, Eliana; Gallego, Gerardo; Beebe, Steve; Tohme, Joe
2003-01-01
A polymerase chain reaction approach using degenerate primers that targeted the conserved domains of cloned plant disease resistance genes (R genes) was used to isolate a set of 15 resistance gene analogs (RGAs) from common bean (Phaseolus vulgaris). Eight different classes of RGAs were obtained from nucleotide binding site (NBS)-based primers and seven from previously undescribed Toll/Interleukin-1 receptor-like (TIR)-based primers. Putative amino acid sequences of RGAs were significantly similar to R genes and contained additional conserved motifs. The NBS-type RGAs were classified in two subgroups according to the expected final residue in the kinase-2 motif. Eleven RGAs were mapped at 19 loci on eight linkage groups of the common bean genetic map constructed at Centro Internacional de Agricultura Tropical. Genetic linkage was shown for eight RGAs with partial resistance to anthracnose, angular leaf spot (ALS) and Bean golden yellow mosaic virus (BGYMV). RGA1 and RGA2 were associated with resistance loci to anthracnose and BGYMV and were part of two clusters of R genes previously described. A new major cluster was detected by RGA7; it explained up to 63.9% of resistance to ALS and has a putative contribution to anthracnose resistance. These results show the usefulness of RGAs as candidate genes to detect and eventually isolate numerous R genes in common bean.
Liu, Shi-Huo; Li, Hong-Fei; Yang, Yang; Yang, Rui-Lin; Yang, Wen-Jia; Jiang, Hong-Bo; Dou, Wei; Smagghe, Guy; Wang, Jin-Jun
2018-05-01
Chitinases (Chts) and chitin deacetylases (CDAs) are important enzymes required for chitin metabolism in insects. In this study, 12 Cht-related genes (including seven Cht genes and five imaginal disc growth factor genes) and 6 CDA genes (encoding seven proteins) were identified in Bactrocera dorsalis using genome-wide searching and transcript profiling. Based on the conserved sequences and phylogenetic relationships, 12 Cht-related proteins were clustered into eight groups (group I-V and VII-IX). Further domain architecture analysis showed that all contained at least one chitinase catalytic domain, however, only four (BdCht5, BdCht7, BdCht8 and BdCht10) possessed chitin-binding domains. The subsequent phylogenetic analysis revealed that seven CDAs were clustered into five groups (group I-V), and all had one chitin deacetylase catalytic domain. However, only six exhibited chitin-binding domains. Finally, the development- and tissue-specific expression profiling showed that transcript levels of the 12 Cht-related genes and 6 CDA genes varied considerably among eggs, larvae, pupae and adults, as well as among different tissues of larvae and adults. Our findings illustrate the structural differences and expression patterns of Cht and CDA genes in B. dorsalis, and provide important information for the development of new pest control strategies based on these vital enzymes. Copyright © 2018. Published by Elsevier Inc.
The WRKY Transcription Factor Genes in Lotus japonicus
Wang, Pengfei; Wang, Xingjun
2014-01-01
WRKY transcription factor genes play critical roles in plant growth and development, as well as stress responses. WRKY genes have been examined in various higher plants, but they have not been characterized in Lotus japonicus. The recent release of the L. japonicus whole genome sequence provides an opportunity for a genome wide analysis of WRKY genes in this species. In this study, we identified 61 WRKY genes in the L. japonicus genome. Based on the WRKY protein structure, L. japonicus WRKY (LjWRKY) genes can be classified into three groups (I–III). Investigations of gene copy number and gene clusters indicate that only one gene duplication event occurred on chromosome 4 and no clustered genes were detected on chromosomes 3 or 6. Researchers previously believed that group II and III WRKY domains were derived from the C-terminal WRKY domain of group I. Our results suggest that some WRKY genes in group II originated from the N-terminal domain of group I WRKY genes. Additional evidence to support this hypothesis was obtained by Medicago truncatula WRKY (MtWRKY) protein motif analysis. We found that LjWRKY and MtWRKY group III genes are under purifying selection, suggesting that WRKY genes will become increasingly structured and functionally conserved. PMID:24745006
The ability to anchor chemical class-based gene expression changes to phenotypic lesions and to describe these changes as a function of dose and time can inform mode of action and improve quantitative risk assessment. Previous research identified a 330-gene cluster commonly resp...
Ovar-DRB1 haplotypes *2001 and *0301 are associated with sheep growth and ewe lifetime prolificacy
USDA-ARS's Scientific Manuscript database
Background: The major histocompatibility complex (MHC) is an organized cluster of tightly linked vertebrate genes with immunological and non-immunological functions. While the important MHC gene DRB1 has been examined in regard to many sheep infectious disease traits, only one study, based on micros...
Patel, Vidushi S; Ezaz, Tariq; Deakin, Janine E; Graves, Jennifer A Marshall
2010-12-01
The haemoglobin protein, required for oxygen transportation in the body, is encoded by α- and β-globin genes that are arranged in clusters. The transpositional model for the evolution of distinct α-globin and β-globin clusters in amniotes is much simpler than the previously proposed whole genome duplication model. According to this model, all jawed vertebrates share one ancient region containing α- and β-globin genes and several flanking genes in the order MPG-C16orf35-(α-β)-GBY-LUC7L that has been conserved for more than 410 million years, whereas amniotes evolved a distinct β-globin cluster by insertion of a transposed β-globin gene from this ancient region into a cluster of olfactory receptors flanked by CCKBR and RRM1. It could not be determined whether this organisation is conserved in all amniotes because of the paucity of information from non-avian reptiles. To fill in this gap, we examined globin gene organisation in a squamate reptile, the Australian bearded dragon lizard, Pogona vitticeps (Agamidae). We report here that the α-globin cluster (HBK, HBA) is flanked by C16orf35 and GBY and is located on a pair of microchromosomes, whereas the β-globin cluster is flanked by RRM1 on the 3' end and is located on the long arm of chromosome 3. However, the CCKBR gene that flanks the β-globin cluster on the 5' end in other amniotes is located on the short arm of chromosome 5 in P. vitticeps, indicating that a chromosomal break between the β-globin cluster and CCKBR occurred at least in the agamid lineage. Our data from a reptile species provide further evidence to support the transpositional model for the evolution of β-globin gene cluster in amniotes.
Pan-genome and phylogeny of Bacillus cereus sensu lato.
Bazinet, Adam L
2017-08-02
Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. 
Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
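The pan-genome and "core" definitions used in the abstract above (core genes being those common to at least 99% of taxa sampled) can be sketched from a simple gene presence/absence table. This is an illustration only and is not how Roary computes its output; the function name `pan_and_core`, the input layout and the toy genomes are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]],
                 core_fraction: float = 0.99) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) from per-genome sets of gene-cluster IDs.

    Pan-genome: every gene cluster seen in at least one genome.
    Core genome: gene clusters present in at least `core_fraction` of genomes.
    """
    if not genomes:
        raise ValueError("at least one genome is required")

    # Count in how many genomes each gene cluster occurs.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    pan = set(counts)
    minimum = core_fraction * len(genomes)
    core = {gene for gene, n in counts.items() if n >= minimum}
    return pan, core


if __name__ == "__main__":
    toy = {
        "strain_A": {"gyrB", "rpoB", "tox1"},
        "strain_B": {"gyrB", "rpoB", "cap2"},
        "strain_C": {"gyrB", "tox1", "plc3"},
    }
    pan, core = pan_and_core(toy)
    print(sorted(pan), sorted(core))
```

With only a handful of genomes, a 99% threshold behaves like a strict intersection; the softer cutoff matters mainly for large collections, where a few incomplete assemblies would otherwise shrink the core.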
Sugii, Yuh; Kasai, Tomonari; Ikeda, Masashi; Vaidyanath, Arun; Kumon, Kazuki; Mizutani, Akifumi; Seno, Akimasa; Tokutaka, Heizo; Kudoh, Takayuki; Seno, Masaharu
2016-01-01
To identify cell-specific markers, we designed a DNA microarray platform with oligonucleotide probes for human membrane-anchored proteins. Human glioma cell lines were analyzed using microarray and compared with normal and fetal brain tissues. For the microarray analysis, we employed a spherical self-organizing map, which is a clustering method suitable for the conversion of multidimensional data into two-dimensional data and displays the relationship on a spherical surface. Based on the gene expression profile, the cell surface characteristics were successfully mirrored onto the spherical surface, thereby distinguishing normal brain tissue from the disease model based on the strength of gene expression. The clustered glioma-specific genes were further analyzed by polymerase chain reaction procedure and immunocytochemical staining of glioma cells. Our platform and the following procedure were successfully demonstrated to categorize the genes coding for cell surface proteins that are specific to glioma cells. Our assessment demonstrates that a spherical self-organizing map is a valuable tool for distinguishing cell surface markers and can be employed in marker discovery studies for the treatment of cancer.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.
Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi
2015-01-01
Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability--the basis of cluster generation--is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.
Assiri, Abdullah S; El-Gamal, Basiouny A; Hafez, Elsayed E; Haidara, Mohamed A
2014-12-01
To produce an effective recombinant streptokinase (rSK) from a pathogenic Streptococcus pyogenes isolate in yeast, and evaluate its potential for thrombolytic therapy. This study was conducted from November 2012 to December 2013 at King Khalid University, Abha, Kingdom of Saudi Arabia (KSA). Throat swabs collected from 45 pharyngitis patients in Asser Central Hospital, Abha, KSA were used to isolate Streptococcus pyogenes. The bacterial DNA was used for amplification of the streptokinase gene (1200 bp). The gene was cloned and in vitro transcribed in a eukaryotic expression vector that was transformed into yeast Pichia pastoris SMD1168, and the rSK protein was purified and tested for its thrombolytic activity. The Streptococcus pyogenes strain was isolated and its DNA nucleotide sequence revealed similarity to other Streptococcus pyogenes sequences in GenBank. Sequencing of the amplified gene based on DNA nucleotide sequence revealed an SK gene closely related to other SK genes in GenBank. However, based on the deduced amino acid sequence, the gene formed a separate cluster different from clusters formed by other examined genes, suggesting a new bacterial isolate and accordingly a new gene. The purified protein showed 82% clot lysis compared to a commercial SK (81%) at an enzyme concentration of 2000 U/ml. The present yeast rSK showed thrombolytic activity in vitro similar to that of a commercial SK, suggesting its potential for thrombolytic therapy and large scale production.
Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.
Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P
2017-11-23
The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass related carbon sources were analyzed for the wild-type strain N402, which was grown on a large range of carbon sources, and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX, which were grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes, which go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile calls into question the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi.
In addition, the data provide additional evidence in favor of and against the similarity-based functions assigned to uncharacterized genes.
Heterologous Production of a Novel Cyclic Peptide Compound, KK-1, in Aspergillus oryzae
Yoshimi, Akira; Yamaguchi, Sigenari; Fujioka, Tomonori; Kawai, Kiyoshi; Gomi, Katsuya; Machida, Masayuki; Abe, Keietsu
2018-01-01
A novel cyclic peptide compound, KK-1, was originally isolated from the plant-pathogenic fungus Curvularia clavata. It consists of 10 amino acid residues, including five N-methylated amino acid residues, and has potent antifungal activity. Recently, the genome-sequencing analysis of C. clavata was completed, and the biosynthetic genes involved in KK-1 production were predicted by using a novel gene cluster mining tool, MIDDAS-M. These genes form an approximately 75-kb cluster, which includes nine open reading frames, containing a non-ribosomal peptide synthetase (NRPS) gene. To determine whether the predicted genes were responsible for the biosynthesis of KK-1, we performed heterologous production of KK-1 in Aspergillus oryzae by introduction of the cluster genes into the genome of A. oryzae. The NRPS gene was split in two fragments and then reconstructed in the A. oryzae genome, because the gene was quite large (approximately 40 kb). The remaining seven genes in the cluster, excluding the regulatory gene kkR, were simultaneously introduced into the strain of A. oryzae in which NRPS had already been incorporated. To evaluate the heterologous production of KK-1 in A. oryzae, gene expression was analyzed by RT-PCR and KK-1 productivity was quantified by HPLC. KK-1 was produced in variable quantities by a number of transformed strains, along with expression of the cluster genes. The amount of KK-1 produced by the strain with the greatest expression of all genes was lower than that produced by the original producer, C. clavata. Therefore, expression of the cluster genes is necessary and sufficient for the heterologous production of KK-1 in A. oryzae, although there may be unknown factors limiting productivity in this species. PMID:29686660
Darbani, Behrooz; Motawia, Mohammed Saddik; Olsen, Carl Erik; Nour-Eldin, Hussam H.; Møller, Birger Lindberg; Rook, Fred
2016-01-01
Genomic gene clusters for the biosynthesis of chemical defence compounds are increasingly identified in plant genomes. We previously reported the independent evolution of biosynthetic gene clusters for cyanogenic glucoside biosynthesis in three plant lineages. Here we report that the gene cluster for the cyanogenic glucoside dhurrin in Sorghum bicolor additionally contains a gene, SbMATE2, encoding a transporter of the multidrug and toxic compound extrusion (MATE) family, which is co-expressed with the biosynthetic genes. The predicted localisation of SbMATE2 to the vacuolar membrane was demonstrated experimentally by transient expression of a SbMATE2-YFP fusion protein and confocal microscopy. Transport studies in Xenopus laevis oocytes demonstrate that SbMATE2 is able to transport dhurrin. In addition, SbMATE2 was able to transport non-endogenous cyanogenic glucosides, but not the anthocyanin cyanidin 3-O-glucoside or the glucosinolate indol-3-yl-methyl glucosinolate. The genomic co-localisation of a transporter gene with the biosynthetic genes producing the transported compound is discussed in relation to the role self-toxicity of chemical defence compounds may play in the formation of gene clusters. PMID:27841372
Clustered Integrin Ligands as a Novel Approach for the Targeting of Non-Viral Vectors
NASA Astrophysics Data System (ADS)
Ng, Quinn Kwan Tai
Gene transfer or gene delivery is the process by which foreign DNA is introduced into cells. Over the years, gene delivery has gained the attention of many researchers and has been developed into a powerful tool for use in biotechnology and medicine. With the completion of the Human Genome Project, such advances in technology allowed for the identification of diseases ranging from hereditary disorders to acquired ones (cancer) which were thought to be incurable. Gene therapy provides the means necessary to treat or eliminate genetic diseases at their origin, unlike traditional medicine, which only treats symptoms. With ongoing clinical trials for gene therapy increasing, the greatest difficulty still lies in developing safe systems which can target cells of interest to provide efficient delivery. Nature, over millions of years of evolution, has provided an example of one of the most efficient delivery systems: viruses. Although the use of viruses for gene delivery has been well studied, safety issues involving immunogenicity and insertional mutagenesis, together with high cost and poor reproducibility, have posed problems for their clinical application. From understanding viruses, we gain insight into designing new systems for non-viral gene delivery. One technique utilized by adenoviruses is the clustering of ligands on the viral surface through the use of a protein called the penton base. Through the use of nanotechnology we can mimic this basic concept in non-viral gene delivery systems. This dissertation research is focused on developing and applying a novel system for displaying the integrin binding ligand (RGD) in a constrained manner to form a clustered integrin ligand binding platform to be used to enhance the targeting and efficiency of non-viral gene delivery vectors. Peptide mixed monolayer protected gold nanoparticles provide a suitable surface for ligand clustering.
The relationship between the peptide ratios in the reaction solution used to form these ligand clusters and the reacted amounts on the surface of the particle was studied. This gave us the ability to control the size of the clusters formed and the spacing between the integrins for gold nanoparticles of various sizes. We then applied the clustered ligand binding system to the targeting of DNA/PEI polyplexes and demonstrated that the use of RGD nanoclusters enhances gene transfer up to 35-fold, an effect that depended on the density of αvβ3 integrins on the cell surface. Cell integrin sensitivity was shown, in that cells with higher αvβ3 densities gave higher luciferase transgene expression. The targeting of RGD nanoclusters for DNA/PEI polyplexes was further shown in vivo using PET/CT, which displayed improved targeting towards tumors with high αvβ3 integrin expression (U87MG) over those with medium αvβ3 integrin expression (HeLa). Beyond the clustered integrin binding system, we addressed the stability and toxicity issues that current non-viral vectors suffer from in vitro and in vivo. We applied a new chemistry for synthesizing nanogels, utilizing a Traut's reagent initiated Michael addition reaction for modification of diamine-containing crosslinkers, which will allow for the development of stable nanogels with cell-demanded release of oligonucleotides. We showed that the bulk gels made were capable of encapsulating and holding DNA within the gel and that they could be synthesized as nanogels. The combined research shown here, using clustered integrin ligands and a new type of nanogel synthesis, provides an ideal system for gene delivery in the future.
Analysis of bHLH coding genes using gene co-expression network approach.
Srivastava, Swati; Sanchita; Singh, Garima; Singh, Noopur; Srivastava, Gaurava; Sharma, Ashok
2016-07-01
Network analysis provides a powerful framework for the interpretation of data. It uses novel reference network-based metrics for module evolution. These can be used to identify modules of highly connected genes showing variation in the co-expression network. In this study, a co-expression network-based approach was used for analyzing genes from microarray data. Our approach consists of a simple but robust rank-based network construction. The publicly available gene expression data of Solanum tuberosum under cold and heat stresses were used to create and analyze a gene co-expression network. The analysis provided a highly co-expressed module of bHLH coding genes based on correlation values. Our approach was to analyze the variation in gene expression over the time period of stress through the co-expression network. As a result, seed genes were identified that show multiple connections with other genes in the same cluster. Seed genes were found to vary across the different time periods of stress. These seed genes may be utilized further as marker genes for developing stress-tolerant plant species.
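The abstract above does not specify how its rank-based network is built, so the following is only a sketch of one common variant, stated as an assumption: two genes are linked when each appears among the other's `top_k` most correlated partners (a mutual-rank rule), and node degree is used as a stand-in for the "multiple connections" of a seed gene. The names `pearson`, `rank_network`, `top_k`, and the expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_network(expr, top_k=2):
    """Link two genes if each is among the other's top_k correlated partners."""
    genes = sorted(expr)
    corr = {(g, h): pearson(expr[g], expr[h])
            for g in genes for h in genes if g != h}
    # For every gene, keep the top_k partners ranked by correlation.
    top = {
        g: set(sorted((h for h in genes if h != g),
                      key=lambda h: corr[(g, h)], reverse=True)[:top_k])
        for g in genes
    }
    return {(g, h) for g, h in combinations(genes, 2)
            if h in top[g] and g in top[h]}


# Hypothetical expression profiles over four time points.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 9],
    "g3": [1, 3, 2, 5],
    "g4": [4, 3, 2, 1],
    "g5": [5, 3, 3, 1],
}

edges = rank_network(expr, top_k=2)
degree = {g: sum(g in e for e in edges) for g in expr}
print(sorted(edges), degree)
```

Ranking partners rather than thresholding raw correlation values keeps the construction insensitive to the overall scale of correlations in a dataset, which is one reason rank-based rules are often described as robust.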
Spohn, Marius; Kirchner, Norbert; Kulik, Andreas; Jochim, Angelika; Wolf, Felix; Muenzer, Patrick; Borst, Oliver; Gross, Harald; Wohlleben, Wolfgang
2014-01-01
The emergence of antibiotic-resistant pathogenic bacteria within the last decades is one reason for the urgent need for new antibacterial agents. A strategy to discover new anti-infective compounds is the evaluation of the genetic capacity of secondary metabolite producers and the activation of cryptic gene clusters (genome mining). One genus known for its potential to synthesize medically important products is Amycolatopsis. However, Amycolatopsis japonicum does not produce an antibiotic under standard laboratory conditions. In contrast to most Amycolatopsis strains, A. japonicum is genetically tractable with different methods. In order to activate a possible silent glycopeptide cluster, we introduced a gene encoding the transcriptional activator of balhimycin biosynthesis, the bbr gene from Amycolatopsis balhimycina (bbrAba), into A. japonicum. This resulted in the production of an antibiotically active compound. Following whole-genome sequencing of A. japonicum, 29 cryptic gene clusters were identified by genome mining. One of these gene clusters is a putative glycopeptide biosynthesis gene cluster. Using bioinformatic tools, ristomycin (syn. ristocetin), a type III glycopeptide, which has antibacterial activity and which is used for the diagnosis of von Willebrand disease and Bernard-Soulier syndrome, was deduced as a possible product of the gene cluster. Chemical analyses by high-performance liquid chromatography and mass spectrometry (HPLC-MS), tandem mass spectrometry (MS/MS), and nuclear magnetic resonance (NMR) spectroscopy confirmed the in silico prediction that the recombinant A. japonicum/pRM4-bbrAba synthesizes ristomycin A. PMID:25114137
Elmore, M Holly; McGary, Kriston L; Wisecaver, Jennifer H; Slot, Jason C; Geiser, David M; Sink, Stacy; O'Donnell, Kerry; Rokas, Antonis
2015-02-06
Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trace its evolution across Ascomycetes, and examine the evolutionary dynamics of its spread among lineages of the Fusarium oxysporum species complex (hereafter referred to as the FOSC), a cosmopolitan clade of purportedly clonal vascular wilt plant pathogens. Phylogenetic analysis of fungal cyanase and carbonic anhydrase genes reveals that the CCA gene cluster arose independently at least twice and is now present in three lineages, namely Cochliobolus lunatus, Oidiodendron maius, and the FOSC. Genome-wide surveys within the FOSC indicate that the CCA gene cluster varies in copy number across isolates, is always located on accessory chromosomes, and is absent in FOSC's closest relatives. Phylogenetic reconstruction of the CCA gene cluster in 163 FOSC strains from a wide variety of hosts suggests a recent history of rampant transfers between isolates. We hypothesize that the independent formation of the CCA gene cluster in different fungal lineages and its spread across FOSC strains may be associated with resistance to plant-produced cyanates or to use of cyanate fungicides in agriculture. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Jensen, Philip J; Fazio, Gennaro; Altman, Naomi; Praul, Craig; McNellis, Timothy W
2014-04-04
Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. 
Gene expression profiling and trait-associated transcript analysis using an apple F1 population readily identified genes physically linked to powdery mildew disease resistance and woolly apple aphid resistance loci. This result was especially useful in apple, where extreme levels of heterozygosity make the development of reliable DNA markers quite difficult. The results suggest that this approach could prove effective in crops with complicated genetics, or for which few genomic information resources are available.
de Jonge, Ronnie; Ebert, Malaika K; Huitt-Roehl, Callie R; Pal, Paramita; Suttle, Jeffrey C; Spanner, Rebecca E; Neubauer, Jonathan D; Jurick, Wayne M; Stott, Karina A; Secor, Gary A; Thomma, Bart P H J; Van de Peer, Yves; Townsend, Craig A; Bolton, Melvin D
2018-06-12
Species in the genus Cercospora cause economically devastating diseases in sugar beet, maize, rice, soy bean, and other major food crops. Here, we sequenced the genome of the sugar beet pathogen Cercospora beticola and found it encodes 63 putative secondary metabolite gene clusters, including the cercosporin toxin biosynthesis (CTB) cluster. We show that the CTB gene cluster has experienced multiple duplications and horizontal transfers across a spectrum of plant pathogenic fungi, including the wide-host range Colletotrichum genus as well as the rice pathogen Magnaporthe oryzae. Although cercosporin biosynthesis has been thought to rely on an eight-gene CTB cluster, our phylogenomic analysis revealed gene collinearity adjacent to the established cluster in all CTB cluster-harboring species. We demonstrate that the CTB cluster is larger than previously recognized and includes cercosporin facilitator protein, previously shown to be involved with cercosporin autoresistance, and four additional genes required for cercosporin biosynthesis, including the final pathway enzymes that install the unusual cercosporin methylenedioxy bridge. Lastly, we demonstrate production of cercosporin by Colletotrichum fioriniae, the first known cercosporin producer within this agriculturally important genus. Thus, our results provide insight into the intricate evolution and biology of a toxin critical to agriculture and broaden the production of cercosporin to another fungal genus containing many plant pathogens of important crops worldwide. Copyright © 2018 the Author(s). Published by PNAS.
Mumps virus F gene and HN gene sequencing as a molecular tool to study mumps virus transmission.
Gouma, Sigrid; Cremer, Jeroen; Parkkali, Saara; Veldhuijzen, Irene; van Binnendijk, Rob S; Koopmans, Marion P G
2016-11-01
Various mumps outbreaks have occurred in the Netherlands since 2004, particularly among persons who had received 2 doses of measles, mumps, and rubella (MMR) vaccination. Genomic typing of pathogens can be used to track outbreaks, but the established genotyping of mumps virus based on the small hydrophobic (SH) gene sequences did not provide sufficient resolution. Therefore, we expanded the sequencing to include fusion (F) gene and haemagglutinin-neuraminidase (HN) gene sequences in addition to the SH gene sequences from 109 mumps virus genotype G strains obtained between 2004 and mid 2015 in the Netherlands. When the molecular information from these 3 genes was combined, we were able to identify separate mumps virus clusters and track mumps virus transmission. The analyses suggested that multiple mumps virus introductions occurred in the Netherlands between 2004 and 2015 resulting in several mumps outbreaks throughout this period, whereas during some local outbreaks the molecular data pointed towards endemic circulation. Combined analysis of epidemiological data and sequence data collected in 2015 showed good support for the phylogenetic clustering. Copyright © 2016 Elsevier B.V. All rights reserved.
Frequent gene flow blurred taxonomic boundaries of sections in Lilium L. (Liliaceae)
Liu, Shih-Hui; Chiang, Tzen-Yuh
2017-01-01
Gene flow between species may last a long time in plants. Reticulation inevitably causes difficulties in phylogenetic reconstruction. In this study, we looked into the genetic divergence and phylogeny of 20 Lilium species based on multilocus analyses of 8 genes of chloroplast DNA (cpDNA), the internally transcribed nuclear ribosomal DNA (nrITS) spacer and 20 loci extracted from the expressed sequence tag (EST) libraries of L. longiflorum Thunb. and L. formosanum Wallace. The phylogeny based on the combined data of the maternally inherited cpDNA and nrITS was largely consistent with the taxonomy of Lilium sections. This phylogeny was deemed the hypothetical species tree and uncovered three groups, i.e., Cluster A consisting of 4 taxa from the sections Pseudolirium and Liriotypus, Cluster B consisting of the 4 taxa from the sections Leucolirion, Archelirion and Daurolirion, and Cluster C comprising 10 taxa mostly from the sections Martagon and Sinomartagon. In contrast, systematic inconsistency occurred across the EST loci, with up to 19 genes (95%) displaying tree topologies deviating from the hypothetical species tree. The phylogenetic incongruence was likely attributable to the frequent genetic exchanges between species/sections, as indicated by the high levels of genetic recombination and the IMa analyses with the EST loci. Nevertheless, multilocus analysis could provide complementary information among the loci on the species split and the extent of gene flow between the species. In conclusion, this study not only detected frequent gene flow among Lilium sections that resulted in phylogenetic incongruence but also reconstructed a hypothetical species tree that gave insights into the nature of the complex relationships among Lilium species. PMID:28841664
NASA Astrophysics Data System (ADS)
Ballantyne, F.; Medeiros, P. M.; Moran, M. A.; Song, C.; Whitman, W. B.; Washington, B.; Yu, M.; Lee, J.
2017-12-01
Despite the advent of methods enabling high resolution characterization of metabolic activity and of organic matter, linking microbial metabolism to organic matter transformations remains a challenge. By sequencing metatranscriptomes and using Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICR-MS) to characterize organic matter (OM) at the beginning and at the end of incubations of estuarine water across tide and season, we sought to link observed changes in OM composition to microbial metabolism. We used linear models and K-means clustering to identify clusters of genes that responded coherently across season, which accounted for most of the variability in gene expression, over tidal regime, which explained the majority of the remaining variation, and over time during the 24-hour incubations. We used an approach from the field of signal processing that, to our knowledge, has not previously been used to analyze FTICR-MS data, to identify formulae of compounds that changed in concentration during the incubations. This approach, based on the discrete wavelet transform (DWT), allowed us to overcome some of the challenges associated with analyzing FTICR-MS data: variable ionization of organic compounds, signal suppression by high concentration compounds, and uncertainty about how to normalize changes across spectra. We were able to link clusters of metabolic and transporter genes to changes in OM composition, and uniquely identify genes based on their cross correlation with changes in FTICR mass spectra. Our approach for analyzing FTICR-MS data enables more robust inference about OM transformations, and linking high resolution changes in gene expression and in OM data during incubations represents an important step toward formulating models of microbial metabolism relevant for predicting biogeochemically relevant C fluxes.
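The abstract above names the discrete wavelet transform but gives neither the wavelet nor the processing steps, so the sketch below is only an illustration under stated assumptions: it applies a single level of the orthonormal Haar transform (chosen here for simplicity, not taken from the source) to the difference between two hypothetical intensity vectors. The function `haar_step` and the values of `before` and `after` are invented for the example.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]   # local sums (smooth part)
    detail = [(a - b) / s for a, b in pairs]   # local differences (sharp changes)
    return approx, detail


# Hypothetical peak intensities at the start and end of an incubation.
before = [4.0, 4.0, 2.0, 0.0]
after = [4.0, 4.0, 0.0, 2.0]
change = [a - b for a, b in zip(after, before)]

approx, detail = haar_step(change)
print(approx, detail)
```

In this toy case the change is confined to the second pair of peaks, so only the second detail coefficient is nonzero; localizing changes in this way is the general appeal of wavelet coefficients over whole-spectrum summaries.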
Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G
2014-08-28
The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.
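The per-cluster averages reported above follow directly from the gene and cluster counts given in the abstract; the short check below reproduces that arithmetic. The variable names are illustrative only.

```python
# Gene and cluster counts as reported in the abstract.
yugu1_genes, yugu1_clusters = 161, 34
zhanggu_genes, zhanggu_clusters = 151, 28

# Mean number of NBS genes per cluster, rounded to one decimal place.
yugu1_mean = round(yugu1_genes / yugu1_clusters, 1)
zhanggu_mean = round(zhanggu_genes / zhanggu_clusters, 1)

print(yugu1_mean, zhanggu_mean)
```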
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.
Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J
2008-06-18
Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
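For orientation, the baseline that SCC is compared against can be sketched as follows: replicates are averaged per condition and the Pearson correlation of the averaged profiles is used as the gene similarity. This is only an illustration of that baseline, not the SCC itself (its formula is not given in the abstract); the expression values and helper names below are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collapse_replicates(replicates):
    """Average the replicate measurements of each condition.

    Note: this discards the within-group variance and the number of
    replicates, which is the information SCC is described as exploiting.
    """
    return [mean(condition) for condition in replicates]


# Hypothetical data: 4 conditions, 2 replicates each.
gene_a = [[1.0, 1.2], [2.1, 1.9], [3.0, 3.2], [4.1, 3.9]]
gene_b = [[2.0, 2.2], [4.1, 3.9], [6.2, 6.0], [8.0, 8.2]]
gene_c = [[4.0, 4.2], [3.1, 2.9], [2.0, 2.2], [1.1, 0.9]]

r_ab = pearson(collapse_replicates(gene_a), collapse_replicates(gene_b))
r_ac = pearson(collapse_replicates(gene_a), collapse_replicates(gene_c))

# A common choice of clustering distance is 1 - r.
distance_ab = 1 - r_ab
distance_ac = 1 - r_ac
```

With a distance of this form, co-varying genes (a and b) end up close together and anti-correlated genes (a and c) far apart, which is what hierarchical or k-means clustering then operates on.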
A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.
Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe
2018-02-01
In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.
[Chromosomal large fragment deletion induced by CRISPR/Cas9 gene editing system].
Cheng, L H; Liu, Y; Niu, T
2017-05-14
Objective: To use CRISPR-Cas9 gene editing technology to co-delete several genes located on the same chromosome. Methods: A CRISPR-Cas9 lentiviral plasmid that could induce deletion of the Aloxe3-Alox12b-Alox8 gene cluster, located on mouse chromosome 11B3, was constructed by molecular cloning. HEK293T cells were transfected to package lentivirus of CRISPR or Cas9 cDNA, then mouse NIH3T3 cells were infected with the lentivirus and genomic DNA of these cells was extracted. The deleted fragment was amplified by PCR, and TA cloning, Sanger sequencing and other techniques were used to confirm the deletion of the Aloxe3-Alox12b-Alox8 gene cluster. Results: The CRISPR-Cas9 lentiviral plasmid, which could induce deletion of the Aloxe3-Alox12b-Alox8 gene cluster, was successfully constructed. Deletion of the target chromosome fragment (the Aloxe3-Alox12b-Alox8 gene cluster) was verified by PCR. The deletion was confirmed by TA cloning and Sanger sequencing; the breakpoint junctions of the CRISPR-Cas9-mediated cutting events were accurately recombined, and no insertion mutation occurred between the two cleavage sites. Conclusion: Large fragment deletion of the Aloxe3-Alox12b-Alox8 gene cluster located on mouse chromosome 11B3 was successfully induced by the CRISPR-Cas9 gene editing system.
2013-01-01
Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom. PMID:23688303
de Marcos, Alberto; Triviño, Magdalena; Pérez-Bueno, María Luisa; Ballesteros, Isabel; Barón, Matilde; Mena, Montaña; Fenoll, Carmen
2015-01-01
Loss of function of the positive stomata development regulators SPCH or MUTE in Arabidopsis thaliana renders stomataless plants; spch-3 and mute-3 mutants are extreme dwarfs, but produce cotyledons and tiny leaves, providing a system to interrogate plant life in the absence of stomata. To this end, we compared their cotyledon transcriptomes with that of wild-type plants. K-means clustering of differentially expressed genes generated four clusters: clusters 1 and 2 grouped genes commonly regulated in the mutants, while clusters 3 and 4 contained genes distinctively regulated in mute-3. Classification in functional categories and metabolic pathways of genes in clusters 1 and 2 suggested that both mutants had depressed secondary, nitrogen and sulfur metabolisms, while only a few photosynthesis-related genes were down-regulated. In situ quenching analysis of chlorophyll fluorescence revealed limited inhibition of photosynthesis. This and other fluorescence measurements matched the mutant transcriptomic features. Differential transcriptomes of both mutants were enriched in growth-related genes, including known stomata development regulators, which paralleled their epidermal phenotypes. Analysis of cluster 3 was not informative for developmental aspects of mute-3. Cluster 4 comprised genes differentially up-regulated in mute-3, 35% of which were direct targets for SPCH and may relate to the unique cell types of mute-3. A screen of T-DNA insertion lines in genes differentially expressed in the mutants identified a gene putatively involved in stomata development. A collection of lines for conditional overexpression of transcription factors differentially expressed in the mutants rendered distinct epidermal phenotypes, suggesting that these proteins may be novel stomatal development regulators. Thus, our transcriptome analysis represents a useful source of new genes for the study of stomata development and for characterizing physiology and growth in the absence of stomata. 
PMID:26157447
Lim, Wan'E; Kwan, Jia Lin; Goh, Liang Kee; Beuerman, Roger W; Barathi, Veluchamy A
2012-01-01
The aim of this study was to identify the genes and pathways underlying the growth of the mouse sclera during postnatal development. Total RNA was isolated from each of 30 single mouse sclera (n=30, 6 sclera each from 1-, 2-, 3-, 6-, and 8-week-old mice) and reverse-transcribed into cDNA using a T7-N(6) primer. The resulting cDNA was fragmented, labeled with biotin, and hybridized to a Mouse Gene 1.0 ST Array. ANOVA analysis was then performed using Partek Genomic Suite 6.5 beta and differentially expressed transcript clusters were filtered based on a selection criterion of ≥ 2 relative fold change at a false discovery rate of ≤ 5%. Genes identified as involved in the main biologic processes during postnatal scleral development were further confirmed using qPCR. A possible pathway that contributes to the postnatal development of the sclera was investigated using Ingenuity Pathway Analysis software. The hierarchical clustering of all time points showed that they did not cluster according to age. The highest number of differentially expressed transcript clusters was found when week 1 and week 2 old scleral tissues were compared. The peroxisome proliferator-activated receptor gamma coactivator 1-alpha (Ppargc1a) gene was found to be involved in the networks generated using Ingenuity Pathway Studio (IPA) from the differentially expressed transcript cluster lists of week 2 versus 1, week 3 versus 2, week 6 versus 3, and week 8 versus 6. The gene expression of Ppargc1a varied during scleral growth from week 1 to 2, week 2 to 3, week 3 to 6, and week 6 to 8 and was found to interact with a different set of genes at different scleral growth stages. Therefore, this indicated that Ppargc1a might play a role in scleral growth during postnatal weeks 1 to 8. Gene expression of eye diseases should be studied as early as postnatal weeks 1-2 to ensure that any changes in gene expression pattern during disease development are detected. 
In addition, we propose that Ppargc1a might play a role in regulating postnatal scleral development by interacting with a different set of genes at different scleral growth stages.
Evaluation of gene expression profiles and pathways underlying postnatal development in mouse sclera
Lim, Wan’E.; Kwan, Jia Lin; Goh, Liang Kee; Beuerman, Roger W.
2012-01-01
Purpose The aim of this study was to identify the genes and pathways underlying the growth of the mouse sclera during postnatal development. Methods Total RNA was isolated from each of 30 single mouse sclera (n=30, 6 sclera each from 1-, 2-, 3-, 6-, and 8-week-old mice) and reverse-transcribed into cDNA using a T7-N6 primer. The resulting cDNA was fragmented, labeled with biotin, and hybridized to a Mouse Gene 1.0 ST Array. ANOVA analysis was then performed using Partek Genomic Suite 6.5 beta and differentially expressed transcript clusters were filtered based on a selection criterion of ≥2 relative fold change at a false discovery rate of ≤5%. Genes identified as involved in the main biologic processes during postnatal scleral development were further confirmed using qPCR. A possible pathway that contributes to the postnatal development of the sclera was investigated using Ingenuity Pathway Analysis software. Results The hierarchical clustering of all time points showed that they did not cluster according to age. The highest number of differentially expressed transcript clusters was found when week 1 and week 2 old scleral tissues were compared. The peroxisome proliferator-activated receptor gamma coactivator 1-alpha (Ppargc1a) gene was found to be involved in the networks generated using Ingenuity Pathway Studio (IPA) from the differentially expressed transcript cluster lists of week 2 versus 1, week 3 versus 2, week 6 versus 3, and week 8 versus 6. The gene expression of Ppargc1a varied during scleral growth from week 1 to 2, week 2 to 3, week 3 to 6, and week 6 to 8 and was found to interact with a different set of genes at different scleral growth stages. Therefore, this indicated that Ppargc1a might play a role in scleral growth during postnatal weeks 1 to 8. Conclusions Gene expression of eye diseases should be studied as early as postnatal weeks 1–2 to ensure that any changes in gene expression pattern during disease development are detected. 
In addition, we propose that Ppargc1a might play a role in regulating postnatal scleral development by interacting with a different set of genes at different scleral growth stages. PMID:22736935
Serial analysis of gene expression (SAGE) in normal human trabecular meshwork.
Liu, Yutao; Munro, Drew; Layfield, David; Dellinger, Andrew; Walter, Jeffrey; Peterson, Katherine; Rickman, Catherine Bowes; Allingham, R Rand; Hauser, Michael A
2011-04-08
To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map. A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified. This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
Lan, Hong; Chen, Hui; Chen, Li-Cheng; Wang, Bei-Bing; Sun, Li; Ma, Mei-Ying; Fang, Sheng-Guo; Wan, Qiu-Hong
2014-01-01
Defensins play a key role in the innate immunity of various organisms. Detailed genomic studies of the defensin cluster have only been reported in a limited number of birds. Herein, we present the first characterization of defensins in a Pelecaniformes species, the crested ibis (Nipponia nippon), which is one of the most endangered birds in the world. We constructed bacterial artificial chromosome libraries, including a 4D-PCR library and a reverse-4D library, which provide at least 40 equivalents of this rare bird's genome. A cluster including 14 β-defensin loci within 129 kb was assigned to chromosome 3 by FISH, and one gene duplication of AvBD1 was found. The ibis defensin genes are characterized by multiform gene organization ranging from two to four exons through extensive exon fusion. Splicing signal variations and alternative splice variants were also found. Comparative analysis of four bird species identified one common and multiple species-specific duplications, which might be associated with high GC content. Evolutionary analysis revealed birth-and-death mode and purifying selection for avian defensin evolution, resulting in different defensin gene numbers among bird species and functional conservation within orthologous genes, respectively. Additionally, we propose various directions for further research on genetic conservation in the crested ibis. PMID:25372018
USDA-ARS's Scientific Manuscript database
Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trac...
The impact of polyploidy on the evolution of a complex NB-LRR resistance gene cluster in soybean
USDA-ARS's Scientific Manuscript database
A comparative genomics approach was used to investigate the evolution of a complex NB-LRR gene cluster found in soybean (Glycine max), common bean (Phaseolus vulgaris), and other legumes. In soybean, the cluster is associated with several disease resistance (R) genes of known function including Rpg1...
Callejón, R; Halajian, A; de Rojas, M; Marrugal, A; Guevara, D; Cutillas, C
2012-05-25
Comparative morphological, biometrical and molecular studies of Trichuris discolor isolated from Bos taurus from Spain and Iran were carried out. Furthermore, Trichuris ovis isolated from B. taurus and Capra hircus from Spain was analyzed molecularly. Morphological studies revealed clear differences between T. ovis and T. discolor isolated from B. taurus, but no differences were observed between populations of T. discolor isolated from different geographical regions. Nevertheless, the molecular studies based on the amplification and sequencing of the internal transcribed spacers 1 and 2 of ribosomal DNA and a partial 16S gene of mitochondrial DNA showed clear differences between the two populations of T. discolor from Spain and Iran, suggesting two cryptic species. Phylogenetic studies corroborated these data. Thus, phylogenetic trees based on ITS1, ITS2 and partial 16S gene sequences showed that individuals of T. discolor from B. taurus from Iran clustered together and separated, with high bootstrap values, from T. discolor isolated from B. taurus from Spain, while populations of T. ovis from B. taurus and C. hircus from Spain clustered together but separated, with high bootstrap values, from both populations of T. discolor. Furthermore, a comparative phylogenetic study was carried out with the ITS1 and ITS2 sequences of Trichuris species from different hosts. Three clades were observed: the first clustered all the species of Trichuris parasitizing herbivores (T. discolor, T. ovis, Trichuris leporis and Trichuris skrjabini), the second clustered all the species of Trichuris parasitizing omnivores (Trichuris trichiura and Trichuris suis) and finally, the third clustered species of Trichuris parasitizing carnivores (Trichuris muris, Trichuris arvicolae and Trichuris vulpis). Copyright © 2011 Elsevier B.V. All rights reserved.
Kim, Jun-Mo; Lim, Kyu-Sang; Byun, Mijeong; Lee, Kyung-Tai; Yang, Young-Rok; Park, Mina; Lim, Dajeong; Chai, Han-Ha; Bang, Han-Tae; Hwangbo, Jong; Choi, Yang-Ho; Cho, Yong-Min; Park, Jong-Eun
2017-11-01
White Pekin duck is an important meat resource in the livestock industries. However, the temperature increase due to global warming has become a serious environmental factor in duck production, because of hyperthermia. Therefore, identifying the gene regulations and understanding the molecular mechanism for adaptation to the warmer environment will provide insightful information on the acclimation system of ducks. This study examined transcriptomic responses to heat stress treatments (3 and 6 h at 35 °C) and control (C, 25 °C) using RNA-sequencing analysis of genes from the breast muscle tissue. Based on three distinct differentially expressed gene (DEG) sets (3H/C, 6H/C, and 6H/3H), the expression patterns of significant DEGs (absolute log2 > 1.0 and false discovery rate < 0.05) were clustered into three responsive gene groups divided into upregulated and downregulated genes. Next, we analyzed the clusters that showed relatively higher expression levels in 3H/C and lower levels in 6H/C with much lower or opposite levels in 6H/3H; we referred to these clusters as the adaptable responsive gene group. These genes were significantly enriched in the ErbB signaling pathway, neuroactive ligand-receptor interaction and type II diabetes mellitus in the KEGG pathways (P < 0.01). From the functional enrichment analysis and significantly regulated genes observed in the enriched pathways, we think that the adaptable responsive genes are responsible for the acclimation mechanism of ducks and suggest that the regulation of phosphoinositide 3-kinase genes including PIK3R6, PIK3R5, and PIK3C2B has an important relationship with the mechanisms of adaptation to heat stress in ducks.
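The DEG selection rule described above (absolute log2 fold change > 1.0 and false discovery rate < 0.05, split into upregulated and downregulated genes) can be sketched as a simple filter. The record type, function name, and the placeholder genes and values below are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float  # e.g. 3H versus C
    fdr: float               # multiple-testing-adjusted p-value


def significant_degs(results, min_abs_log2fc=1.0, max_fdr=0.05):
    """Split significant DEGs into (upregulated, downregulated) gene lists.

    The default thresholds mirror the criteria quoted in the abstract.
    """
    up, down = [], []
    for r in results:
        if abs(r.log2_fold_change) > min_abs_log2fc and r.fdr < max_fdr:
            (up if r.log2_fold_change > 0 else down).append(r.gene)
    return up, down


# Hypothetical placeholder results for one comparison.
results = [
    GeneResult("geneA", 2.3, 0.001),   # up, significant
    GeneResult("geneB", -1.8, 0.020),  # down, significant
    GeneResult("geneC", 0.6, 0.001),   # fold change too small
    GeneResult("geneD", 3.1, 0.200),   # FDR too high
    GeneResult("geneE", -1.0, 0.010),  # exactly 1.0 does not pass a strict ">"
]

up, down = significant_degs(results)
print("up:", up, "down:", down)
```

Running the same filter on each comparison (3H/C, 6H/C, 6H/3H) would give the three DEG sets whose expression patterns are then clustered.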
Nasr Esfahani, Bahram; Moghim, Sharareh; Ghasemian Safaei, Hajieh; Moghoofei, Mohsen; Sedighi, Mansour; Hadifar, Shima
2016-01-01
Background Taxonomic and phylogenetic studies of Mycobacterium species have been based around the 16S rRNA gene for many years. However, due to the high strain similarity between species in the Mycobacterium genus (94.3% - 100%), defining a valid phylogenetic tree is difficult; consequently, its use in estimating the boundaries between species is limited. The sequence of the rpoB gene makes it an appropriate gene for phylogenetic analysis, especially in bacteria with limited variation. Objectives In the present study, a 360-bp sequence of rpoB was used for precise classification of Mycobacterium strains isolated in Isfahan, Iran. Materials and Methods From February to October 2013, 57 clinical and environmental isolates were collected, subcultured, and identified by phenotypic methods. After DNA extraction, a 360-bp fragment was PCR-amplified and sequenced. The phylogenetic tree was constructed based on consensus sequence data, using MEGA5 software. Results Slow and fast-growing groups of the Mycobacterium strains were clearly differentiated based on the constructed tree of 56 common Mycobacterium isolates. Each species with a unique title in the tree was identified; in total, 13 nodes with a bootstrap value of over 50% were supported. Among the slow-growing group was Mycobacterium kansasii, with M. tuberculosis in a cluster with a bootstrap value of 98% and M. gordonae in another cluster with a bootstrap value of 90%. In the fast-growing group, one cluster with a bootstrap value of 89% was defined, including all fast-growing members present in this study. Conclusions The results suggest that only the application of the rpoB gene sequence is sufficient for taxonomic categorization and definition of a new Mycobacterium species, due to its high resolution power and proper variation in its sequence (85% - 100%); the resulting tree has high validity. PMID:27284397
Wood, Gwendolyn E.; Haydock, Andrew K.; Leigh, John A.
2003-01-01
Methanococcus maripaludis is a mesophilic species of Archaea capable of producing methane from two substrates: hydrogen plus carbon dioxide and formate. To study the latter, we identified the formate dehydrogenase genes of M. maripaludis and found that the genome contains two gene clusters important for formate utilization. Phylogenetic analysis suggested that the two formate dehydrogenase gene sets arose from duplication events within the methanococcal lineage. The first gene cluster encodes homologs of formate dehydrogenase α (FdhA) and β (FdhB) subunits and a putative formate transporter (FdhC) as well as a carbonic anhydrase analog. The second gene cluster encodes only FdhA and FdhB homologs. Mutants lacking either fdhA gene exhibited a partial growth defect on formate, whereas a double mutant was completely unable to grow on formate as a sole methanogenic substrate. Investigation of fdh gene expression revealed that transcription of both gene clusters is controlled by the presence of H2 and not by the presence of formate. PMID:12670979
Liu, Yonghong; Liu, Yuanyuan; Wu, Jiaming; Roizman, Bernard; Zhou, Grace Guoying
2018-04-03
Analyses of the levels of mRNAs encoding IFIT1, IFI16, RIG-1, MDA5, CXCL10, LGP2, PUM1, LSD1, STING, and IFNβ in cell lines from which the gene encoding LGP2, LSD1, PML, HDAC4, IFI16, PUM1, STING, MDA5, IRF3, or HDAC 1 had been knocked out, as well as the ability of these cell lines to support the replication of HSV-1, revealed the following: (i) Cell lines lacking the gene encoding LGP2, PML, or HDAC4 (cluster 1) exhibited increased levels of expression of partially overlapping gene networks. Concurrently, these cell lines produced 5-fold to 12-fold lower yields of HSV-1 than the parental cells. (ii) Cell lines lacking the genes encoding STING, LSD1, MDA5, IRF3, or HDAC 1 (cluster 2) exhibited decreased levels of mRNAs of partially overlapping gene networks. Concurrently, these cell lines produced virus yields that did not differ from those produced by the parental cell line. The genes up-regulated in cell lines forming cluster 1 overlapped in part with genes down-regulated in cluster 2. The key conclusions are that gene knockouts and subsequent selection for growth cause changes in expression of multiple genes, and hence the phenotype of the cell lines cannot be ascribed to a single gene; the patterns of gene expression may be shared by multiple knockouts; and the enhanced immunity to viral replication by cluster 1 knockout cell lines but not by cluster 2 cell lines suggests that in parental cells, the expression of innate resistance to infection is specifically repressed.
Zhai, Ying; Bai, Silei; Liu, Jingjing; Yang, Liyuan; Han, Li; Huang, Xueshi; He, Jing
2016-04-22
Dithiolopyrrolone group antibiotics characterized by an electronically unique dithiolopyrrolone heterobicyclic core are known for their antibacterial, antifungal, insecticidal and antitumor activities. Recently the biosynthetic gene clusters for two dithiolopyrrolone compounds, holomycin and thiomarinol, have been identified respectively in different bacterial species. Here, we report a novel dithiolopyrrolone biosynthetic gene cluster (aut) isolated from Streptomyces thioluteus DSM 40027 which produces two pyrrothine derivatives, aureothricin and thiolutin. By comparison with other characterized dithiolopyrrolone clusters, eight genes in the aut cluster were verified to be responsible for the assembly of dithiolopyrrolone core. The aut cluster was further confirmed by heterologous expression and in-frame gene deletion experiments. Intriguingly, we found that the heterogenetic thioesterase HlmK derived from the holomycin (hlm) gene cluster in Streptomyces clavuligerus significantly improved heterologous biosynthesis of dithiolopyrrolones in Streptomyces albus through coexpression with the aut cluster. In the previous studies, HlmK was considered invalid because it has a Ser to Gly point mutation within the canonical Ser-His-Asp catalytic triad of thioesterases. However, gene inactivation and complementation experiments in our study unequivocally demonstrated that HlmK is an active distinctive type II thioesterase that plays a beneficial role in dithiolopyrrolone biosynthesis. Copyright © 2016 Elsevier Inc. All rights reserved.
Post-genome research on the biosynthesis of ergot alkaloids.
Li, Shu-Ming; Unsöld, Inge A
2006-10-01
Genome sequencing provides new opportunities and challenges for identifying genes for the biosynthesis of secondary metabolites. A putative biosynthetic gene cluster of fumigaclavine C, an ergot alkaloid of the clavine type, was identified in the genome sequence of Aspergillus fumigatus by a bioinformatic approach. This cluster spans 22 kb of genomic DNA and comprises at least 11 open reading frames (ORFs). Seven of them are orthologous to genes from the biosynthetic gene cluster of ergot alkaloids in Claviceps purpurea. Experimental evidence of the identified cluster was provided by heterologous expression and biochemical characterization of two ORFs, FgaPT1 and FgaPT2, in the cluster of A. fumigatus, which show remarkable similarities to dimethylallyltryptophan synthase from C. purpurea and function as prenyltransferases. FgaPT2 converts L-tryptophan to dimethylallyltryptophan and thereby catalyzes the first step of ergot alkaloid biosynthesis, whilst FgaPT1 catalyzes the last step of the fumigaclavine C biosynthesis, i.e., the prenylation of fumigaclavine A at C-2 position of the indole nucleus. In addition to information obtained from the gene cluster of ergot alkaloids from C. purpurea, the identification of the biosynthetic gene cluster of fumigaclavine C in A. fumigatus opens an alternative way to study the biosynthesis of ergot alkaloids in fungi.
Genome Engineering and Modification Toward Synthetic Biology for the Production of Antibiotics.
Zou, Xuan; Wang, Lianrong; Li, Zhiqiang; Luo, Jie; Wang, Yunfu; Deng, Zixin; Du, Shiming; Chen, Shi
2018-01-01
Antibiotic production is often governed by large gene clusters composed of genes related to antibiotic scaffold synthesis, tailoring, regulation, and resistance. With the expansion of genome sequencing, a considerable number of antibiotic gene clusters have been isolated and characterized. Emerging genome engineering techniques make more efficient engineering of antibiotics possible. In addition to genomic editing, multiple synthetic biology approaches have been developed for the exploration and improvement of antibiotic natural products. Here, we review the progress in the development of these genome editing techniques used to engineer new antibiotics, focusing on three aspects of genome engineering: direct cloning of large genomic fragments, genome engineering of gene clusters, and regulation of gene cluster expression. This review will not only summarize the current uses of genomic engineering techniques for cloning and assembly of antibiotic gene clusters or for altering antibiotic synthetic pathways but will also provide perspectives on the future directions of rebuilding biological systems for the design of novel antibiotics. © 2017 Wiley Periodicals, Inc.
Novel genomic island modifies DNA with 7-deazaguanine derivatives
Thiaville, Jennifer J.; Kellner, Stefanie M.; Yuan, Yifeng; Hutinet, Geoffrey; Thiaville, Patrick C.; Jumpathong, Watthanachai; Mohapatra, Susovan; Brochier-Armanet, Celine; Letarov, Andrey V.; Hillebrand, Roman; Malik, Chanchal K.; Rizzo, Carmelo J.; Dedon, Peter C.; de Crécy-Lagard, Valérie
2016-01-01
The discovery of ∼20-kb gene clusters containing a family of paralogs of tRNA guanosine transglycosylase genes, called tgtA5, alongside 7-cyano-7-deazaguanine (preQ0) synthesis and DNA metabolism genes, led to the hypothesis that 7-deazaguanine derivatives are inserted in DNA. This was established by detecting 2’-deoxy-preQ0 and 2’-deoxy-7-amido-7-deazaguanosine in enzymatic hydrolysates of DNA extracted from the pathogenic, Gram-negative bacteria Salmonella enterica serovar Montevideo. These modifications were absent in the closely related S. enterica serovar Typhimurium LT2 and from a mutant of S. Montevideo, each lacking the gene cluster. This led us to rename the genes of the S. Montevideo cluster as dpdA-K for 7-deazapurine in DNA. Similar gene clusters were analyzed in ∼150 phylogenetically diverse bacteria, and the modifications were detected in DNA from other organisms containing these clusters, including Kineococcus radiotolerans, Comamonas testosteroni, and Sphingopyxis alaskensis. Comparative genomic analysis shows that, in Enterobacteriaceae, the cluster is a genomic island integrated at the leuX locus, and the phylogenetic analysis of the TgtA5 family is consistent with widespread horizontal gene transfer. Comparison of transformation efficiencies of modified or unmodified plasmids into isogenic S. Montevideo strains containing or lacking the cluster strongly suggests a restriction–modification role for the cluster in Enterobacteriaceae. Another preQ0 derivative, 2’-deoxy-7-formamidino-7-deazaguanosine, was found in the Escherichia coli bacteriophage 9g, as predicted from the presence of homologs of genes involved in the synthesis of the archaeosine tRNA modification. These results illustrate a deep and unexpected evolutionary connection between DNA and tRNA metabolism. PMID:26929322
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing studies adopt single clustering algorithms to perform tumor clustering from biomolecular data, and these algorithms lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results.
The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
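The abstract above relies on fuzzy c-means both to build fuzzy matrices and as a consensus function. As a rough illustration only, the following sketch computes the standard fuzzy c-means membership matrix for fixed cluster centres. It is not the HFCEF ensemble itself; the function name `fuzzy_memberships` and the fuzzifier default `m=2.0` are assumptions made for this example.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed centres.

    Returns one row per point; each row holds that point's degree of
    membership in every cluster and sums to 1. `m` > 1 is the fuzzifier.
    """
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centres: share full
            # membership among those centres and give zero to the rest.
            zeros = dists.count(0.0)
            rows.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        row = []
        for d_i in dists:
            # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
            denom = sum((d_i / d_k) ** exponent for d_k in dists)
            row.append(1.0 / denom)
        rows.append(row)
    return rows
```

A full fuzzy c-means run would alternate this membership step with a weighted update of the centres until the memberships stop changing.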
Unsupervised text mining for assessing and augmenting GWAS results.
Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence
2016-04-01
Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has contributed to identifying many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
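The comparison of candidate genes against random gene sets described above can be sketched as a simple resampling test. This is a minimal illustration, not the authors' pipeline: it assumes each gene is already represented by a numeric text vector, and the names `cosine`, `mean_pairwise_similarity`, and `empirical_p_value` are hypothetical.

```python
import math
import random
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for empty vectors (an assumption)
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def empirical_p_value(candidates, background, n_draws=1000, seed=0):
    """One-sided empirical p-value for the candidate set's cohesion.

    Draws random gene sets of the same size from `background` and counts
    how often they are at least as similar as the candidate set.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(background, len(candidates))
        if mean_pairwise_similarity(sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_draws + 1), so the number of random draws bounds the significance level that can be reported.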
The effects of old and recent migration waves in the distribution of HBB*S globin gene haplotypes
Lindenau, Juliana D.; Wagner, Sandrine C.; de Castro, Simone M.; Hutz, Mara H.
2016-01-01
Sickle cell hemoglobin is the result of a mutation at the sixth amino acid position of the beta (β) globin chain. The HBB*S gene is in linkage disequilibrium with five main haplotypes in the β-globin-like gene cluster named according to their ethnic and geographic origins: Bantu (CAR), Benin (BEN), Senegal (SEN), Cameroon (CAM) and Arabian-Indian (ARAB). These haplotypes demonstrated that the sickle cell mutation arose independently at least five times in human history. The distribution of βS haplotypes among Brazilian populations showed a predominance of the CAR haplotype. American populations were clustered in two groups defined by CAR or BEN haplotype frequencies. This scenario is compatible with historical records about the slave trade in the Americas. When all world populations where the sickle cell gene occurs were analyzed, three clusters were disclosed based on CAR, BEN or ARAB haplotype predominance. These patterns may change in the next decades due to recent migration waves. Since these haplotypes show different clinical characteristics, these recent migration events raise the necessity to develop optimized public health programs for sickle cell disease screening and management. PMID:27706371
Identifying driving gene clusters in complex diseases through critical transition theory
NASA Astrophysics Data System (ADS)
Wolanyk, Nathaniel; Wang, Xujing; Hessner, Martin; Gao, Shouguo; Chen, Ye; Jia, Shuang
A novel approach of looking at the human body using critical transition theory has yielded positive results: clusters of genes that act in tandem to drive complex disease progression. This cluster of genes can be thought of as the first part of a large genetic force that pushes the body from a curable, but sick, point to an incurable diseased point through a catastrophic bifurcation. The data analyzed are time-course microarray blood assay data from 7 individuals at high risk for Type 1 Diabetes who progressed to clinical onset, with an additional larger study requested to be presented at the conference. The normalized data comprise 25,000 genes, which were narrowed down based on statistical metrics, and finally a machine learning algorithm using critical transition metrics found the driving network. This approach was created to be repeatable across multiple complex diseases with only progression time-course data needed, so that it would be applicable to identifying when an individual is at risk of developing a complex disease. Preventative measures can thus be enacted, and in the longer term the approach offers a possible solution for preventing all Type 1 Diabetes.
Arguedas-Villa, Carolina; Kovacevic, Jovana; Allen, Kevin J; Stephan, Roger; Tasara, Taurai
2014-06-01
Sixty-two strains of Listeria monocytogenes isolated in Canada and Switzerland were investigated. Comparison based on molecular genotypes confirmed that strains in these two countries are genetically diverse. Interestingly, strains from both countries displayed a similar range of cold growth phenotypic profiles. Based on cold growth lag phase duration periods displayed in BHI at 4 °C, the strains were similarly divided into groups of fast, intermediate and slow cold adaptors. Overall, Swiss strains had faster exponential cold growth rates compared to Canadian strains. However, gene expression analysis revealed no significant differences between fast and slow cold adapting strains in the ability to induce nine cold adaptation genes (lmo0501, cspA, cspD, gbuA, lmo0688, pgpH, sigB, sigH and sigL) in response to cold stress exposure. Nor was the presence of stress survival islet 1 (SSI-1), analysed by PCR, associated with enhanced cold adaptation. Phylogeny based on the sigL gene subdivided strains from these two countries into two major and one minor cluster. Fast cold adaptors were more frequently in one of the major clusters (cluster A), whereas slow cold adaptors were mainly in the other (cluster B). Genetic differences between these two major clusters are associated with various amino acid substitutions in the predicted SigL proteins. Compared to the EGDe type strain and most slow cold adaptors, most fast cold adaptors exhibited five identical amino acid substitutions (M90L, S203A/S203T, S304N, S315N, and I383T) in their SigL proteins. We hypothesize that these amino acid changes might be associated with SigL protein structural and functional changes that may promote differences in cold growth behaviour between L. monocytogenes strains. Copyright © 2014 Elsevier Ltd. All rights reserved.
Xu, Li; Han, Ting; Ge, Mei; Zhu, Li; Qian, XiuPing
2016-09-01
Analysis of the Amycolatopsis orientalis HCCB10007 genome revealed new gene clusters involved in natural product biosynthesis that were not associated with the production of known compounds. Halogenases are a type of tailoring enzyme usually found within these secondary gene clusters. In this study, we identified an indole-type halometabolite, 6-chloro-1H-indole-3-carboxamide, named LYXLF2, by whole genome mining and metabolic profiling of a flavin-dependent halogenase mutant. LYXLF2 is a new plant growth-regulating compound that promotes root elongation. The results of this study demonstrated that the special gene knock-out/comparative metabolic profiling approach provides a powerful tool for the discovery of novel natural products by genome mining.
Ding, Jiarui; Shah, Sohrab; Condon, Anne
2016-01-01
Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on the recent cancer mutation clustering and single cell data analyses, namely to cluster variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, to cluster single-cell gene expression data to uncover cell population compositions, and to cluster single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. Availability and Implementation: Data and the densityCut R package are available from https://bitbucket.org/jerry00/densitycut_dev. Contact: condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153661
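The first two steps described above (a rough density estimate from a K-nearest neighbour graph, then refinement by a random walk) can be illustrated with a small sketch. This is a simplified stand-in and not the densityCut R package: the inverse k-th-neighbour distance as the density estimate, the restart-style random walk, and the parameters `alpha` and `n_iter` are all assumptions chosen for this example.

```python
import math

def knn_indices(points, k):
    """For each point, indices of its k nearest neighbours (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result

def knn_density(points, k):
    """Rough density: inverse distance to the k-th neighbour, normalised to sum to 1."""
    nbrs = knn_indices(points, k)
    dens = []
    for i, idx in enumerate(nbrs):
        radius = math.dist(points[i], points[idx[-1]])
        dens.append(1.0 / (radius + 1e-12))  # small constant avoids division by zero
    total = sum(dens)
    return [d / total for d in dens], nbrs

def refine_by_random_walk(density, nbrs, alpha=0.9, n_iter=50):
    """Smooth densities with a random walk (with restart) on the kNN graph.

    At each step a point passes a fraction `alpha` of its mass in equal
    shares to its neighbours; the remaining mass restarts from the
    initial estimate, so the total stays equal to 1.
    """
    n = len(density)
    current = list(density)
    for _ in range(n_iter):
        nxt = [(1.0 - alpha) * density[i] for i in range(n)]
        for i in range(n):
            share = alpha * current[i] / len(nbrs[i])
            for j in nbrs[i]:
                nxt[j] += share
        current = nxt
    return current
```

Points that few others list as neighbours receive little mass from the walk, so isolated points end with low refined density, which is the behaviour a mode-seeking clustering step would then exploit.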
Finn, Roderick Nigel; Chauvigné, François; Hlidberg, Jón Baldur; Cutler, Christopher P.; Cerdà, Joan
2014-01-01
A major physiological barrier for aquatic organisms adapting to terrestrial life is desiccation in the aerial environment. This barrier was nevertheless overcome by the Devonian ancestors of extant Tetrapoda, but the origin of specific molecular mechanisms that solved this water problem remains largely unknown. Here we show that an ancient aquaporin gene cluster evolved specifically in the sarcopterygian lineage, and subsequently diverged into paralogous forms of AQP2, -5, or -6 to mediate water conservation in extant Tetrapoda. To determine the origin of these apomorphic genomic traits, we combined aquaporin sequencing from jawless and jawed vertebrates with broad taxon assembly of >2,000 transcripts amongst 131 deuterostome genomes and developed a model based upon Bayesian inference that traces their convergent roots to stem subfamilies in basal Metazoa and Prokaryota. This approach uncovered an unexpected diversity of aquaporins in every lineage investigated, and revealed that the vertebrate superfamily consists of 17 classes of aquaporins (Aqp0 - Aqp16). The oldest orthologs associated with water conservation in modern Tetrapoda are traced to a cluster of three aqp2-like genes in Actinistia that likely arose >500 Ma through duplication of an aqp0-like gene present in a jawless ancestor. In sea lamprey, we show that aqp0 first arose in a protocluster comprised of a novel aqp14 paralog and a fused aqp01 gene. To corroborate these findings, we conducted phylogenetic analyses of five syntenic nuclear receptor subfamilies, which, together with observations of extensive genome rearrangements, support the coincident loss of ancestral aqp2-like orthologs in Actinopterygii. We thus conclude that the divergence of sarcopterygian-specific aquaporin gene clusters was permissive for the evolution of water conservation mechanisms that facilitated tetrapod terrestrial adaptation. PMID:25426855
Identifying pathogenic processes by integrating microarray data with prior knowledge
2014-01-01
Background It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens shows a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping. Results Simulation results showed that the method improved the ability to identify correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset, the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation. Conclusion Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways. PMID:24758699
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara
2000-01-01
We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
Wang, Zhao-Xin; Li, Shu-Ming; Heide, Lutz
2000-01-01
The biosynthetic gene cluster of the aminocoumarin antibiotic coumermycin A1 was cloned by screening of a cosmid library of Streptomyces rishiriensis DSM 40489 with heterologous probes from a dTDP-glucose 4,6-dehydratase gene, involved in deoxysugar biosynthesis, and from the aminocoumarin resistance gyrase gene gyrBr. Sequence analysis of a 30.8-kb region upstream of gyrBr revealed the presence of 28 complete open reading frames (ORFs). Fifteen of the identified ORFs showed, on average, 84% identity to corresponding ORFs in the biosynthetic gene cluster of novobiocin, another aminocoumarin antibiotic. Possible functions of 17 ORFs in the biosynthesis of coumermycin A1 could be assigned by comparison with sequences in GenBank. Experimental proof for the function of the identified gene cluster was provided by an insertional gene inactivation experiment, which resulted in an abolishment of coumermycin A1 production. PMID:11036020
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
NASA Astrophysics Data System (ADS)
Bhajun, Ricky; Guyon, Laurent; Pitaval, Amandine; Sulpice, Eric; Combe, Stéphanie; Obeid, Patricia; Haguet, Vincent; Ghorbel, Itebeddine; Lajaunie, Christian; Gidrol, Xavier
2015-02-01
MiRNAs are key regulators of gene expression. By binding to many genes, they create a complex network of gene co-regulation. Here, using a network-based approach, we identified miRNA hub groups by their close connections and common targets. In one cluster containing three miRNAs, miR-612, miR-661 and miR-940, the annotated functions of the co-regulated genes suggested a role in small GTPase signalling. Although the three members of this cluster targeted the same subset of predicted genes, we showed that their overexpression impacted cell fates differently. miR-661 demonstrated enhanced phosphorylation of myosin II and an increase in cell invasion, indicating a possible oncogenic miRNA. In contrast, miR-612 and miR-940 inhibit phosphorylation of myosin II and cell invasion. Finally, expression profiling in human breast tissues showed that miR-940 was consistently downregulated in breast cancer tissues.
Liu, Yong; Wei, Wen-Ping; Ye, Bang-Ce
2018-05-18
The overexpression of bacterial secondary metabolite biosynthetic enzymes is the basis for industrial overproducing strains. Genome editing tools can be used to further improve gene expression and yield. Saccharopolyspora erythraea produces erythromycin, which has extensive clinical applications. In this study, the CRISPR-Cas9 system was used to edit genes in the S. erythraea genome. A temperature-sensitive plasmid containing the PermE promoter, to drive Cas9 expression, and the Pj23119 and PkasO promoters, to drive sgRNAs, was designed. Erythromycin esterase, encoded by S. erythraea SACE_1765, inactivates erythromycin by hydrolyzing the macrolactone ring. Sequencing and qRT-PCR confirmed that reporter genes were successfully inserted into the SACE_1765 gene. Deletion of SACE_1765 in a high-producing strain resulted in a 12.7% increase in erythromycin levels. Subsequent PermE-egfp knock-in at the SACE_0712 locus resulted in an 80.3% increase in erythromycin production compared with that of wild type. Further investigation showed that PermE promoter knock-in activated the erythromycin biosynthetic gene clusters at the SACE_0712 locus. Additionally, deletion of indA (SACE_1229) using dual sgRNA targeting without markers increased the editing efficiency to 65%. In summary, we have successfully applied Cas9-based genome editing to a bacterial strain, S. erythraea, with a high GC content. This system has potential application for both genome-editing and biosynthetic gene cluster activation in Actinobacteria.
Chen, Chao; Zhao, Xinqing; Jin, Yingyu; Zhao, Zongbao Kent; Suh, Joo-Won
2014-11-01
Bacterial artificial chromosomal (BAC) vectors are increasingly being used in cloning large DNA fragments containing complex biosynthetic pathways to facilitate heterologous production of microbial metabolites for drug development. To express inserted genes using Streptomyces species as the production hosts, an integration expression cassette is required to be inserted into the BAC vector, which includes genetic elements encoding a phage-specific attachment site, an integrase, an origin of transfer, a selection marker and a promoter. Due to the large sizes of DNA inserted into the BAC vectors, it is normally inefficient and time-consuming to assemble these fragments by routine PCR amplifications and restriction-ligations. Here we present a rapid method to insert fragments to construct BAC-based expression vectors. A DNA fragment of about 130 bp was designed, which contains upstream and downstream homologous sequences of both BAC vector and pIB139 plasmid carrying the whole integration expression cassette. In-Fusion cloning was performed using the designer DNA fragment to modify pIB139, followed by λ-RED-mediated recombination to obtain the BAC-based expression vector. We demonstrated the effectiveness of this method by rapid construction of a BAC-based expression vector with an insert of about 120 kb that contains the entire gene cluster for biosynthesis of immunosuppressant FK506. The empty BAC-based expression vector constructed in this study can be conveniently used for construction of BAC libraries using either microbial pure culture or environmental DNA, and the selected BAC clones can be directly used for heterologous expression. Alternatively, if a BAC library has already been constructed using a commercial BAC vector, the selected BAC vectors can be manipulated using the method described here to get the BAC-based expression vectors with desired gene clusters for heterologous expression. 
The rapid construction of a BAC-based expression vector facilitates heterologous expression of large gene clusters for drug discovery. Copyright © 2014 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mason, Olivia U.; Di Meo-Savoie, Carol A.; Van Nostrand, Joy D.
2008-09-30
We used molecular techniques to analyze basalts of varying ages that were collected from the East Pacific Rise, 9°N, from the rift axis of the Juan de Fuca Ridge, and from neighboring seamounts. Cluster analysis of 16S rDNA Terminal Restriction Fragment Polymorphism data revealed that basalt endoliths are distinct from seawater and that communities clustered, to some degree, based on the age of the host rock. This age-based clustering suggests that alteration processes may affect community structure. Cloning and sequencing of bacterial and archaeal 16S rRNA genes revealed twelve different phyla and sub-phyla associated with basalts. These include the Gemmatimonadetes, Nitrospirae, the candidate phylum SBR1093 in the Bacteria, and in the Archaea Marine Benthic Group B, none of which have been previously reported in basalts. We delineated novel ocean crust clades in the gamma-Proteobacteria, Planctomycetes, and Actinobacteria that are composed entirely of basalt associated microflora, and may represent basalt ecotypes. Finally, microarray analysis of functional genes in basalt revealed that genes coding for previously unreported processes such as carbon fixation, methane-oxidation, methanogenesis, and nitrogen fixation are present, suggesting that basalts harbor previously unrecognized metabolic diversity. These novel processes could exert a profound influence on ocean chemistry.
Transcription factor clusters regulate genes in eukaryotic cells
Hedlund, Erik G; Friemann, Rosmarie; Hohmann, Stefan
2017-01-01
Transcription is regulated through the binding of factors to gene promoters to activate or repress expression; however, the mechanisms by which factors find targets remain unclear. Using single-molecule fluorescence microscopy, we determined in vivo stoichiometry and spatiotemporal dynamics of a GFP tagged repressor, Mig1, from a paradigm signaling pathway of Saccharomyces cerevisiae. We find the repressor operates in clusters, which upon extracellular signal detection, translocate from the cytoplasm, bind to nuclear targets and turn over. Simulations of Mig1 configuration within a 3D yeast genome model combined with a promoter-specific, fluorescent translation reporter confirmed clusters are the functional unit of gene regulation. In vitro and structural analysis on reconstituted Mig1 suggests that clusters are stabilized by depletion forces between intrinsically disordered sequences. We observed similar clusters of a co-regulatory activator from a different pathway, supporting a generalized cluster model for transcription factors that reduces promoter search times through intersegment transfer while stabilizing gene expression. PMID:28841133
2013-01-01
Background Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. Results This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help face the present and forthcoming challenges of data analysis in functional genomics. PMID:24134721
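The first element of the methodology above, representing each gene profile in a Fourier basis, can be sketched as follows. This is only an illustration under simplifying assumptions (evenly spaced time points, a small number of harmonics); it does not reproduce the authors' autocorrelation-aware screening or FDR control, and the names `fourier_coefficients` and `harmonic_energy` are hypothetical.

```python
import math

def fourier_coefficients(profile, n_harmonics):
    """Mean and (cosine, sine) coefficients of an evenly sampled profile.

    For harmonics below the Nyquist limit these are the least-squares
    coefficients of the truncated Fourier series.
    """
    n = len(profile)
    mean = sum(profile) / n
    coeffs = []
    for h in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(y * math.cos(2.0 * math.pi * h * t / n)
                          for t, y in enumerate(profile))
        b = 2.0 / n * sum(y * math.sin(2.0 * math.pi * h * t / n)
                          for t, y in enumerate(profile))
        coeffs.append((a, b))
    return mean, coeffs

def harmonic_energy(profile, n_harmonics):
    """Variance explained by the first harmonics; near zero for flat profiles."""
    _, coeffs = fourier_coefficients(profile, n_harmonics)
    return sum(a * a + b * b for a, b in coeffs) / 2.0
```

A screening step could then rank genes by a statistic built from these coefficients, and the retained coefficient vectors could serve as the input to a model-based clustering in the Fourier domain.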
Stathakis, D. G.; Pentz, E. S.; Freeman, M. E.; Kullman, J.; Hankins, G. R.; Pearlson, N. J.; Wright, TRF.
1995-01-01
We report the complete molecular organization of the Dopa decarboxylase gene cluster. Mutagenesis screens recovered 77 new Df(2L)TW130 recessive lethal mutations. These new alleles combined with 263 previously isolated mutations in the cluster to define 18 essential genes. In addition, seven new deficiencies were isolated and characterized. Deficiency mapping, restriction fragment length polymorphism (RFLP) analysis and P-element-mediated germline transformation experiments determined the gene order for all 18 loci. Genomic and cDNA restriction endonuclease mapping, Northern blot analysis and DNA sequencing provided information on exact gene location, mRNA size and transcriptional direction for most of these loci. In addition, this analysis identified two transcription units that had not previously been identified by extensive mutagenesis screening. Most of the loci are contained within two dense subclusters. We discuss the effectiveness of mutagens and strategies used in our screens, the variable mutability of loci within the genome of Drosophila melanogaster, the cytological and molecular organization of the Ddc gene cluster, the validity of the one band-one gene hypothesis and a possible purpose for the clustering of genes in the Ddc region. PMID:8647399
Aguirre von Wobeser, Eneas; Ibelings, Bas W.; Bok, Jasper; Krasikov, Vladimir; Huisman, Jef; Matthijs, Hans C.P.
2011-01-01
Physiological adaptation and genome-wide expression profiles of the cyanobacterium Synechocystis sp. strain PCC 6803 in response to gradual transitions between nitrogen-limited and light-limited growth conditions were measured in continuous cultures. Transitions induced changes in pigment composition, light absorption coefficient, photosynthetic electron transport, and specific growth rate. Physiological changes were accompanied by reproducible changes in the expression of several hundred open reading frames, genes with functions in photosynthesis and respiration, carbon and nitrogen assimilation, protein synthesis, phosphorus metabolism, and overall regulation of cell function and proliferation. Cluster analysis of the nearly 1,600 regulated open reading frames identified eight clusters, each showing a different temporal response during the transitions. Two large clusters mirrored each other. One cluster included genes involved in photosynthesis, which were up-regulated during light-limited growth but down-regulated during nitrogen-limited growth. Conversely, genes in the other cluster were down-regulated during light-limited growth but up-regulated during nitrogen-limited growth; this cluster included several genes involved in nitrogen uptake and assimilation. These results demonstrate complementary regulation of gene expression for two major metabolic activities of cyanobacteria. Comparison with batch-culture experiments revealed interesting differences in gene expression between batch and continuous culture and illustrates that continuous-culture experiments can pick up subtle changes in cell physiology and gene expression. PMID:21205618
Glenn, Anthony E.; Davis, C. Britton; Gao, Minglu; Gold, Scott E.; Mitchell, Trevor R.; Proctor, Robert H.; Stewart, Jane E.; Snook, Maurice E.
2016-01-01
Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence. PMID:26808652
antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters
Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko
2015-01-01
Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579
β-globin gene cluster haplotypes in ethnic minority populations of southwest China
Sun, Hao; Liu, Hongxian; Huang, Kai; Lin, Keqin; Huang, Xiaoqin; Chu, Jiayou; Ma, Shaohui; Yang, Zhaoqing
2017-01-01
The genetic diversity and relationships among ethnic minority populations of southwest China were investigated using seven polymorphic restriction enzyme sites in the β-globin gene cluster. The haplotypes of 1392 chromosomes from ten ethnic populations living in southwest China were determined. Linkage equilibrium and a recombination hotspot were found between the 5′ sites and 3′ sites of the β-globin gene cluster. 5′ haplotypes 2 (+−−−), 6 (−++−+), 9 (−++++) and 3′ haplotype FW3 (−+) were the predominant haplotypes. Notably, the frequency of haplotype 9 was significantly high in the southwest populations, indicating that they differ from other Chinese populations. The interpopulation differentiation of southwest Chinese minority populations is less than that in populations of northern China and other continents. Phylogenetic analysis shows that populations sharing the same ethnic origin or language clustered together, indicating that current β-globin cluster diversity in the Chinese populations reflects their ethnic origin and linguistic affiliations to a great extent. This study characterizes β-globin gene cluster haplotypes in southwest Chinese minorities for the first time, and reveals the genetic variability and affinity of these populations using β-globin cluster haplotype frequencies. The results suggest that ethnic origin plays an important role in shaping variations of the β-globin gene cluster in the southwestern ethnic populations of China. PMID:28205625
Zhang, Bo; Zhang, Lin; Dai, Ruixue; Yu, Meiying; Zhao, Guoping; Ding, Xiaoming
2013-01-01
Streptomyces bacteria are known for producing important natural compounds by secondary metabolism, especially antibiotics with novel biological activities. Functional studies of antibiotic-biosynthesizing gene clusters are generally carried out through homologous genomic recombination using gene-targeting vectors. Here, we present a rapid and efficient method for construction of gene-targeting vectors. This approach is based on Streptomyces phage φBT1 integrase-mediated multisite in vitro site-specific recombination. Four 'entry clones' were assembled into a circular plasmid to generate the destination gene-targeting vector by a one-step reaction. The four 'entry clones' contained two clones of the upstream and downstream flanks of the target gene, a selectable marker and an E. coli-Streptomyces shuttle vector. After targeted modification of the genome, the selectable markers were removed by φC31 integrase-mediated in vivo site-specific recombination between pre-placed attB and attP sites. Using this method, parts of the calcium-dependent antibiotic (CDA) and actinorhodin (Act) biosynthetic gene clusters were deleted, and rrdA, which encodes RrdA, a negative regulator of Red production, was also deleted. The final prodiginine production of the engineered strain was over five times that of the wild-type strain. This straightforward φBT1 and φC31 integrase-based strategy provides an alternative approach for rapid gene-targeting vector construction and marker removal in streptomycetes.
The nif Gene Operon of the Methanogenic Archaeon Methanococcus maripaludis
Kessler, Peter S.; Blank, Carrine; Leigh, John A.
1998-01-01
Nitrogen fixation occurs in two domains, Archaea and Bacteria. We have characterized a nif (nitrogen fixation) gene cluster in the methanogenic archaeon Methanococcus maripaludis. Sequence analysis revealed eight genes, six with sequence similarity to known nif genes and two with sequence similarity to glnB. The gene order, nifH, ORF105 (similar to glnB), ORF121 (similar to glnB), nifD, nifK, nifE, nifN, and nifX, was the same as that found in part in other diazotrophic methanogens and except for the presence of the glnB-like genes, also resembled the order found in many members of the Bacteria. Using transposon insertion mutagenesis, we determined that an 8-kb region required for nitrogen fixation corresponded to the nif gene cluster. Northern analysis revealed the presence of either a single 7.6-kb nif mRNA transcript or 10 smaller mRNA species containing portions of the large transcript. Polar effects of transposon insertions demonstrated that all of these mRNAs arose from a single promoter region, where transcription initiated 80 bp 5′ to nifH. Distinctive features of the nif gene cluster include the presence of the six primary nif genes in a single operon, the placement of the two glnB-like genes within the cluster, the apparent physical separation of the cluster from any other nif genes that might be in the genome, the fragmentation pattern of the mRNA, and the regulation of expression by a repression mechanism described previously. Our study and others with methanogenic archaea reporting multiple mRNAs arising from gene clusters with only a single putative promoter sequence suggest that mRNA processing following transcription may be a common occurrence in methanogens. PMID:9515920
Use of keyword hierarchies to interpret gene expression patterns.
Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J
2001-04-01
High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
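The idea of using a keyword hierarchy to summarize a gene cluster can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each keyword carries a dotted tree number whose prefixes are its ancestors (in the style of MeSH tree numbers). The gene names, tree numbers, and function names are all hypothetical.

```python
from collections import defaultdict

def ancestors(tree_number):
    """Return a tree number followed by its ancestors, most specific first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

def genes_per_term(gene_keywords):
    """Map each hierarchy node to the set of genes annotated at or below it."""
    support = defaultdict(set)
    for gene, tree_numbers in gene_keywords.items():
        for tree_number in tree_numbers:
            for node in ancestors(tree_number):
                support[node].add(gene)
    return support

def shared_terms(gene_keywords, min_genes=2):
    """List nodes shared by at least min_genes genes, most specific first."""
    support = genes_per_term(gene_keywords)
    hits = [(node, sorted(genes)) for node, genes in support.items()
            if len(genes) >= min_genes]
    # Deeper nodes have more dots; sort them first, then alphabetically.
    hits.sort(key=lambda hit: (-hit[0].count("."), hit[0]))
    return hits

# Hypothetical cluster: made-up genes with made-up tree numbers.
cluster = {
    "geneA": ["X01.100.200"],
    "geneB": ["X01.100.300"],
    "geneC": ["X01.100.200", "X02.500"],
}

for node, genes in shared_terms(cluster):
    print(node, genes)
```

Rolling each keyword up to its ancestors lets genes indexed with different specific terms (here geneA and geneB) still be recognized as conceptually related at a broader level of the hierarchy.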