The limitations of simple gene set enrichment analysis assuming gene independence.
Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P
2016-02-01
Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with those of Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce on the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
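The variance inflation argument in the abstract above can be illustrated with a small calculation. This is a minimal sketch and not the authors' code: it assumes gene-level scores with unit variance under the null and a single average pairwise correlation within the set, in which case the variance of the set mean is inflated by the factor 1 + (m - 1) * rho for a set of m genes. The function names are hypothetical.

```python
import math


def variance_inflation_factor(set_size: int, mean_correlation: float) -> float:
    """Inflation of the variance of a set-mean score relative to independence.

    Assumes unit-variance gene scores sharing one average pairwise correlation.
    """
    vif = 1.0 + (set_size - 1) * mean_correlation
    if vif <= 0:
        raise ValueError("correlation too negative for this set size")
    return vif


def naive_and_adjusted_z(mean_score: float, set_size: int, mean_correlation: float):
    """Return the z-score that assumes independence and the correlation-adjusted one."""
    naive_z = mean_score * math.sqrt(set_size)
    adjusted_z = naive_z / math.sqrt(variance_inflation_factor(set_size, mean_correlation))
    return naive_z, adjusted_z


if __name__ == "__main__":
    # Illustrative numbers only: 50 genes with an average correlation of 0.1.
    print(variance_inflation_factor(50, 0.1))
    print(naive_and_adjusted_z(0.4, 50, 0.1))
```

Under these assumptions even a modest average correlation shrinks the naive z-score considerably for large sets, which is the effect the abstract says cannot be ignored.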
Comparative study on gene set and pathway topology-based enrichment methods.
Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim
2015-10-22
Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list, disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis.
Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
Spectral gene set enrichment (SGSE).
Frost, H Robert; Li, Zhigang; Moore, Jason H
2015-03-03
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
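The combination step described above, merging PC-level p-values with the weighted Z-method, can be sketched as follows. This is an illustrative sketch rather than the SGSE implementation: the weights are taken as given inputs (the abstract derives them from PC variances scaled by Tracy-Widom p-values, which is not reproduced here), and the function name and the clamping constant `eps` are assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def weighted_z_combine(p_values, weights, eps=1e-12):
    """Combine one-sided p-values with the weighted Z-method (weighted Stouffer).

    Each p-value is mapped to z_i = Phi^{-1}(1 - p_i); the combined statistic is
    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)) and the result is 1 - Phi(Z).
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one weight must be non-zero")
    z_sum = 0.0
    for p, w in zip(p_values, weights):
        p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf away from 0 and 1
        z_sum += w * _STD_NORMAL.inv_cdf(1.0 - p)
    return 1.0 - _STD_NORMAL.cdf(z_sum / norm)


if __name__ == "__main__":
    # Hypothetical PC-level p-values for one gene set, with hypothetical weights.
    print(weighted_z_combine([0.01, 0.20, 0.60], [3.0, 1.5, 0.2]))
```

A weight near zero effectively removes a component from the combined statistic, which matches the abstract's point that components likely to represent noise can be ignored.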
Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi
2016-01-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi
2015-11-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.
Pathway Distiller - multisource biological pathway consolidation
2012-01-01
Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636
Pathway Distiller - multisource biological pathway consolidation.
Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong
2012-01-01
One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis
Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...
2014-01-01
Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
Vimaleswaran, Karani S; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N; Dudbridge, Frank; Loos, Ruth J F
2012-10-15
Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the Genetic Investigation of Anthropometric Traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10^-7. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.
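A hypergeometric enrichment test of the kind mentioned above asks how likely it is to see at least the observed number of candidate genes among the top-ranked genes if candidates were drawn at random. The sketch below is a generic upper-tail calculation using only the standard library; the function name is hypothetical and the counts in the example are made up for illustration, not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, set_size: int, n_hits: int, n_total: int) -> float:
    """P(X >= overlap) when set_size genes are drawn without replacement
    from n_total genes, of which n_hits fall in the top quantile."""
    max_overlap = min(set_size, n_hits)
    favourable = sum(
        comb(n_hits, i) * comb(n_total - n_hits, set_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / comb(n_total, set_size)


if __name__ == "__main__":
    # Made-up counts: 1,000 genes in total, 250 in the top 25% by P-value,
    # a candidate set of 40 genes of which 16 fall in that top quantile.
    print(hypergeom_upper_tail(16, 40, 250, 1000))
```

The exact integer arithmetic of `math.comb` avoids rounding problems in the tail, at the cost of speed for very large gene counts.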
NEAT: an efficient network enrichment analysis test.
Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C
2016-09-05
Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow, and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN (https://cran.r-project.org/package=neat).
Raychaudhuri, Soumya; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David; Sklar, Pamela; Purcell, Shaun; Daly, Mark J.
2010-01-01
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package. 
PMID:20838587
Ranking metrics in gene set enrichment analysis: do they matter?
Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna
2017-05-12
There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. 
When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. It also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
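One of the ranking metrics named above, the absolute value of the signal-to-noise ratio, can be sketched in a few lines. This is an illustrative sketch and not the MrGSEA code: it uses the common definition (difference of group means divided by the sum of group standard deviations), and the function names, the `min_sd` floor, and the toy expression values are assumptions.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b, min_sd=1e-8):
    """Signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b).

    min_sd is an assumed floor that avoids division by zero for constant genes.
    """
    sd_a = max(stdev(group_a), min_sd)
    sd_b = max(stdev(group_b), min_sd)
    return (mean(group_a) - mean(group_b)) / (sd_a + sd_b)


def rank_genes_by_abs_s2n(expression):
    """Rank genes by |signal-to-noise|, largest first.

    expression maps a gene name to a (group_a_values, group_b_values) pair.
    Returns a list of (gene, signed_score) tuples.
    """
    scores = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data: three genes measured in two groups of three samples each.
    toy = {
        "g1": ([5, 6, 7], [1, 2, 3]),
        "g2": ([1, 2, 3], [3, 4, 5]),
        "g3": ([1, 2, 3], [1, 2, 3]),
    }
    for gene, score in rank_genes_by_abs_s2n(toy):
        print(gene, score)
```

Ranking by the absolute value places strongly up- and down-regulated genes together at the top of the list, whereas the signed metric would send them to opposite ends.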
Determining Semantically Related Significant Genes.
Taha, Kamal
2014-01-01
GO relations embody some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses a microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool
2013-01-01
Background System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement. Results Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interleukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Conclusions Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr. PMID:23586463
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.
Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi
2013-04-15
System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement. Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interleukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2014-01-01
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
ZHANG, YAFANG; CROFTON, ELIZABETH J.; FAN, XIUZHEN; LI, DINGGE; KONG, FANPING; SINHA, MALA; LUXON, BRUCE A.; SPRATT, HEIDI M.; LICHTI, CHERYL F.; GREEN, THOMAS A.
2016-01-01
Transcriptomic and proteomic approaches have separately proven effective at identifying novel mechanisms affecting addiction-related behavior; however, it is difficult to prioritize the many promising leads from each approach. A convergent secondary analysis of proteomic and transcriptomic results can glean additional information to help prioritize promising leads. The current study is a secondary analysis of the convergence of recently published separate transcriptomic and proteomic analyses of nucleus accumbens (NAc) tissue from rats subjected to environmental enrichment vs. isolation and cocaine self-administration vs. saline. Multiple bioinformatics approaches (e.g. Gene Ontology (GO) analysis, Ingenuity Pathway Analysis (IPA), and Gene Set Enrichment Analysis (GSEA)) were used to interrogate these rich data sets. Although there was little correspondence between mRNA vs. protein at the individual target level, good correspondence was found at the level of gene/protein sets, particularly for the environmental enrichment manipulation. These data identify gene sets where there is a positive relationship between changes in mRNA and protein (e.g. glycolysis, ATP synthesis, translation elongation factor activity, etc.) and gene sets where there is an inverse relationship (e.g. ribosomes, Rho GTPase signaling, protein ubiquitination, etc.). Overall environmental enrichment produced better correspondence than cocaine self-administration. The individual targets contributing to mRNA and protein effects were largely not overlapping. As a whole, these results confirm that robust transcriptomic and proteomic data sets can provide similar results at the gene/protein set level even when there is little correspondence at the individual target level and little overlap in the targets contributing to the effects. PMID:27717806
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
Ficklin, Stephen P; Luo, Feng; Feltus, F Alex
2010-09-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi
2016-01-01
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees
2018-06-07
The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), alongside our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo
2015-12-23
The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
Welker, Noah C; Habig, Jeffrey W; Bass, Brenda L
2007-07-01
We describe the first microarray analysis of a whole animal containing a mutation in the Dicer gene. We used adult Caenorhabditis elegans and, to distinguish among different roles of Dicer, we also performed microarray analyses of animals with mutations in rde-4 and rde-1, which are involved in silencing by siRNA, but not miRNA. Surprisingly, we find that the X chromosome is greatly enriched for genes regulated by Dicer. Comparison of all three microarray data sets indicates the majority of Dicer-regulated genes are not dependent on RDE-4 or RDE-1, including the X-linked genes. However, all three data sets are enriched in genes important for innate immunity and, specifically, show increased expression of innate immunity genes.
Welker, Noah C.; Habig, Jeffrey W.; Bass, Brenda L.
2007-01-01
We describe the first microarray analysis of a whole animal containing a mutation in the Dicer gene. We used adult Caenorhabditis elegans and, to distinguish among different roles of Dicer, we also performed microarray analyses of animals with mutations in rde-4 and rde-1, which are involved in silencing by siRNA, but not miRNA. Surprisingly, we find that the X chromosome is greatly enriched for genes regulated by Dicer. Comparison of all three microarray data sets indicates the majority of Dicer-regulated genes are not dependent on RDE-4 or RDE-1, including the X-linked genes. However, all three data sets are enriched in genes important for innate immunity and, specifically, show increased expression of innate immunity genes. PMID:17526642
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.
Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi
2016-07-08
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2017-01-25
With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient with linearly increased parameter space when the number of data sets is increased. The model-based probability of discordance enrichment can be calculated for gene set detection. We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top list of detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (e.g. more expressed in non-tumor tissue) to a positive value (e.g. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.
Broad-Enrich: functional interpretation of large sets of broad genomic regions.
Cavalcante, Raymond G; Lee, Chee; Welch, Ryan P; Patil, Snehal; Weymouth, Terry; Scott, Laura J; Sartor, Maureen A
2014-09-01
Functional enrichment testing facilitates the interpretation of Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. A Web version and an R package are available at http://broad-enrich.med.umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
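The per-gene quantity that the Broad-Enrich abstract describes, the proportion of a gene's locus covered by a peak, can be illustrated with a minimal Python sketch. The function name `covered_fraction`, the half-open `(start, end)` interval convention, and the example coordinates are assumptions made for illustration; the abstract does not specify how Broad-Enrich defines loci or how it uses this proportion in its statistical test, so none of that is reproduced here.

```python
def covered_fraction(locus, peaks):
    """Fraction of a gene locus covered by at least one peak.

    locus: (start, end) half-open interval (assumed convention).
    peaks: iterable of (start, end) half-open intervals; may overlap.
    """
    start, end = locus
    if end <= start:
        raise ValueError("locus must have positive length")

    # Clip every peak to the locus and drop peaks that do not touch it.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in peaks
        if s < end and e > start
    )

    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A new disjoint block begins; bank the previous one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent peak extends the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / (end - start)


# Hypothetical locus and peaks, for illustration only.
fraction = covered_fraction((100, 200), [(50, 120), (110, 150), (190, 300)])
```

Merging overlapping peaks before summing avoids double-counting bases, which matters for broad histone-mark domains that are often called as several adjacent or overlapping peaks.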
Combining multiple tools outperforms individual methods in gene set enrichment analyses.
Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E
2017-02-01
Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
EviNet: a web platform for network enrichment analysis with flexible definition of gene sets.
Jeggari, Ashwini; Alekseenko, Zhanna; Petrov, Iurii; Dias, José M; Ericson, Johan; Alexeyenko, Andrey
2018-06-09
The new web resource EviNet provides an easily run interface to network enrichment analysis for exploration of novel, experimentally defined gene sets. The major advantages of this analysis are (i) applicability to any genes found in the global network rather than only to those with pathway/ontology term annotations, (ii) ability to connect genes via different molecular mechanisms rather than within one high-throughput platform, and (iii) statistical power sufficient to detect enrichment of very small sets, down to individual genes. The users' gene sets are either defined prior to upload or derived interactively from an uploaded file by differential expression criteria. The pathways and networks used in the analysis can be chosen from the collection menu. The calculation is typically done within seconds or minutes and the stable URL is provided immediately. The results are presented in both visual (network graphs) and tabular formats using jQuery libraries. Uploaded data and analysis results are kept in separated project directories not accessible by other users. EviNet is available at https://www.evinet.org/.
Identification of a set of genes showing regionally enriched expression in the mouse brain
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM
2008-01-01
Background: The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results: We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion: Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066
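GO term over-representation of the kind mentioned above is commonly assessed with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the following Python sketch is only an illustration of that common choice; the function name `overrep_pvalue` and the counts in the usage line are hypothetical, not taken from the study.

```python
from math import comb


def overrep_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe:  genes in the background set
    n_annotated: background genes carrying the term of interest
    n_selected:  genes in the selected list
    n_overlap:   selected genes carrying the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probability mass of every overlap at least as large
    # as the one observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated with a term,
# 3 genes selected, 2 of them annotated.
p = overrep_pvalue(10, 4, 3, 2)
```

Exact integer binomials keep the sketch dependency-free; in practice the resulting p-values would still need correction for the number of terms tested.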
Identification of a set of genes showing regionally enriched expression in the mouse brain.
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L C; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J M
2008-07-14
The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.
Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Gronwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.
2015-01-01
Background: Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods: We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results: Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion: We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact: Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
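The idea of a gene set being enriched toward one end of a ranked list can be sketched with an unweighted, Kolmogorov-Smirnov-style running sum. This Python sketch is a simplification made under stated assumptions: the published GSEA method weights hits by each gene's ranking statistic and estimates significance by permutation, neither of which is reproduced here. The function name `enrichment_score` and the toy gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum.

    ranked_genes: gene identifiers ordered from most to least
                  associated with the phenotype.
    gene_set:     identifiers belonging to the set being tested.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, down otherwise; both step sizes are
        # chosen so the walk returns to zero at the end of the list.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list: a set concentrated at the top scores near +1,
# a set concentrated at the bottom scores near -1.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

A score on its own is not a p-value; it would be compared against scores from permuted data before any enrichment claim is made.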
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-02-14
Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-01-01
Background: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. Methods: All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). Results: The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Conclusions: Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC. PMID:28103614
Radiation Quality Effects on Transcriptome Profiles in 3-D Cultures After Particle Irradiation
NASA Technical Reports Server (NTRS)
Patel, Z. S.; Kidane, Y. H.; Huff, J. L.
2014-01-01
In this work, we evaluate the differential effects of low- and high-LET radiation on 3-D organotypic cultures in order to investigate radiation quality impacts on gene expression and cellular responses. Reducing uncertainties in current risk models requires new knowledge on the fundamental differences in biological responses (the so-called radiation quality effects) triggered by heavy ion particle radiation versus low-LET radiation associated with Earth-based exposures. We are utilizing novel 3-D organotypic human tissue models that provide a format for study of human cells within a realistic tissue framework, thereby bridging the gap between 2-D monolayer culture and animal models for risk extrapolation to humans. To identify biological pathway signatures unique to heavy ion particle exposure, functional gene set enrichment analysis (GSEA) was used with whole transcriptome profiling. GSEA has been used extensively as a method to garner biological information in a variety of model systems but has not been commonly used to analyze radiation effects. It is a powerful approach for assessing the functional significance of radiation quality-dependent changes from datasets where the changes are subtle but broad, and where single gene based analysis using rankings of fold-change may not reveal important biological information. We identified 45 statistically significant gene sets at 0.05 q-value cutoff, including 14 gene sets common to gamma and titanium irradiation, 19 gene sets specific to gamma irradiation, and 12 titanium-specific gene sets. Common gene sets largely align with DNA damage, cell cycle, early immune response, and inflammatory cytokine pathway activation. The top gene set enriched for the gamma- and titanium-irradiated samples involved KRAS pathway activation and genes activated in TNF-treated cells, respectively. Another difference noted for the high-LET samples was an apparent enrichment in gene sets involved in cell cycle/mitotic control.
It is plausible that the enrichment in these particular pathways results from the complex DNA damage resulting from high-LET exposure where repair processes are not completed during the same time scale as the less complex damage resulting from low-LET radiation.
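To make concrete why GSEA can pick up changes that are "subtle but broad", the following is a minimal sketch of the unweighted, Kolmogorov-Smirnov-like running-sum statistic at the core of GSEA. It is an illustration only, not the analysis used in the work above: the function name `enrichment_score` and the toy gene names are hypothetical, and the published GSEA method additionally weights hits by their ranking metric and assesses significance by permutation.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch).

    ranked_genes: genes ordered from most up- to most down-regulated.
    gene_set: collection of genes whose coordinated shift is being tested.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    gene_set = set(gene_set)
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up at each set member, step down at every other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d", "e", "f"]   # hypothetical ranked list
    print(enrichment_score(ranking, {"a", "b"}))  # set concentrated at the top
    print(enrichment_score(ranking, {"e", "f"}))  # set concentrated at the bottom
```

A set whose members cluster at either end of the ranking produces a large positive or negative score even if no single member would pass a fold-change threshold on its own.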
Carbonetto, Peter; Stephens, Matthew
2013-01-01
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study. PMID:24098138
Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong
2015-10-19
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistical or bioinformatics expertise. This is all the more the case when attempting to define a gene set specific to one condition compared to many other ones. Thus, there is a crucial need for easy-to-use software for generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is open-source software that automatically generates molecular signatures out of complex expression datasets and directly assesses their enrichment by GSEA on independent datasets.
Enrichments are displayed in a graphical output that helps in interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarities between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html .
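The need for multiple testing correction when many gene sets are tested across many comparisons can be illustrated with a short sketch. The abstract does not state which correction BubbleGUM applies, so the Benjamini-Hochberg false discovery rate procedure below is an assumption chosen only because it is a common choice; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only; assumes the p-values come from the full
    collection of gene set x comparison tests that were performed.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene-set p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

Pooling all tests before adjusting matters here: correcting each pairwise comparison separately would understate the number of hypotheses actually examined.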
Functional and evolutionary insights from the Ciona notochord transcriptome.
Reeves, Wendy M; Wu, Yuye; Harder, Matthew J; Veeman, Michael T
2017-09-15
The notochord of the ascidian Ciona consists of only 40 cells, and is a longstanding model for studying organogenesis in a small, simple embryo. Here, we perform RNAseq on flow-sorted notochord cells from multiple stages to define a comprehensive Ciona notochord transcriptome. We identify 1364 genes with enriched expression and extensively validate the results by in situ hybridization. These genes are highly enriched for Gene Ontology terms related to the extracellular matrix, cell adhesion and cytoskeleton. Orthologs of 112 of the Ciona notochord genes have known notochord expression in vertebrates, more than twice as many as predicted by chance alone. This set of putative effector genes with notochord expression conserved from tunicates to vertebrates will be invaluable for testing hypotheses about notochord evolution. The full set of Ciona notochord genes provides a foundation for systems-level studies of notochord gene regulation and morphogenesis. We find only modest overlap between this set of notochord-enriched transcripts and the genes upregulated by ectopic expression of the key notochord transcription factor Brachyury, indicating that Brachyury is not a notochord master regulator gene as strictly defined. © 2017. Published by The Company of Biologists Ltd.
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/~sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.
Gold, David L; Coombes, Kevin R; Wang, Jing; Mallick, Bani
2007-03-01
Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that if understood could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer if an interesting collection of genes is 'enriched' for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine if conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to the conventional approach which assumes independence. We argue that the independence assumption is not detrimental.
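For reference, the conventional per-class test that the abstract contrasts with its exact null can be sketched as a one-sided hypergeometric tail probability, computed separately for each GO class as if the classes were independent. This is a generic illustration, not the derivation in the paper; the function name `hypergeom_upper_tail` and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of genes on the array (background)
    annotated:  number of background genes annotated to the GO class
    drawn:      size of the 'interesting' gene list
    k:          number of interesting genes annotated to the GO class
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Toy example: 10 genes in total, 4 in the class, 3 selected, 2 overlap.
    print(hypergeom_upper_tail(2, population=10, annotated=4, drawn=3))
```

Because each class is tested in isolation, two classes that share most of their genes will return nearly the same p-value, which is the dependency the abstract sets out to examine.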
Bao, Weier; Greenwold, Matthew J; Sawyer, Roger H
2017-11-01
Gene co-expression network analysis has been a research method widely used in systematically exploring gene function and interaction. Using the Weighted Gene Co-expression Network Analysis (WGCNA) approach to construct a gene co-expression network using data from a customized 44K microarray transcriptome of chicken epidermal embryogenesis, we have identified two distinct modules that are highly correlated with scale or feather development traits. Signaling pathways related to feather development were enriched in the traditional KEGG pathway analysis and functional terms relating specifically to embryonic epidermal development were also enriched in the Gene Ontology analysis. Significant enrichment annotations were discovered from customized enrichment tools such as Modular Single-Set Enrichment Test (MSET) and Medical Subject Headings (MeSH). Hub genes in both trait-correlated modules showed strong specific functional enrichment toward epidermal development. Also, regulatory elements, such as transcription factors and miRNAs, were targeted in the significant enrichment result. This work highlights the advantage of this methodology for functional prediction of genes not previously associated with scale- and feather trait-related modules.
Schizophrenia and vitamin D related genes could have been subject to latitude-driven adaptation.
Amato, Roberto; Pinelli, Michele; Monticelli, Antonella; Miele, Gennaro; Cocozza, Sergio
2010-11-11
Many natural phenomena are directly or indirectly related to latitude. Living at different latitudes means being exposed to different climates, diets, light/dark cycles, etc. In humans, one of the best known examples of genetic traits following a latitudinal gradient is skin pigmentation. Nevertheless, several diseases also show latitudinal clines, such as hypertension, cancer, dysmetabolic conditions, schizophrenia, Parkinson's disease and many more. We investigated, for the first time on a wide genomic scale, the latitude-driven adaptation phenomena. In particular, we selected a set of genes showing signs of latitude-dependent population differentiation. The biological characterization of these genes showed enrichment for neural-related processes. In light of this, we investigated whether genes associated with neuropsychiatric diseases were enriched for Latitude-Related Genes (LRGs). We found a strong enrichment of LRGs in the set of genes associated with schizophrenia. To explain this possible link between latitude and schizophrenia, we investigated their associations with vitamin D. We found in a set of vitamin D related genes a significant enrichment of both LRGs and of genes involved in schizophrenia. Our results suggest a latitude-driven adaptation for both schizophrenia and vitamin D related genes. In addition we confirm, at a molecular level, the link between schizophrenia and vitamin D. Finally, we discuss a model in which schizophrenia is, at least partly, a maladaptive by-product of latitude dependent adaptive changes in vitamin D metabolism.
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
Dean, Jeffry L; Zhao, Q Jay; Lambert, Jason C; Hawkins, Belinda S; Thomas, Russell S; Wesselkamper, Scott C
2017-05-01
The rate of new chemical development in commerce combined with a paucity of toxicity data for legacy chemicals presents a unique challenge for human health risk assessment. There is a clear need to develop new technologies and incorporate novel data streams to more efficiently inform derivation of toxicity values. One avenue of exploitation lies in the field of transcriptomics and the application of gene expression analysis to characterize biological responses to chemical exposures. In this context, gene set enrichment analysis (GSEA) was employed to evaluate tissue-specific, dose-response gene expression data generated following exposure to multiple chemicals for various durations. Patterns of transcriptional enrichment were evident across time and with increasing dose, and coordinated enrichment plausibly linked to the etiology of the biological responses was observed. GSEA was able to capture both transient and sustained transcriptional enrichment events facilitating differentiation between adaptive versus longer term molecular responses. When combined with benchmark dose (BMD) modeling of gene expression data from key drivers of biological enrichment, GSEA facilitated characterization of dose ranges required for enrichment of biologically relevant molecular signaling pathways, and promoted comparison of the activation dose ranges required for individual pathways. Median transcriptional BMD values were calculated for the most sensitive enriched pathway as well as the overall median BMD value for key gene members of significantly enriched pathways, and both were observed to be good estimates of the most sensitive apical endpoint BMD value. Together, these efforts support the application of GSEA to qualitative and quantitative human health risk assessment. Published by Oxford University Press on behalf of the Society of Toxicology 2017. This work is written by US Government employees and is in the public domain in the US.
Fang, Lingzhao; Sørensen, Peter; Sahana, Goutam; Panitz, Frank; Su, Guosheng; Zhang, Shengli; Yu, Ying; Li, Bingjie; Ma, Li; Liu, George; Lund, Mogens Sandø; Thomsen, Bo
2018-06-19
MicroRNAs (miRNA) are key modulators of gene expression and so act as putative fine-tuners of complex phenotypes. Here, we hypothesized that causal variants of complex traits are enriched in miRNAs and miRNA-target networks. First, we conducted a genome-wide association study (GWAS) for seven functional and milk production traits using imputed sequence variants (13~15 million) and >10,000 animals from three dairy cattle breeds, i.e., Holstein (HOL), Nordic red cattle (RDC) and Jersey (JER). Second, we analyzed for enrichments of association signals in miRNAs and their miRNA-target networks. Our results demonstrated that genomic regions harboring miRNA genes were significantly (P < 0.05) enriched with GWAS signals for milk production traits and mastitis, and that enrichments within miRNA-target gene networks were significantly higher than in random gene-sets for the majority of traits. Furthermore, most between-trait and across-breed correlations of enrichments with miRNA-target networks were significantly greater than with random gene-sets, suggesting pleiotropic effects of miRNAs. Intriguingly, genes that were differentially expressed in response to mammary gland infections were significantly enriched in the miRNA-target networks associated with mastitis. All these findings were consistent across three breeds. Collectively, our observations demonstrate the importance of miRNAs and their targets for the expression of complex traits.
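The comparison of a target network against random gene sets can be sketched as an empirical p-value. The abstract does not give the authors' procedure, so everything below is an assumption made for illustration: the summary statistic (sum of per-gene association scores), the uniform sampling of equally sized random sets, and the names `empirical_enrichment_p`, `scores`, and `n_random` are all hypothetical. A real analysis would typically also match random sets on properties such as gene length or local linkage disequilibrium.

```python
import random


def empirical_enrichment_p(scores, gene_set, n_random=999, seed=0):
    """Empirical p-value for a gene set against random sets of equal size.

    scores:   dict mapping gene -> association score (higher = stronger signal)
    gene_set: genes in the network of interest
    Returns (number of random sets scoring >= observed, plus 1) / (n_random + 1).
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    genes = sorted(scores)             # stable background ordering
    members = [g for g in genes if g in gene_set]
    if not members:
        raise ValueError("gene set has no genes with scores")

    observed = sum(scores[g] for g in members)
    at_least = 0
    for _ in range(n_random):
        sample = rng.sample(genes, len(members))
        if sum(scores[g] for g in sample) >= observed:
            at_least += 1
    # The +1 terms count the observed set itself, so p is never exactly 0.
    return (at_least + 1) / (n_random + 1)


if __name__ == "__main__":
    toy_scores = {f"g{i}": i for i in range(100)}   # hypothetical scores
    network = {"g95", "g96", "g97", "g98", "g99"}    # hypothetical network
    print(empirical_enrichment_p(toy_scores, network))
```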
2013-01-01
Background: A recent study of lateral septum (LS) suggested a large number of autism-related genes with altered expression in the postpartum state. However, formally testing the findings for enrichment of autism-associated genes proved to be problematic with existing software. Many gene-disease association databases have been curated which are not currently incorporated in popular, full-featured enrichment tools, and the use of custom gene lists in these programs can be difficult to perform and interpret. As a simple alternative, we have developed the Modular Single-set Enrichment Test (MSET), a minimal tool that enables one to easily evaluate expression data for enrichment of any conceivable gene list of interest. Results: The MSET approach was validated by testing several publicly available expression data sets for expected enrichment in areas of autism, attention deficit hyperactivity disorder (ADHD), and arthritis. Using nine independent, unique autism gene lists extracted from association databases and two recent publications, a striking consensus of enrichment was detected within gene expression changes in LS of postpartum mice. A network of 160 autism-related genes was identified, representing developmental processes such as synaptic plasticity, neuronal morphogenesis, and differentiation. Additionally, maternal LS displayed enrichment for genes associated with bipolar disorder, schizophrenia, ADHD, and depression. Conclusions: The transition to motherhood includes the most fundamental social bonding event in mammals and features naturally occurring changes in sociability. Some individuals with autism, schizophrenia, or other mental health disorders exhibit impaired social traits. Genes involved in these deficits may also contribute to elevated sociability in the maternal brain. To date, this is the first study to show a significant, quantitative link between the maternal brain and mental health disorders using large scale gene expression data.
Thus, the postpartum brain may provide a novel and promising platform for understanding the complex genetics of improved sociability that may have direct relevance for multiple psychiatric illnesses. This study also provides an important new tool that fills a critical analysis gap and makes evaluation of enrichment using any database of interest possible with an emphasis on ease of use and methodological transparency. PMID:24245670
Eisinger, Brian E; Saul, Michael C; Driessen, Terri M; Gammie, Stephen C
2013-11-19
A recent study of lateral septum (LS) suggested a large number of autism-related genes with altered expression in the postpartum state. However, formally testing the findings for enrichment of autism-associated genes proved to be problematic with existing software. Many gene-disease association databases have been curated which are not currently incorporated in popular, full-featured enrichment tools, and the use of custom gene lists in these programs can be difficult to perform and interpret. As a simple alternative, we have developed the Modular Single-set Enrichment Test (MSET), a minimal tool that enables one to easily evaluate expression data for enrichment of any conceivable gene list of interest. The MSET approach was validated by testing several publicly available expression data sets for expected enrichment in areas of autism, attention deficit hyperactivity disorder (ADHD), and arthritis. Using nine independent, unique autism gene lists extracted from association databases and two recent publications, a striking consensus of enrichment was detected within gene expression changes in LS of postpartum mice. A network of 160 autism-related genes was identified, representing developmental processes such as synaptic plasticity, neuronal morphogenesis, and differentiation. Additionally, maternal LS displayed enrichment for genes associated with bipolar disorder, schizophrenia, ADHD, and depression. The transition to motherhood includes the most fundamental social bonding event in mammals and features naturally occurring changes in sociability. Some individuals with autism, schizophrenia, or other mental health disorders exhibit impaired social traits. Genes involved in these deficits may also contribute to elevated sociability in the maternal brain. To date, this is the first study to show a significant, quantitative link between the maternal brain and mental health disorders using large scale gene expression data. 
Thus, the postpartum brain may provide a novel and promising platform for understanding the complex genetics of improved sociability that may have direct relevance for multiple psychiatric illnesses. This study also provides an important new tool that fills a critical analysis gap and makes evaluation of enrichment using any database of interest possible with an emphasis on ease of use and methodological transparency.
GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.
Subhash, Santhilal; Kanduri, Chandrasekhar
2016-09-13
High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from source databases such as KEGG, Reactome or Gene Ontology. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations) that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-01
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-11
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.
2016-01-01
The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559
Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C
2016-01-26
Yang, Qian; Wang, Shuyuan; Dai, Enyu; Zhou, Shunheng; Liu, Dianming; Liu, Haizhou; Meng, Qianqian; Jiang, Bin; Jiang, Wei
2017-08-16
Pathway enrichment analysis has been widely used to identify cancer risk pathways, and contributes to elucidating the mechanism of tumorigenesis. However, most of the existing approaches use outdated pathway information and neglect the complex gene interactions within pathways. Here, we first briefly reviewed the existing widely used pathway enrichment analysis approaches, and then proposed a novel topology-based pathway enrichment analysis (TPEA) method, which integrated topological properties and global upstream/downstream positions of genes in pathways. We compared TPEA with four widely used pathway enrichment analysis tools, including database for annotation, visualization and integrated discovery (DAVID), gene set enrichment analysis (GSEA), centrality-based pathway enrichment (CePa) and signaling pathway impact analysis (SPIA), by analyzing six gene expression profiles of three tumor types (colorectal cancer, thyroid cancer and endometrial cancer). As a result, we identified several well-known cancer risk pathways that could not be obtained by the existing tools, and the results of TPEA were more stable than those of the other tools in analyzing different data sets of the same cancer. Ultimately, we developed an R package to implement TPEA, which can update KEGG pathway information online and is available at the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/TPEA/. © The Author 2017. Published by Oxford University Press.
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.
Zheng, Qi; Wang, Xiu-Jie
2008-07-01
Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software packages offer GO-related analysis functions, new tools are still needed to meet the requirements of data generated by newly developed technologies and of advanced analyses. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), and therefore provides a better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis of data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) GOEAST allows cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R
2005-09-01
We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
Statistical assessment of crosstalk enrichment between gene groups in biological networks.
McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L
2013-01-01
Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
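The standard z-score mentioned in the abstract can be sketched as follows: the observed number of links between two gene groups is compared with the mean and standard deviation of the same count across randomized networks. This is only a minimal illustration; how CrossTalkZ actually randomizes the network is not described here, so the null counts below are hypothetical placeholders and `crosstalk_z_score` is an invented name.

```python
from statistics import mean, stdev


def crosstalk_z_score(observed_links, null_links):
    """Z-score of an observed link count against counts from randomized networks."""
    mu = mean(null_links)
    sigma = stdev(null_links)  # sample standard deviation of the null counts
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed_links - mu) / sigma


# Hypothetical example: 12 links observed between two gene groups, and link
# counts for the same two groups in five randomized networks.
null_counts = [8, 10, 12, 10, 10]
z = crosstalk_z_score(12, null_counts)
print(f"z = {z:.3f}")
```

A z-score is only meaningful if the null counts are roughly normal, which is presumably why the abstract lists a test of normality for the null distributions among the reported statistics.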
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients.
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans
Moretti, S.; Davydov, I.I.; Excoffier, L.
2017-01-01
Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
Tissue enrichment analysis for C. elegans genomics.
Angeles-Albores, David; N Lee, Raymond Y; Chan, Juancarlos; Sternberg, Paul W
2016-09-13
Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback of ontology enrichment analyses is their verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
Fayolle-Guichard, Françoise; Lombard, Vincent; Hébert, Agnès; Coutinho, Pedro M.; Groppi, Alexis; Barre, Aurélien; Henrissat, Bernard
2016-01-01
Cost-effective biofuel production from lignocellulosic biomass depends on efficient degradation of the plant cell wall. One of the major obstacles for the development of a cost-efficient process is the lack of resistance of currently used fungal enzymes to harsh conditions such as high temperature. Adapted, thermophilic microbial communities provide a huge reservoir of potentially interesting lignocellulose-degrading enzymes for improvement of the cellulose hydrolysis step. In order to identify such enzymes, a leaf and wood chip compost was enriched on a mixture of thermo-chemically pretreated wheat straw, poplar and Miscanthus under thermophile conditions, but in two different set-ups. Unexpectedly, metagenome sequencing revealed that incubation of the lignocellulosic substrate with compost as inoculum in a suspension culture resulted in an impoverishment of putative cellulase- and hemicellulase-encoding genes. However, mimicking composting conditions without liquid phase yielded a high number and diversity of glycoside hydrolase genes and an enrichment of genes encoding cellulose binding domains. These identified genes were most closely related to species from Actinobacteria, which seem to constitute important players of lignocellulose degradation under the applied conditions. The study highlights that subtle changes in an enrichment set-up can have an important impact on composition and functions of the microcosm. Composting-like conditions were found to be the most successful method for enrichment in species with high biomass degrading capacity. PMID:27936240
GOMA: functional enrichment analysis tool based on GO modules
Huang, Qiang; Wu, Ling-Yun; Wang, Yong; Zhang, Xiang-Sun
2013-01-01
Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results. PMID:23237213
Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng
2017-08-01
Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with the summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value = 2.27 × 10^-10), MGC57346 (p value = 6.92 × 10^-7), BLK (p value = 1.01 × 10^-6), XKR6 (p value = 1.11 × 10^-6), C17ORF69 (p value = 1.12 × 10^-6) and KIAA1267 (p value = 4.00 × 10^-6). Gene set enrichment analysis observed a significant association for the Chr8p23 gene set (false discovery rate = 0.033). Our results provide novel clues for studies of the genetic mechanism of neuroticism. Copyright © 2017. Published by Elsevier Inc.
An integrated analysis of genes and functional pathways for aggression in human and rodent models.
Zhang-James, Yanli; Fernàndez-Castillo, Noèlia; Hess, Jonathan L; Malki, Karim; Glatt, Stephen J; Cormand, Bru; Faraone, Stephen V
2018-06-01
Human genome-wide association studies (GWAS), transcriptome analyses of animal models, and candidate gene studies have advanced our understanding of the genetic architecture of aggressive behaviors. However, each of these methods presents unique limitations. To generate a more confident and comprehensive view of the complex genetics underlying aggression, we undertook an integrated, cross-species approach. We focused on human and rodent models to derive eight gene lists from three main categories of genetic evidence: two sets of genes identified in GWAS studies, four sets implicated by transcriptome-wide studies of rodent models, and two sets of genes with causal evidence from online Mendelian inheritance in man (OMIM) and knockout (KO) mice reports. These gene sets were evaluated for overlap and pathway enrichment to extract their similarities and differences. We identified enriched common pathways such as the G-protein coupled receptor (GPCR) signaling pathway, axon guidance, reelin signaling in neurons, and ERK/MAPK signaling. Also, individual genes were ranked based on their cumulative weights to quantify their importance as risk factors for aggressive behavior, which resulted in 40 top-ranked and highly interconnected genes. The results of our cross-species and integrated approach provide insights into the genetic etiology of aggression.
Joshi, Anagha
2014-12-30
Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in the same species still remains to be explored and can bring new biological insights. We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type in three groups according to binding occupancy as singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid occupancy irrespective of cell type. We then characterized hotspots for diverse structural and functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.
Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun
2017-09-21
High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make it difficult for users to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform the functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen outperformed the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
Integrative set enrichment testing for multiple omics platforms
2011-01-01
Background: Enrichment testing assesses the overall evidence of differential expression behavior of the elements within a defined set. When we have measured many molecular aspects, e.g. gene expression, metabolites, proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test using gene expression and metabolomics as the motivating platforms. Results: Using two simulation models we explored the properties of several enrichment methods including two novel methods: the logistic regression 2-degree of freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. In relation to their univariate counterparts we find that the joint tests can improve our ability to detect results that are marginal univariately. We also find that joint tests improve the ranking of associated pathways compared to their univariate counterparts. However, there is a risk of Type I error inflation with some methods and self-contained methods lose specificity when the sets are not representative of underlying association. Conclusions: In this work we show that consideration of data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power in detection of genomic associations with phenotypes. PMID:22118224
LENS: web-based lens for enrichment and network studies of human proteins
2015-01-01
Background: Network analysis is a common approach for studying the genetic view of diseases and biological pathways. When a set of genes is identified to be of interest in relation to a disease, say through a genome wide association study (GWAS) or a different gene expression study, these genes are typically analyzed in the context of their protein-protein interaction (PPI) networks. Further analysis is carried out to compute the enrichment of known pathways and disease-associations in the network. Having tools for such analysis at the fingertips of biologists without the requirement for computer programming or curation of data would accelerate the characterization of genes of interest. Currently available tools do not integrate network and enrichment analysis and their visualizations, and most of them present results in formats that are not conducive to human cognition. Results: We developed the tool Lens for Enrichment and Network Studies of human proteins (LENS) that performs network, pathway and disease enrichment analyses on genes of interest to users. The tool creates a visualization of the network, provides easy to read statistics on network connectivity, and displays Venn diagrams with statistical significance values of the network's association with drugs, diseases, pathways, and GWASs. We used the tool to analyze gene sets related to craniofacial development, autism, and schizophrenia. Conclusion: LENS is a web-based tool that does not require any download or plugins to use. The tool is free and does not require login for use, and is available at http://severus.dbmi.pitt.edu/LENS. PMID:26680011
Acevedo-Luna, Natalia; Mariño-Ramírez, Leonardo; Halbert, Armand; Hansen, Ulla; Landsman, David; Spouge, John L
2016-11-21
Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. 
Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
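The counts and percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the figures stated in the text (43 gene groups, 903 pairwise intersections, 768 enriched, 564 validated); the variable names are illustrative and nothing here reproduces the authors' statistics.

```python
from math import comb

n_groups = 43              # significant gene groups reported in the abstract
n_pairs = comb(n_groups, 2)  # unordered pairs of groups: 43 * 42 / 2

enriched_pairs = 768       # pairs with significantly enriched intersections
validated_pairs = 564      # enriched pairs independently validated by DAVID
validated_groups = 42      # gene groups validated by DAVID

pct_enriched = round(100 * enriched_pairs / n_pairs)
pct_validated = round(100 * validated_pairs / enriched_pairs)
pct_groups = round(100 * validated_groups / n_groups)

print(n_pairs)        # 903
print(pct_enriched)   # 85
print(pct_validated)  # 73
print(pct_groups)     # 98
```

Note that the 73% figure is relative to the 768 enriched intersections, not to all 903 pairs.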
Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su
2015-01-01
It has been reported that several brain diseases can be treated as transnosological manner implicating possible common molecular basis under those diseases. However, molecular level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when it used typical methods depending on differentially expressed genes. In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identified various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. This study introduces possible common molecular bases for several brain diseases, which open the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.
2015-01-01
Background It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases has mostly failed with typical methods that depend on differentially expressed genes. Results In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779
Gardiner, Laura-Jayne; Gawroński, Piotr; Olohan, Lisa; Schnurbusch, Thorsten; Hall, Neil; Hall, Anthony
2014-12-01
Mapping-by-sequencing analyses have largely required a complete reference sequence and employed whole genome re-sequencing. In species such as wheat, no finished genome reference sequence is available. Additionally, because of its large genome size (17 Gb), re-sequencing at sufficient depth of coverage is not practical. Here, we extend the utility of mapping by sequencing, developing a bespoke pipeline and algorithm to map an early-flowering locus in einkorn wheat (Triticum monococcum L.) that is closely related to the bread wheat genome A progenitor. We have developed a genomic enrichment approach using the gene-rich regions of hexaploid bread wheat to design a 110-Mbp NimbleGen SeqCap EZ in solution capture probe set, representing the majority of genes in wheat. Here, we use the capture probe set to enrich and sequence an F2 mapping population of the mutant. The mutant locus was identified in T. monococcum, which lacks a complete genome reference sequence, by mapping the enriched data set onto pseudo-chromosomes derived from the capture probe target sequence, with a long-range order of genes based on synteny of wheat with Brachypodium distachyon. Using this approach we are able to map the region and identify a set of deleted genes within the interval. © 2014 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
D'Addabbo, Annarita; Palmieri, Orazio; Maglietta, Rosalia; Latiano, Anna; Mukherjee, Sayan; Annese, Vito; Ancona, Nicola
2011-08-01
A meta-analysis re-analysed previous genome-wide association scans, definitively confirming eleven genes and further identifying 21 new loci. However, the identified genes/loci still explain only a minority of the genetic predisposition to Crohn's disease. We aimed to identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched for single nucleotide polymorphisms with modest statistical association. We utilized the WTCCC data set, evaluating 1748 CD and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first, chromosomal regions enriched for weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. The statistical significance was assessed by nonparametric permutation tests. The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched with single nucleotide polymorphisms significantly associated with the trait including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight a further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2 and CREM, LCT, and IL12RB2. Our study suggests a different statistical perspective to discover genes weakly associated with a given trait, although further confirmatory functional studies are needed. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. All rights reserved.
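The abstract above assesses regional enrichment of weak association signals with permutation tests but does not give the procedure. The following is only a minimal sketch of one such test, assuming a list of per-SNP p-values and a boolean flag marking SNPs inside a candidate region; the function name, the 0.05 threshold, and the simple label shuffle are illustrative assumptions, not the authors' method. A real analysis would also need to respect linkage disequilibrium, which this shuffle ignores.

```python
import random


def region_enrichment_pvalue(snp_pvalues, in_region, threshold=0.05,
                             n_perm=2000, seed=0):
    """Permutation p-value for an excess of weakly associated SNPs in a region.

    snp_pvalues: per-SNP association p-values (hypothetical input).
    in_region:   booleans, True when the SNP lies in the candidate region.
    Returns (observed count of weak SNPs in the region, permutation p-value).
    """
    rng = random.Random(seed)
    weak = [p <= threshold for p in snp_pvalues]
    observed = sum(1 for w, r in zip(weak, in_region) if w and r)

    labels = list(in_region)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffle region membership: same region size, random SNP assignment.
        rng.shuffle(labels)
        count = sum(1 for w, r in zip(weak, labels) if w and r)
        if count >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

With this construction the smallest reportable p-value is 1 / (n_perm + 1), so the number of permutations bounds the attainable significance.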
Gharib, Sina A; Seiger, Ashley N; Hayes, Amanda L; Mehra, Reena; Patel, Sanjay R
2014-04-01
Obstructive sleep apnea (OSA) has been associated with a number of chronic disorders that may improve with effective therapy. However, the molecular pathways affected by continuous positive airway pressure (CPAP) treatment are largely unknown. We sought to assess the system-wide consequences of CPAP therapy by transcriptionally profiling peripheral blood leukocytes (PBLs). Subjects in whom severe OSA was diagnosed were treated with CPAP, and whole-genome expression measurement of PBLs was performed at baseline and following therapy. We used gene set enrichment analysis (GSEA) to identify pathways that were differentially enriched. Network analysis was then applied to highlight key drivers of processes influenced by CPAP. Eighteen subjects with significant OSA underwent CPAP therapy and microarray analysis of their PBLs. Treatment with CPAP improved apnea-hypopnea index (AHI), daytime sleepiness, and blood pressure, but did not affect anthropometric measures. GSEA revealed a number of enriched gene sets, many of which were involved in neoplastic processes and displayed downregulated expression patterns in response to CPAP. Network analysis identified several densely connected genes that are important modulators of cancer and tumor growth. Effective therapy of OSA with CPAP is associated with alterations in circulating leukocyte gene expression. Functional enrichment and network analyses highlighted transcriptional suppression in cancer-related pathways, suggesting potentially novel mechanisms linking OSA with neoplastic signatures.
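For readers unfamiliar with the statistic behind GSEA, the sketch below computes an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score over a ranked gene list. It is a simplified illustration, not the weighted score or the permutation-based significance estimate used by the GSEA software, and the function name and inputs are assumptions made for this example.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: unique gene identifiers, ordered from most up-regulated
                  to most down-regulated.
    gene_set:     set of gene identifiers to test.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the walk ends at 0.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that set members concentrate near the top of the ranking and a negative score that they concentrate near the bottom, which corresponds to the down-regulated pattern reported for the cancer-related sets above.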
Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka)
2011-01-01
Background There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. Results We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16K microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. Conclusions This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes. PMID:22136247
Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka).
Pavey, Scott A; Sutherland, Ben J G; Leong, Jong; Robb, Adrienne; von Schalburg, Kris; Hamon, Troy R; Koop, Ben F; Nielsen, Jennifer L
2011-12-02
There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16 k microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes.
STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation
2013-01-01
Background Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. Results As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Conclusion Multiple ontologies have been developed for gene and protein annotation. By using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/. PMID:23409969
Morine, Melissa J; McMonagle, Jolene; Toomey, Sinead; Reynolds, Clare M; Moloney, Aidan P; Gormley, Isobel C; Gaora, Peadar O; Roche, Helen M
2010-10-07
Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p < 0.05), followed by muscle (601 genes) and adipose (16 genes). Results from modified GSEA showed that the high-CLA beef diet affected diverse biological processes across the three tissues, and that the majority of pathway changes reached significance only with the bi-directional test. Combining the liver tissue microarray results with plasma marker data revealed 110 CLA-sensitive genes showing strong canonical correlation with one or more plasma markers of metabolic health, and 9 significantly overrepresented pathways among this set; each of these pathways was also significantly changed by the high-CLA diet. 
Closer inspection of two of these pathways--selenoamino acid metabolism and steroid biosynthesis--illustrated clear diet-sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease.
2010-01-01
Background Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Results Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p < 0.05), followed by muscle (601 genes) and adipose (16 genes). Results from modified GSEA showed that the high-CLA beef diet affected diverse biological processes across the three tissues, and that the majority of pathway changes reached significance only with the bi-directional test. Combining the liver tissue microarray results with plasma marker data revealed 110 CLA-sensitive genes showing strong canonical correlation with one or more plasma markers of metabolic health, and 9 significantly overrepresented pathways among this set; each of these pathways was also significantly changed by the high-CLA diet. 
Closer inspection of two of these pathways - selenoamino acid metabolism and steroid biosynthesis - illustrated clear diet-sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Conclusion Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease. PMID:20929581
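The overrepresentation step mentioned above relies on Fisher's exact test. As a rough illustration, the one-sided version can be written as a hypergeometric upper tail; the function name and argument names below are hypothetical, and the counts in the usage note are toy values, not figures from the study.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric) p-value.

    n_universe: number of genes measured in total.
    n_pathway:  number of those genes annotated to the pathway.
    n_selected: number of genes in the list of interest.
    n_overlap:  number of selected genes that are in the pathway.
    Returns P(overlap >= n_overlap) under random selection.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, with a toy universe of 10 genes, a 5-gene pathway, and 5 selected genes that all fall in the pathway, `overrepresentation_pvalue(10, 5, 5, 5)` evaluates to 1/252. In practice the p-values for many pathways would then need a multiple-testing correction.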
Tcof1-Related Molecular Networks in Treacher Collins Syndrome.
Dai, Jiewen; Si, Jiawen; Wang, Minjiao; Huang, Li; Fang, Bing; Shi, Jun; Wang, Xudong; Shen, Guofang
2016-09-01
Treacher Collins syndrome (TCS) is a rare, autosomal-dominant disorder characterized by craniofacial deformities, and is primarily caused by mutations in the Tcof1 gene. This article aimed to perform a comprehensive literature review and systematic bioinformatic analysis of Tcof1-related molecular networks in TCS. First, the up- and down-regulated genes in Tcof1 heterozygous haploinsufficient mutant mice embryos and Tcof1 knockdown and Tcof1 over-expressed neuroblastoma N1E-115 cells were obtained from the Gene Expression Omnibus database. The GeneDecks database was used to calculate the 500 genes most closely related to Tcof1. Then, the relationships between 4 gene sets (a predicted set and sets comparing the wildtype with the 3 Gene Expression Omnibus datasets) were analyzed using the DAVID, GeneMANIA and STRING databases. The analysis results showed that the Tcof1-related genes were enriched in various biological processes, including cell proliferation, apoptosis, cell cycle, differentiation, and migration. They were also enriched in several signaling pathways, such as the ribosome, p53, cell cycle, and WNT signaling pathways. Additionally, these genes clearly had direct or indirect interactions with Tcof1 and between each other. The literature review and bioinformatic analysis findings imply that special attention should be given to these pathways, as they may offer target points for TCS therapies.
Van Loo, Peter; Aerts, Stein; Thienpont, Bernard; De Moor, Bart; Moreau, Yves; Marynen, Peter
2008-01-01
We present ModuleMiner, a novel algorithm for computationally detecting cis-regulatory modules (CRMs) in a set of co-expressed genes. ModuleMiner outperforms other methods for CRM detection on benchmark data, and successfully detects CRMs in tissue-specific microarray clusters and in embryonic development gene sets. Interestingly, CRM predictions for differentiated tissues exhibit strong enrichment close to the transcription start site, whereas CRM predictions for embryonic development gene sets are depleted in this region. PMID:18394174
Gene expression profiles in whole blood and associations with metabolic dysregulation in obesity.
Cox, Amanda J; Zhang, Ping; Evans, Tiffany J; Scott, Rodney J; Cripps, Allan W; West, Nicholas P
Gene expression data provides one tool to gain further insight into the complex biological interactions linking obesity and metabolic disease. This study examined associations between blood gene expression profiles and metabolic disease in obesity. Whole blood gene expression profiles, performed using the Illumina HT-12v4 Human Expression Beadchip, were compared between (i) individuals with obesity (O) or lean (L) individuals (n=21 each), (ii) individuals with (M) or without (H) Metabolic Syndrome (n=11 each) matched on age and gender. Enrichment of differentially expressed genes (DEG) into biological pathways was assessed using Ingenuity Pathway Analysis. Association between sets of genes from biological pathways considered functionally relevant and Metabolic Syndrome were further assessed using an area under the curve (AUC) and cross-validated classification rate (CR). For OvL, only 50 genes were significantly differentially expressed based on the selected differential expression threshold (1.2-fold, p<0.05). For MvH, 582 genes were significantly differentially expressed (1.2-fold, p<0.05) and pathway analysis revealed enrichment of DEG into a diverse set of pathways including immune/inflammatory control, insulin signalling and mitochondrial function pathways. Gene sets from the mTOR signalling pathways demonstrated the strongest association with Metabolic Syndrome (p=8.1×10(-8); AUC: 0.909, CR: 72.7%). These results support the use of expression profiling in whole blood in the absence of more specific tissue types for investigations of metabolic disease. Using a pathway analysis approach it was possible to identify an enrichment of DEG into biological pathways that could be targeted for in vitro follow-up. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
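The AUC reported above summarizes how well a gene-set-derived score separates the two groups. The abstract does not say how the per-subject score was built, so the sketch below simply assumes one numeric score per subject and computes the AUC as the fraction of case-control pairs ranked correctly (ties count as half); the function name is hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from pairwise comparisons.

    scores_pos: scores for subjects with the condition (assumed input).
    scores_neg: scores for subjects without the condition.
    Equals the probability that a random positive outscores a random
    negative, with ties contributing one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A value of 0.5 corresponds to no separation and 1.0 to perfect separation. With groups as small as n=11 each, an estimate like this carries wide uncertainty, which is presumably why a cross-validated classification rate is reported alongside it.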
Kristiansen, Wenche; Karlsson, Robert; Rounge, Trine B; Whitington, Thomas; Andreassen, Bettina K; Magnusson, Patrik K; Fosså, Sophie D; Adami, Hans-Olov; Turnbull, Clare; Haugen, Trine B; Grotmol, Tom; Wiklund, Fredrik
2015-07-15
Genome-wide association (GWA) studies have reported 19 distinct susceptibility loci for testicular germ cell tumor (TGCT). A GWA study for TGCT was performed by genotyping 610 240 single-nucleotide polymorphisms (SNPs) in 1326 cases and 6687 controls from Sweden and Norway. No novel genome-wide significant associations were observed in this discovery stage. We put forward 27 SNPs from 15 novel regions and 12 SNPs previously reported, for replication in 710 case-parent triads and 289 cases and 290 controls. Predefined biological pathways and processes, in addition to a custom-built sex-determination gene set, were subject to enrichment analyses using Meta-Analysis Gene Set Enrichment of Variant Associations (M) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (I). In the combined meta-analysis, we observed genome-wide significant association for rs7501939 on chromosome 17q12 (OR = 0.78, 95% CI = 0.72-0.84, P = 1.1 × 10(-9)) and rs2195987 on chromosome 19p12 (OR = 0.76, 95% CI: 0.69-0.84, P = 3.2 × 10(-8)). The marker rs7501939 on chromosome 17q12 is located in an intron of the HNF1B gene, encoding a member of the homeodomain-containing superfamily of transcription factors. The sex-determination gene set (false discovery rate, FDRM < 0.001, FDRI < 0.001) and pathways related to NF-κB, glycerophospholipid and ether lipid metabolism, as well as cancer and apoptosis, were associated with TGCT (FDR < 0.1). In addition to revealing two new TGCT susceptibility loci, our results continue to support the notion that genes governing normal germ cell development in utero are implicated in the development of TGCT. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A
2014-11-15
Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10(-5), FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10(-6), FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10(-5), FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model
Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.
2014-01-01
Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298
2012-01-01
Background Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Results Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. 
Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since multiple TLRs were found in the generic fever network, it is reasonable to hypothesize that vaccine-TLR interactions may play an important role in inducing fever response, which deserves a further investigation. Conclusions This study demonstrated that ontology-based literature mining is a powerful method for analyzing gene interaction networks and generating new scientific hypotheses. PMID:23256563
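The abstract above prioritizes genes by network centrality without naming the specific metrics. As one simple possibility, the sketch below computes normalized degree centrality from an undirected edge list and ranks genes by it; the function names are hypothetical and degree is an assumed stand-in for whichever metrics the study actually used.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected gene interaction network.

    edges: iterable of (gene_a, gene_b) pairs (hypothetical input).
    Returns {gene: degree / (n_genes - 1)}.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {gene: 0.0 for gene in neighbours}
    return {gene: len(nb) / (n - 1) for gene, nb in neighbours.items()}


def rank_by_centrality(edges):
    """Genes ordered from most to least central, ties broken by name."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda gene: (-scores[gene], gene))
```

Applied separately to a generic network and to a sub-network, a ranking like this lets one compare which genes are central in both and which are central in only one, in the spirit of the comparison described above.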
Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu
2017-01-01
Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663
Hill, W.D.; Davies, G.; Liewald, D.C.; Payton, A.; McNeil, C.J.; Whalley, L.J.; Horan, M.; Ollier, W.; Starr, J.M.; Pendleton, N.; Hansel, N.K.; Montgomery, G.W.; Medland, S.E.; Martin, N.G.; Wright, M.J.; Bates, T.C.; Deary, I.J.
2016-01-01
Two themes are emerging regarding the molecular genetic aetiology of intelligence. The first is that intelligence is influenced by many variants and those that are tagged by common single nucleotide polymorphisms account for around 30% of the phenotypic variation. The second, in line with other polygenic traits such as height and schizophrenia, is that these variants are not randomly distributed across the genome but cluster in genes that work together. Less clear is whether the very low range of cognitive ability (intellectual disability) is simply one end of the normal distribution describing individual differences in cognitive ability across a population. Here, we examined 40 genes with a known association with non-syndromic autosomal recessive intellectual disability (NS-ARID) to determine if they are enriched for common variants associated with the normal range of intelligence differences. The current study used the 3511 individuals of the Cognitive Ageing Genetics in England and Scotland (CAGES) consortium. In addition, a text mining analysis was used to identify gene sets biologically related to the NS-ARID set. Gene-based tests indicated that genes implicated in NS-ARID were not significantly enriched for quantitative trait loci (QTL) associated with intelligence. These findings suggest that genes in which mutations can have a large and deleterious effect on intelligence are not associated with variation across the range of intelligence differences. PMID:26912939
Aging-like Changes in the Transcriptome of Irradiated Microglia
Li, Matthew D.; Burns, Terry C.; Kumar, Sunny; Morgan, Alexander A.; Sloan, Steven A.; Palmer, Theo D.
2014-01-01
Whole brain irradiation remains important in the management of brain tumors. Although necessary for improving survival outcomes, cranial irradiation also results in cognitive decline in long-term survivors. A chronic inflammatory state characterized by microglial activation has been implicated in radiation-induced brain injury. We here provide the first comprehensive transcriptional profile of irradiated microglia. Fluorescence-activated cell sorting (FACS) was used to isolate CD11b+ microglia from the hippocampi of C57BL/6 and Balb/c mice 1 month after 10 Gy cranial irradiation. Affymetrix gene expression profiles were evaluated using linear modeling and rank product analyses. One month after irradiation, a conserved irradiation signature across strains was identified, comprising 448 and 85 differentially up- and down-regulated genes, respectively. Gene set enrichment analysis (GSEA) demonstrated enrichment for inflammation, including M1 macrophage-associated genes, but also an unexpected enrichment for extracellular matrix and blood coagulation-related gene sets, in contrast to previously described microglial states. Weighted gene co-expression network analysis (WGCNA) confirmed these findings and further revealed alterations in mitochondrial function. The RNA-seq transcriptome of microglia 24 h post-radiation proved similar to the 1-month transcriptome, but additionally featured alterations in apoptotic and lysosomal gene expression. Re-analysis of published aging mouse microglia transcriptome data demonstrated striking similarity to the 1-month irradiated microglia transcriptome, suggesting that shared mechanisms may underlie aging and chronic irradiation-induced cognitive decline. PMID:25690519
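As an illustration of the GSEA step mentioned in this abstract, the following is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. The function name `enrichment_score` and the toy gene identifiers are hypothetical; the published GSEA method uses a weighted statistic and permutation-based significance, neither of which is reproduced here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative only).

    ranked_genes: gene identifiers ordered from most up- to most
                  down-regulated.
    gene_set:     iterable of gene identifiers in the set of interest.
    Returns the running-sum value with the largest absolute deviation
    from zero.
    """
    gene_set = set(gene_set)
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up at set members, step down at all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
    print(enrichment_score(ranking, {"g1", "g2"}))  # set at the top of the ranking
    print(enrichment_score(ranking, {"g5", "g6"}))  # set at the bottom of the ranking
```

A positive score indicates that set members concentrate near the top of the ranking, a negative score that they concentrate near the bottom.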
Chen, Xiaohang; Yan, Bingqing; Lou, Huihuang; Shen, Zhenji; Tong, Fangjia; Zhai, Aixia; Wei, Lanlan; Zhang, Fengmin
2018-04-01
Human papillomavirus-positive (HPV+) head and neck squamous cell cancer (HNSCC) exhibits a better prognosis than HPV-negative (HPV-) HNSCC. This difference may in part be due to enhanced immune activation in the HPV+ HNSCC tumor microenvironment. To characterize differences in immune activation between HPV+ and HPV- HNSCC tumors, we identified and annotated differentially expressed genes based upon mRNA expression data from The Cancer Genome Atlas (TCGA). An immune network between immune cells and cytokines was constructed using single-sample Gene Set Enrichment Analysis and conditional mutual information. Multivariate Cox regression analysis was used to determine the prognostic value of immune microenvironment characterization. A total of 1673 differentially expressed genes were functionally annotated. We found that genes upregulated in HPV+ HNSCC are enriched in immune-associated processes, and the upregulated gene sets were validated by Gene Set Enrichment Analysis. The microenvironment of HPV+ HNSCC exhibited greater numbers of infiltrating B and T cells and fewer neutrophils than HPV- HNSCC. These findings were validated in two independent datasets from the Gene Expression Omnibus (GEO) database. Further analyses of T cell subtypes revealed that cytotoxic T cell subtypes predominated in HPV+ HNSCC. In addition, the ratio of M1/M2 macrophages was much higher in HPV+ HNSCC. The infiltration of these immune cells was correlated with differentially expressed cytokine-associated genes. Enhanced infiltration of B cells and CD8+ T cells was identified as an independent protective factor, while high neutrophil infiltration was a risk-enhancing factor for HPV+ HNSCC patients. A schematic model of the immunological network was established for HPV+ HNSCC to summarize our findings. Copyright © 2018 Elsevier Ltd. All rights reserved.
Moore, Abigail J; Vos, Jurriaan M De; Hancock, Lillian P; Goolsby, Eric; Edwards, Erika J
2018-05-01
Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the "portullugo" (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ~50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated in terms of power, false-positive rate, and receiver operating characteristic curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
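To make concrete where the absolute gene statistic enters a gene-sampling test, here is a minimal sketch under stated assumptions: the set statistic is the mean gene score, the null distribution is obtained by drawing random gene sets of the same size, and the function name `gene_sampling_pvalue`, its parameters, and the toy scores are all hypothetical. It does not reproduce the specific methods or the false-positive-rate results evaluated in the paper.

```python
import random


def gene_sampling_pvalue(scores, gene_set, n_perm=2000, absolute=True, seed=0):
    """Gene-sampling p-value for the mean gene score of a gene set.

    scores:   dict mapping gene -> signed gene-level statistic.
    gene_set: iterable of genes; genes missing from `scores` are ignored.
    absolute: if True, use |score| so that up- and down-regulated
              members both contribute to the set statistic.
    """
    rng = random.Random(seed)
    values = {g: (abs(s) if absolute else s) for g, s in scores.items()}
    members = [g for g in gene_set if g in values]
    k = len(members)
    if k == 0:
        raise ValueError("gene set has no scored members")

    observed = sum(values[g] for g in members) / k
    pool = list(values.values())

    # Null distribution: mean score of random gene sets of the same size.
    exceed = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, k)
        if sum(sample) / k >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: a set whose members change strongly in both directions.
    scores = {"s1": 3.0, "s2": -3.0, "s3": 3.0, "s4": -3.0}
    for i in range(16):
        scores[f"b{i}"] = 0.5 if i % 2 == 0 else -0.5
    members = ["s1", "s2", "s3", "s4"]
    print(gene_sampling_pvalue(scores, members, absolute=True))
    print(gene_sampling_pvalue(scores, members, absolute=False))
```

In this toy case the signed mean of the set cancels to zero, so only the absolute statistic registers the bidirectional change.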
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang
2018-01-01
Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of the classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in the ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303
twzPEA: A Topology and Working Zone Based Pathway Enrichment Analysis Framework
USDA-ARS's Scientific Manuscript database
Sensitive detection of involvement and adaptation of key signaling, regulatory, and metabolic pathways holds the key to deciphering molecular mechanisms such as those in the biomass-to-biofuel conversion process in yeast. Typical gene set enrichment analyses often do not use topology information in...
eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks.
Clarke, Daniel J B; Kuleshov, Maxim V; Schilder, Brian M; Torre, Denis; Duffy, Mary E; Keenan, Alexandra B; Lachmann, Alexander; Feldmann, Axel S; Gundersen, Gregory W; Silverstein, Moshe C; Wang, Zichen; Ma'ayan, Avi
2018-05-25
While gene expression data at the mRNA level can be globally and accurately measured, profiling the activity of cell signaling pathways is currently much more difficult. eXpression2Kinases (X2K) computationally predicts involvement of upstream cell signaling pathways, given a signature of differentially expressed genes. X2K first computes enrichment for transcription factors likely to regulate the expression of the differentially expressed genes. The next step of X2K connects these enriched transcription factors through known protein-protein interactions (PPIs) to construct a subnetwork. The final step performs kinase enrichment analysis on the members of the subnetwork. X2K Web is a new implementation of the original eXpression2Kinases algorithm with important enhancements. X2K Web includes many new transcription factor and kinase libraries, and PPI networks. For demonstration, thousands of gene expression signatures induced by kinase inhibitors, applied to six breast cancer cell lines, are provided for fetching directly into X2K Web. The results are displayed as interactive downloadable vector graphic network images and bar graphs. Benchmarking various settings via random permutations enabled the identification of an optimal set of parameters to be used as the default settings in X2K Web. X2K Web is freely available from http://X2K.cloud.
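The enrichment steps in this pipeline can be illustrated with a generic over-representation test. The sketch below uses a hypergeometric tail probability to rank library gene sets against a query signature; this is an assumption made for illustration, since the abstract does not state which enrichment statistic X2K uses, and the names `hypergeom_pvalue`, `rank_libraries`, and the toy library are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_library_set, n_query, n_overlap):
    """P(overlap >= n_overlap) when n_query genes are drawn at random
    from a universe that contains n_library_set annotated genes."""
    upper = min(n_library_set, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_library_set, k) * comb(n_universe - n_library_set, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rank_libraries(query, libraries, n_universe):
    """Rank library gene sets (e.g. targets per transcription factor)
    by over-representation in the query signature."""
    query = set(query)
    results = []
    for name, targets in libraries.items():
        targets = set(targets)
        overlap = len(query & targets)
        p = hypergeom_pvalue(n_universe, len(targets), len(query), overlap)
        results.append((name, overlap, p))
    # Smallest p-value first.
    return sorted(results, key=lambda item: item[2])


if __name__ == "__main__":
    # Hypothetical target library; real libraries are far larger.
    libraries = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g7", "g8", "g9"}}
    for row in rank_libraries({"g1", "g2", "g3"}, libraries, n_universe=10):
        print(row)
```

The same test could be reused for the kinase step by swapping in a library that maps kinases to their substrates.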
Use of molecular techniques to evaluate the survival of a microorganism injected into an aquifer
Thiem, S.M.; Krumme, M.L.; Smith, R.L.; Tiedje, J.M.
1994-01-01
A PCR primer set and an internal probe that are specific for Pseudomonas sp. strain B13, a 3-chlorobenzoate-metabolizing strain, were developed. Using this primer set and probe, we were able to detect Pseudomonas sp. strain B13 DNA sequences in DNA extracted from aquifer samples 14.5 months after Pseudomonas sp. strain B13 had been injected into a sand and gravel aquifer. This primer set and probe were also used to analyze isolates from 3-chlorobenzoate enrichments of the aquifer samples by Southern blot analysis. Hybridization of Southern blots with the Pseudomonas sp. strain B13-specific probe and a catabolic probe in conjunction with restriction fragment length polymorphism (RFLP) analysis of ribosome genes was used to determine that viable Pseudomonas sp. strain B13 persisted in this environment. We isolated a new 3-chlorobenzoate-degrading strain from one of these enrichment cultures. The B13-specific probe does not hybridize to DNA from this isolate. The new strain could be the result of gene exchange between Pseudomonas sp. strain B13 and an indigenous bacterium. This speculation is based on an RFLP pattern of ribosome genes that differs from that of Pseudomonas sp. strain B13, the fact that identically sized restriction fragments hybridized to the catabolic gene probe, and the absence of any enrichable 3-chlorobenzoate-degrading strains in the aquifer prior to inoculation.
He, Hao; Zhang, Lei; Li, Jian; Wang, Yu-Ping; Zhang, Ji-Gang; Shen, Jie; Guo, Yan-Fang
2014-01-01
Context: To date, few systems genetics studies in the bone field have been performed. We designed our study from a systems-level perspective by integrating genome-wide association studies (GWASs), the human protein-protein interaction (PPI) network, and gene expression to identify gene modules contributing to osteoporosis risk. Methods: First we searched for modules significantly enriched with bone mineral density (BMD)-associated genes in the human PPI network by using 2 large meta-analysis GWAS datasets through a dense module search algorithm. One included 7 individual GWAS samples (Meta7). The other was from the Genetic Factors for Osteoporosis Consortium (GEFOS2). One was assigned as a discovery dataset and the other as an evaluation dataset, and vice versa. Results: In total, 42 modules and 129 modules were significant in both the Meta7 and GEFOS2 datasets for femoral neck and spine BMD, respectively. There were 3340 modules identified for hip BMD only in Meta7. As candidate modules, they were assessed for biological relevance to BMD by gene set enrichment analysis in 2 expression profiles generated from circulating monocytes in subjects with low versus high BMD values. Interestingly, there were 2 modules significantly enriched in monocytes from the low BMD group in both gene expression datasets (nominal P value < .05). The two modules contained 16 nonredundant genes. Functional enrichment analysis revealed that both modules were enriched for genes involved in Wnt receptor signaling and osteoblast differentiation. Conclusion: We highlighted 2 modules and novel genes playing important roles in the regulation of bone mass, providing important clues for therapeutic approaches for osteoporosis. PMID:25119315
Ramanan, Vijay K; Kim, Sungeun; Holohan, Kelly; Shen, Li; Nho, Kwangsik; Risacher, Shannon L; Foroud, Tatiana M; Mukherjee, Shubhabrata; Crane, Paul K; Aisen, Paul S; Petersen, Ronald C; Weiner, Michael W; Saykin, Andrew J
2012-12-01
Memory deficits are prominent features of mild cognitive impairment (MCI) and Alzheimer's disease (AD). The genetic architecture underlying these memory deficits likely involves the combined effects of multiple genetic variants operative within numerous biological pathways. In order to identify functional pathways associated with memory impairment, we performed a pathway enrichment analysis on genome-wide association data from 742 Alzheimer's Disease Neuroimaging Initiative (ADNI) participants. A composite measure of memory was generated as the phenotype for this analysis by applying modern psychometric theory to item-level data from the ADNI neuropsychological test battery. Using the GSA-SNP software tool, we identified 27 canonical, expertly-curated pathways with enrichment (FDR-corrected p-value < 0.05) against this composite memory score. Processes classically understood to be involved in memory consolidation, such as neurotransmitter receptor-mediated calcium signaling and long-term potentiation, were highly represented among the enriched pathways. In addition, pathways related to cell adhesion, neuronal differentiation and guided outgrowth, and glucose- and inflammation-related signaling were also enriched. Among genes that were highly-represented in these enriched pathways, we found indications of coordinated relationships, including one large gene set that is subject to regulation by the SP1 transcription factor, and another set that displays co-localized expression in normal brain tissue along with known AD risk genes. These results 1) demonstrate that psychometrically-derived composite memory scores are an effective phenotype for genetic investigations of memory impairment and 2) highlight the promise of pathway analysis in elucidating key mechanistic targets for future studies and for therapeutic interventions.
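The "FDR-corrected p-value < 0.05" criterion can be illustrated with a short sketch of the Benjamini-Hochberg step-up adjustment. This is an assumption for illustration only: the abstract does not say which FDR procedure GSA-SNP applies, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-pathway p-values.
    raw = [0.01, 0.04, 0.03, 0.5]
    adjusted = benjamini_hochberg(raw)
    significant = [i for i, q in enumerate(adjusted) if q < 0.05]
    print(adjusted, significant)
```

Pathways whose adjusted value falls below the chosen threshold would be reported as enriched.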
Weidner, Christopher; Steinfath, Matthias; Wistorf, Elisa; Oelgeschläger, Michael; Schneider, Marlon R; Schönfelder, Gilbert
2017-08-16
Recent studies that compared transcriptomic datasets of human diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. A major reason for the discrepancies between different gene expression analyses is the arbitrary filtering of differentially expressed genes. Furthermore, the comparison of single genes between different species and platforms often is limited by technical variance, leading to misinterpretation of the con/discordance between data from human and animal models. Thus, standardized approaches for systematic data analysis are needed. To overcome subjective gene filtering and ineffective gene-to-gene comparisons, we recently demonstrated that gene set enrichment analysis (GSEA) has the potential to avoid these problems. Therefore, we developed a standardized protocol for the use of GSEA to distinguish between appropriate and inappropriate animal models for translational research. This protocol is not suitable to predict how to design new model systems a-priori, as it requires existing experimental omics data. However, the protocol describes how to interpret existing data in a standardized manner in order to select the most suitable animal model, thus avoiding unnecessary animal experiments and misleading translational studies.
Hwang, Sun-Goo; Kim, Dong Sub; Hwang, Jung Eun; Han, A-Reum; Jang, Cheol Seong
2014-05-15
In order to better understand the biological systems that are affected in response to cosmic ray (CR), we conducted weighted gene co-expression network analysis using the module detection method. By using the Pearson's correlation coefficient (PCC) value, we evaluated complex gene-gene functional interactions between 680 CR-responsive probes from integrated microarray data sets, which included large-scale transcriptional profiling of 1000 microarray samples. These probes were divided into 6 distinct modules that contained 20 enriched gene ontology (GO) functions, such as oxidoreductase activity, hydrolase activity, and response to stimulus and stress. In particular, modules 1 and 2 commonly showed enriched annotation categories such as oxidoreductase activity, including enriched cis-regulatory elements known as ROS-specific regulators. These results suggest that the ROS-mediated irradiation response pathway is affected by CR in modules 1 and 2. We found 243 ionizing radiation (IR)-responsive probes that exhibited similarities in expression patterns in various irradiation microarray data sets. The expression patterns of 6 randomly selected IR-responsive genes were evaluated by quantitative reverse transcription polymerase chain reaction following treatment with CR, gamma rays (GR), and ion beam (IB); similar patterns were observed among these genes under these 3 treatments. Moreover, we constructed subnetworks of IR-responsive genes and evaluated the expression levels of their neighboring genes following GR treatment; similar patterns were observed among them. These results of network-based analyses might provide a clue to understanding the complex biological system related to the CR response in plants. Copyright © 2014 Elsevier B.V. All rights reserved.
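The idea of grouping probes into modules by Pearson correlation can be sketched as follows. This is a deliberately simplified stand-in: it links probes whose absolute correlation exceeds a hard threshold and reports connected components, whereas weighted co-expression analysis typically uses soft thresholding and hierarchical clustering. The names `pearson`, `coexpression_modules`, the threshold value, and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_modules(profiles, threshold=0.9):
    """Group genes into modules: connected components of the graph
    whose edges join genes with |PCC| >= threshold."""
    genes = sorted(profiles)
    neighbours = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(profiles[g], profiles[h])) >= threshold:
                neighbours[g].add(h)
                neighbours[h].add(g)

    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        # Depth-first search collects one connected component.
        stack, module = [g], set()
        while stack:
            current = stack.pop()
            if current in module:
                continue
            module.add(current)
            stack.extend(neighbours[current] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


if __name__ == "__main__":
    profiles = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [4.0, 1.0, 3.0, 2.0],
        "D": [8.0, 2.0, 6.0, 4.0],
    }
    print(coexpression_modules(profiles))
```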
Bioinformatics/biostatistics: microarray analysis.
Eichler, Gabriel S
2012-01-01
The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiment. Technologies reviewed in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner, Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).
Muziasari, Windi I.; Pitkänen, Leena K.; Sørum, Henning; Stedtfeld, Robert D.; Tiedje, James M.; Virta, Marko
2017-01-01
Our previous studies showed that particular antibiotic resistance genes (ARGs) were enriched locally in sediments below fish farms in the Northern Baltic Sea, Finland, even when the selection pressure from antibiotics was negligible. We assumed that a constant influx of farmed fish feces could be the plausible source of the ARGs enriched in the farm sediments. In the present study, we analyzed the composition of the antibiotic resistome from the intestinal contents of 20 fish from the Baltic Sea farms. We used a high-throughput method, WaferGen qPCR array with 364 primer sets to detect and quantify ARGs, mobile genetic elements (MGE), and the 16S rRNA gene. Despite a considerably wide selection of qPCR primer sets, only 28 genes were detected in the intestinal contents. The detected genes were ARGs encoding resistance to sulfonamide (sul1), trimethoprim (dfrA1), tetracycline [tet(32), tetM, tetO, tetW], aminoglycoside (aadA1, aadA2), chloramphenicol (catA1), and efflux-pumps resistance genes (emrB, matA, mefA, msrA). The detected genes also included class 1 integron-associated genes (intI1, qacEΔ1) and transposases (tnpA). Importantly, most of the detected genes were the same genes enriched in the farm sediments. This preliminary study suggests that feces from farmed fish contribute to the ARG enrichment in farm sediments despite the lack of contemporaneous antibiotic treatments at the farms. We observed that the intestinal contents of individual farmed fish had their own resistome compositions. Our result also showed that the total relative abundances of transposases and tet genes were significantly correlated (p = 0.001, R2 = 0.71). In addition, we analyzed the mucosal skin and gill filament resistomes of the farmed fish but only one multidrug-efflux resistance gene (emrB) was detected. To our knowledge, this is the first study reporting the resistome of farmed fish using a culture-independent method. 
Determining the possible sources of ARGs, especially mobilized ARGs, is essential for controlling the occurrence and spread of ARGs at fish farming facilities and for lowering the risk of ARG spread from the farms to surrounding environments. PMID:28111573
Markov Chain Ontology Analysis (MCOA)
2012-01-01
Background Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. Conclusion A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches. 
PMID:22300537
Markov Chain Ontology Analysis (MCOA).
Frost, H Robert; McCray, Alexa T
2012-02-03
Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
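The eigenvector computation mentioned in this abstract can be illustrated with a small sketch. It computes the stationary distribution of a finite Markov chain by power iteration. The uniform "teleportation" adjustment (the `damping` parameter) is an assumption borrowed from PageRank-style analyses to guarantee ergodicity; the abstract does not say how MCOA adjusts its transition matrix, and the function name and example matrix are hypothetical.

```python
def stationary_distribution(P, damping=0.85, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Assumption (not taken from the MCOA abstract): with probability
    `damping` the chain follows P, otherwise it jumps to a uniformly
    chosen state, which makes the adjusted chain ergodic.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Uniform teleportation mass, then one damped step along P.
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new[j] += damping * pi[i] * P[i][j]
        # Stop once successive iterates agree in L1 norm.
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical 3-state chain: state 1 is reachable from both other states.
P_example = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
pi_example = stationary_distribution(P_example)
```

In this toy chain the middle state receives the largest stationary probability, which is the sense in which such eigenvector values can rank nodes by importance.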
Martin, Guiomar; Soy, Judit; Monte, Elena
2016-01-01
Members of the PIF quartet (PIFq; PIF1, PIF3, PIF4, and PIF5) collectively contribute to induce growth in Arabidopsis seedlings under short day (SD) conditions, specifically promoting elongation at dawn. Their action involves the direct regulation of growth-related and hormone-associated genes. However, a comprehensive definition of the PIFq-regulated transcriptome under SD is still lacking. We have recently shown that SD and free-running (LL) conditions correspond to "growth" and "no growth" conditions, respectively, correlating with greater abundance of PIF protein in SD. Here, we present a genomic analysis whereby we first define SD-regulated genes at dawn compared to LL in the wild type, followed by identification of those SD-regulated genes whose expression depends on the presence of PIFq. By using this sequential strategy, we have identified 349 PIF/SD-regulated genes, approximately 55% induced and 42% repressed by both SD and PIFq. Comparison with available databases indicates that PIF/SD-induced and PIF/SD-repressed sets are differently phased at dawn and mid-morning, respectively. In addition, we found that whereas rhythmicity of the PIF/SD-induced gene set is lost in LL, most PIF/SD-repressed genes keep their rhythmicity in LL, suggesting differential regulation of both gene sets by the circadian clock. Moreover, we also uncovered distinct overrepresented functions in the induced and repressed gene sets, in accord with previous studies in other examined PIF-regulated processes. Interestingly, promoter analyses showed that, whereas PIF/SD-induced genes are enriched in direct PIF targets, PIF/SD-repressed genes are mostly indirectly regulated by the PIFs and might be more enriched in ABA-regulated genes.
GeneSigDB: a manually curated database and resource for analysis of gene expression signatures
Culhane, Aedín C.; Schröder, Markus S.; Sultana, Razvan; Picard, Shaita C.; Martinelli, Enzo N.; Kelly, Caroline; Haibe-Kains, Benjamin; Kapushesky, Misha; St Pierre, Anne-Alyssa; Flahive, William; Picard, Kermshlise C.; Gusenleitner, Daniel; Papenhausen, Gerald; O'Connor, Niall; Correll, Mick; Quackenbush, John
2012-01-01
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease to the community so they can compare the predictive power of gene signatures or use these in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, facetted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap and results can be viewed on a dynamic editable heatmap that can be downloaded as a publication quality image. All data in GeneSigDB can be downloaded in numerous formats including .gmt file format for gene set enrichment analysis or as a R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org. PMID:22110038
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis.
Yu, Guangchuang; Wang, Li-Gen; Yan, Guang-Rong; He, Qing-Yu
2015-02-15
Disease ontology (DO) annotates human genes in the context of disease. DO is an important annotation for translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes, which allows biologists to explore the similarities of diseases and of gene functions from a disease perspective. Enrichment analyses, including a hypergeometric model and gene set enrichment analysis, are also implemented to support discovering disease associations of high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. DOSE is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary data are available at Bioinformatics online. Contact: gcyu@connect.hku.hk or tqyhe@jnu.edu.cn. © The Author 2014. Published by Oxford University Press.
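A hypergeometric enrichment model of the kind mentioned in this abstract can be sketched as follows. This is a generic over-representation test, not DOSE's API: the function name and argument names are hypothetical, and the numbers in the example are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the term
    n: genes in the query list
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 40 of 200 query genes hit a term that covers
# 500 of 20000 background genes (5 hits would be expected by chance).
p_example = hypergeom_enrichment_p(20000, 500, 200, 40)
```

The upper tail is used because enrichment asks whether the overlap is at least as large as the one observed; the resulting p-values would still need multiple-testing correction across terms.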
The Pathway Coexpression Network: Revealing pathway relationships
Tanzi, Rudolph E.
2018-01-01
A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer’s Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. 
PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/. PMID:29554099
Markunas, Christina A; Johnson, Eric O; Hancock, Dana B
2017-07-01
Genome-wide association study (GWAS)-identified variants are enriched for functional elements. However, we have limited knowledge of how functional enrichment may differ by disease/trait and tissue type. We tested a broad set of eight functional elements for enrichment among GWAS-identified SNPs (p < 5×10⁻⁸) from the NHGRI-EBI Catalog across seven disease/trait categories: cancer, cardiovascular disease, diabetes, autoimmune disease, psychiatric disease, neurological disease, and anthropometric traits. SNPs were annotated using HaploReg for the eight functional elements across any tissue: DNase sites, expression quantitative trait loci (eQTL), sequence conservation, enhancers, promoters, missense variants, sequence motifs, and protein binding sites. In addition, tissue-specific annotations were considered for brain vs. blood. Disease/trait SNPs were compared to a control set of 4809 SNPs matched to the GWAS SNPs (N = 1639) on allele frequency, gene density, distance to nearest gene, and linkage disequilibrium at ~3:1 ratio. Enrichment analyses were conducted using logistic regression, with Bonferroni correction. Overall, a significant enrichment was observed for all functional elements, except sequence motifs. Missense SNPs showed the strongest magnitude of enrichment. eQTLs were the only functional element significantly enriched across all diseases/traits. Magnitudes of enrichment were generally similar across diseases/traits, where enrichment was statistically significant. Blood vs. brain tissue effects on enrichment were dependent on disease/trait and functional element (e.g., cardiovascular disease: eQTLs P(TissueDifference) = 1.28 × 10⁻⁶ vs. enhancers P(TissueDifference) = 0.94). Identifying disease/trait-relevant functional elements and tissue types could provide new insight into the underlying biology, by guiding a priori GWAS analyses (e.g., brain enhancer elements for psychiatric disease) or facilitating post hoc interpretation.
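The Bonferroni correction named in this abstract can be sketched in a few lines. The helper below is a generic illustration; the function name and the example p-values are hypothetical, and the abstract does not state how many tests the authors corrected for.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and per-test significance flags."""
    m = len(p_values)
    threshold = alpha / m  # each test must clear alpha divided by the test count
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return adjusted, significant


# Hypothetical p-values for three enrichment tests.
adjusted_example, significant_example = bonferroni([0.001, 0.02, 0.5])
```

With three tests the per-test threshold becomes 0.05 / 3, so a raw p-value of 0.02 no longer counts as significant even though it is below 0.05.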
Ozbayram, E Gozde; Kleinsteuber, Sabine; Nikolausz, Marcell; Ince, Bahar; Ince, Orhan
2017-08-01
The aim of this study was to determine the potential of bioaugmentation with cellulolytic rumen microbiota to enhance the anaerobic digestion of lignocellulosic feedstock. An anaerobic cellulolytic culture was enriched from sheep rumen fluid using wheat straw as substrate under mesophilic conditions. To investigate the effects of bioaugmentation on methane production from straw, the enrichment culture was added to batch reactors in proportions of 2% (Set-1) and 4% (Set-2) of the microbial cell number of the standard inoculum slurry. The methane production in the bioaugmented reactors was higher than in the control reactors. After 30 days of batch incubation, the average methane yield was 154 mL_N CH₄ g VS⁻¹ in the control reactors. Addition of 2% enrichment culture did not enhance methane production, whereas in Set-2 the methane yield was increased by 27%. The bacterial communities were examined by 454 amplicon sequencing of 16S rRNA genes, while terminal restriction fragment length polymorphism (T-RFLP) fingerprinting of mcrA genes was applied to analyze the methanogenic communities. The results highlighted that relative abundances of Ruminococcaceae and Lachnospiraceae increased during the enrichment. However, Cloacamonaceae, which were abundant in the standard inoculum, dominated the bacterial communities of all batch reactors. T-RFLP profiles revealed that Methanobacteriales were predominant in the rumen fluid, whereas the enrichment culture was dominated by Methanosarcinales. In the batch reactors, the most abundant methanogens were affiliated to Methanobacteriales and Methanomicrobiales. Our results suggest that bioaugmentation with sheep rumen enrichment cultures can enhance the performance of digesters treating lignocellulosic feedstock. Copyright © 2017 Elsevier Ltd. All rights reserved.
Discovering causal signaling pathways through gene-expression patterns
Parikh, Jignesh R.; Klinger, Bertram; Xia, Yu; Marto, Jarrod A.; Blüthgen, Nils
2010-01-01
High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We expose our novel resource and algorithm through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/. PMID:20494976
Jia, Peilin; Chen, Xiangning; Xie, Wei; Kendler, Kenneth S; Zhao, Zhongming
2018-06-20
Numerous high-throughput omics studies have been conducted in schizophrenia, providing an accumulated catalog of susceptible variants and genes. The results from these studies, however, are highly heterogeneous. The variants and genes nominated by different omics studies often have limited overlap with each other. There is thus a pressing need for integrative analysis to unify the different types of data and provide a convergent view of schizophrenia candidate genes (SZgenes). In this study, we collected a comprehensive, multidimensional dataset, including 7819 brain-expressed genes. The data hosted genome-wide association evidence in genetics (eg, genotyping data, copy number variations, de novo mutations), epigenetics, transcriptomics, and literature mining. We developed a method named mega-analysis of odds ratio (MegaOR) to prioritize SZgenes. Application of MegaOR in the multidimensional data resulted in consensus sets of SZgenes (up to 530), each enriched with dense, multidimensional evidence. We proved that these SZgenes had highly tissue-specific expression in brain and nerve and had intensive interactions that were significantly stronger than chance expectation. Furthermore, we found these SZgenes were involved in human brain development by showing strong spatiotemporal expression patterns; these characteristics were replicated in independent brain expression datasets. Finally, we found the SZgenes were enriched in critical functional gene sets involved in neuronal activities, ligand gated ion signaling, and fragile X mental retardation protein targets. In summary, MegaOR analysis reported consensus sets of SZgenes with enriched association evidence to schizophrenia, providing insights into the pathophysiology underlying schizophrenia.
Nicoletti, Paola; Bansal, Mukesh; Lefebvre, Celine; Guarnieri, Paolo; Shen, Yufeng; Pe'er, Itsik; Califano, Andrea; Floratos, Aris
2015-01-01
Stevens-Johnson syndrome (SJS) and Toxic Epidermal Necrolysis (TEN) represent rare but serious adverse drug reactions (ADRs). Both are characterized by distinctive blistering lesions and significant mortality rates. While there is evidence for strong drug-specific genetic predisposition related to HLA alleles, recent genome wide association studies (GWAS) on European and Asian populations have failed to identify genetic susceptibility alleles that are common across multiple drugs. We hypothesize that this is a consequence of the low to moderate effect size of individual genetic risk factors. To test this hypothesis we developed Pointer, a new algorithm that assesses the aggregate effect of multiple low risk variants on a pathway using a gene set enrichment approach. A key advantage of our method is the capability to associate SNPs with genes by exploiting physical proximity as well as by using expression quantitative trait loci (eQTLs) that capture information about both cis- and trans-acting regulatory effects. We control for known bias-inducing aspects of enrichment based analyses, such as: 1) gene length, 2) gene set size, 3) presence of biologically related genes within the same linkage disequilibrium (LD) region, and, 4) genes shared among multiple gene sets. We applied this approach to publicly available SJS/TEN genome-wide genotype data and identified the ABC transporter and Proteasome pathways as potentially implicated in the genetic susceptibility of non-drug-specific SJS/TEN. We demonstrated that the innovative SNP-to-gene mapping phase of the method was essential in detecting the significant enrichment for those pathways. Analysis of an independent gene expression dataset provides supportive functional evidence for the involvement of Proteasome pathways in SJS/TEN cutaneous lesions. 
These results suggest that Pointer provides a useful framework for the integrative analysis of pharmacogenetic GWAS data, by increasing the power to detect aggregate effects of multiple low risk variants. The software is available for download at https://sourceforge.net/projects/pointergsa/.
Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.
Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K
2011-01-01
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics
Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron
2014-01-01
• Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629
Heck, Angela; Fastenrath, Matthias; Coynel, David; Auschra, Bianca; Bickel, Horst; Freytag, Virginie; Gschwind, Leo; Hartmann, Francina; Jessen, Frank; Kaduszkiewicz, Hanna; Maier, Wolfgang; Milnik, Annette; Pentzek, Michael; Riedel-Heller, Steffi G; Spalek, Klara; Vogler, Christian; Wagner, Michael; Weyerer, Siegfried; Wolfsgruber, Steffen; de Quervain, Dominique J-F; Papassotiropoulos, Andreas
2015-10-01
Human episodic memory performance is linked to the function of specific brain regions, including the hippocampus; declines as a result of increasing age; and is markedly disturbed in Alzheimer disease (AD), an age-associated neurodegenerative disorder that primarily affects the hippocampus. Exploring the molecular underpinnings of human episodic memory is key to the understanding of hippocampus-dependent cognitive physiology and pathophysiology. We aimed to determine whether biologically defined groups of genes are enriched in episodic memory performance across age, memory encoding-related brain activity, and AD. In this multicenter collaborative study, which began in August 2008 and is ongoing, gene set enrichment analysis was done by using primary and meta-analysis data from 57 968 participants. The Swiss cohorts consisted of 3043 healthy young adults assessed for episodic memory performance. In a subgroup (n = 1119) of one of these cohorts, functional magnetic resonance imaging was used to identify gene set-dependent differences in brain activity related to episodic memory. The German Study on Aging, Cognition, and Dementia in Primary Care Patients cohort consisted of 763 elderly participants without dementia who were assessed for episodic memory performance. The International Genomics of Alzheimer's Project case-control sample consisted of 54 162 participants (17 008 patients with sporadic AD and 37 154 control participants). Analyses were conducted between January 2014 and June 2015. Gene set enrichment analysis in all samples was done using genome-wide single-nucleotide polymorphism data. Episodic memory performance in the Swiss cohort and German Study on Aging, Cognition, and Dementia in Primary Care Patients cohort was quantified by picture and verbal delayed free recall tasks. In the functional magnetic resonance imaging experiment, activation of the hippocampus during encoding of pictures served as the phenotype of interest. 
In the International Genomics of Alzheimer's Project sample, diagnosis of sporadic AD served as the phenotype of interest. In the discovery sample, we detected significant enrichment for genes constituting the calcium signaling pathway, especially those related to the elevation of cytosolic calcium (P = 2 × 10⁻⁴). This enrichment was replicated in 2 additional samples of healthy young individuals (P = .02 and .04, respectively) and a sample of healthy elderly participants (P = .004). Hippocampal activation (P = 4 × 10⁻⁴) and the risk for sporadic AD (P = .01) were also significantly enriched for genes related to the elevation of cytosolic calcium. By detecting consistent significant enrichment in independent cohorts of young and elderly participants, this study identified that calcium signaling plays a central role in hippocampus-dependent human memory processes in cognitive health and disease, contributing to the understanding and potential treatment of hippocampus-dependent cognitive pathology.
Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach
Ruderfer, Douglas M; Charney, Alexander W; Readhead, Ben; Kidd, Brian A; Kähler, Anna K; Kenny, Paul J; Keiser, Michael J; Moran, Jennifer L; Hultman, Christina M; Scott, Stuart A; Sullivan, Patrick F; Purcell, Shaun M; Dudley, Joel T; Sklar, Pamela
2016-01-01
Summary Background Therapeutic treatments for schizophrenia do not alleviate symptoms for all patients and efficacy is limited by common, often severe, side-effects. Genetic studies of disease can identify novel drug targets, and drugs for which the mechanism has direct genetic support have increased likelihood of clinical success. Large-scale genetic studies of schizophrenia have increased the number of genes and gene sets associated with risk. We aimed to examine the overlap between schizophrenia risk loci and gene targets of a comprehensive set of medications to potentially inform and improve treatment of schizophrenia. Methods We defined schizophrenia risk loci as genomic regions reaching genome-wide significance in the latest Psychiatric Genomics Consortium schizophrenia genome-wide association study (GWAS) of 36 989 cases and 113 075 controls and loss of function variants observed only once among 5079 individuals in an exome-sequencing study of 2536 schizophrenia cases and 2543 controls (Swedish Schizophrenia Study). Using two large and orthogonally created databases, we collated drug targets into 167 gene sets targeted by pharmacologically similar drugs and examined enrichment of schizophrenia risk loci in these sets. We further linked the exome-sequenced data with a national drug registry (the Swedish Prescribed Drug Register) to assess the contribution of rare variants to treatment response, using clozapine prescription as a proxy for treatment resistance. Findings We combined results from testing rare and common variation and, after correction for multiple testing, two gene sets were associated with schizophrenia risk: agents against amoebiasis and other protozoal diseases (106 genes, p=0·00046, pcorrected =0·024) and antipsychotics (347 genes, p=0·00078, pcorrected=0·046). Further analysis pointed to antipsychotics as having independent enrichment after removing genes that overlapped these two target sets. 
We noted significant enrichment both in known targets of antipsychotics (70 genes, p=0·0078) and novel predicted targets (277 genes, p=0·019). Patients with treatment-resistant schizophrenia had an excess of rare disruptive variants in gene targets of antipsychotics (347 genes, p=0·0067) and in genes with evidence for a role in antipsychotic efficacy (91 genes, p=0·0029). Interpretation Our results support genetic overlap between schizophrenia pathogenesis and antipsychotic mechanism of action. This finding is consistent with treatment efficacy being polygenic and suggests that single-target therapeutics might be insufficient. We provide evidence of a role for rare functional variants in antipsychotic treatment response, pointing to a subset of patients where their genetic information could inform treatment. Finally, we present a novel framework for identifying treatments from genetic data and improving our understanding of therapeutic mechanism. PMID:26915512
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK.
Ram Venkataraman, Guhan; O'Connell, Chloe; Egawa, Fumiko; Kashef-Haghighi, Dorna; Wall, Dennis P
2017-01-01
Autism has been shown to have a major genetic risk component; documented autism in families has repeatedly been shown to be passed down for generations. While inherited risk plays an important role in autism in children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.
Logue, Mark W.; Smith, Alicia K.; Baldwin, Clinton; Wolf, Erika J.; Guffanti, Guia; Ratanatharathorn, Andrew; Stone, Annjanette; Schichman, Steven A.; Humphries, Donald; Binder, Elisabeth B.; Arloth, Janine; Menke, Andreas; Uddin, Monica; Wildman, Derek; Galea, Sandro; Aiello, Allison E.; Koenen, Karestan C.; Miller, Mark W.
2015-01-01
We examined the association between posttraumatic stress disorder (PTSD) and gene expression using whole blood samples from a cohort of trauma-exposed white non-Hispanic male veterans (115 cases and 28 controls). 10,264 probes of genes and gene transcripts were analyzed. We found 41 that were differentially expressed in PTSD cases versus controls (multiple-testing corrected p<0.05). The most significant was DSCAM, a neurological gene expressed widely in the developing brain and in the amygdala and hippocampus of the adult brain. We then examined the 41 differentially expressed genes in a meta-analysis using two replication cohorts and found significant associations with PTSD for 7 of the 41 (p<0.05), one of which (ATP6AP1L) survived multiple-testing correction. There was also broad evidence of overlap across the discovery and replication samples for the entire set of genes implicated in the discovery data based on the direction of effect and an enrichment of p<0.05 significant probes beyond what would be expected under the null. Finally, we found that the set of differentially expressed genes from the discovery sample was enriched for genes responsive to glucocorticoid signaling with most showing reduced expression in PTSD cases compared to controls. PMID:25867994
Microarray analysis reveals key genes and pathways in Tetralogy of Fallot
He, Yue-E; Qiu, Hui-Xian; Jiang, Jian-Bing; Wu, Rong-Zhou; Xiang, Ru-Lian; Zhang, Yuan-Hai
2017-01-01
The aim of the present study was to identify key genes that may be involved in the pathogenesis of Tetralogy of Fallot (TOF) using bioinformatics methods. The GSE26125 microarray dataset, which includes cardiovascular tissue samples derived from 16 children with TOF and five healthy age-matched control infants, was downloaded from the Gene Expression Omnibus database. Differential expression analysis was performed between TOF and control samples to identify differentially expressed genes (DEGs) using Student's t-test and the R/limma package, with a log2 fold-change of >2 and a false discovery rate of <0.01 set as thresholds. The biological functions of DEGs were analyzed using the ToppGene database. The ReactomeFIViz application was used to construct functional interaction (FI) networks, and the genes in each module were subjected to pathway enrichment analysis. The iRegulon plugin was used to identify transcription factors predicted to regulate the DEGs in the FI network, and the gene-transcription factor pairs were then visualized using Cytoscape software. A total of 878 DEGs were identified, including 848 upregulated genes and 30 downregulated genes. The gene FI network contained seven function modules, all of which comprised upregulated genes. Genes in Module 1 were enriched in the following three neurological disorder-associated signaling pathways: Parkinson's disease, Alzheimer's disease and Huntington's disease. Genes in Modules 0, 3 and 5 were predominantly enriched in pathways associated with ribosomes and protein translation. The X-box binding protein 1 transcription factor was demonstrated to be involved in the regulation of genes encoding the subunits of cytoplasmic and mitochondrial ribosomes, as well as genes involved in neurodegenerative disorders. Therefore, dysfunction of genes involved in signaling pathways associated with neurodegenerative disorders, ribosome function and protein translation may contribute to the pathogenesis of TOF. 
PMID:28713939
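The thresholding step described in this abstract (log2 fold-change >2, false discovery rate <0.01) can be sketched as follows. This is a minimal illustration, not the limma workflow the study used: it assumes per-gene p-values and log2 fold-changes are already available, assumes the fold-change cutoff is applied to the absolute value, and uses the Benjamini-Hochberg procedure as the FDR adjustment. The function names `benjamini_hochberg` and `select_degs` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_fc, pvalues, fc_cutoff=2.0, fdr_cutoff=0.01):
    """Keep genes passing both cutoffs (assumption: |log2 FC| is thresholded)."""
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2_fc, fdr)
        if abs(fc) > fc_cutoff and q < fdr_cutoff
    ]


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from GSE26125.
    genes = ["A", "B", "C", "D"]
    log2_fc = [2.5, -3.0, 4.0, 1.0]
    pvalues = [0.001, 0.002, 0.5, 0.0001]
    print(select_degs(genes, log2_fc, pvalues))
```

Gene C fails the FDR cutoff and gene D fails the fold-change cutoff, so only A and B are retained in the toy example.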
Wei, Qingyi Wei
2012-01-01
Asbestos exposure is a known risk factor for lung cancer. Although recent genome-wide association studies (GWASs) have identified some novel loci for lung cancer risk, few addressed genome-wide gene–environment interactions. To determine gene–asbestos interactions in lung cancer risk, we conducted genome-wide gene–environment interaction analyses at levels of single nucleotide polymorphisms (SNPs), genes and pathways, using our published Texas lung cancer GWAS dataset. This dataset included 317 498 SNPs from 1154 lung cancer cases and 1137 cancer-free controls. The initial SNP-level P-values for interactions between genetic variants and self-reported asbestos exposure were estimated by unconditional logistic regression models with adjustment for age, sex, smoking status and pack-years. The P-value for the most significant SNP rs13383928 was 2.17×10^-6, which did not reach the genome-wide statistical significance. Using a versatile gene-based test approach, we found that the top significant gene was C7orf54, located on 7q32.1 (P = 8.90×10^-5). Interestingly, most of the other significant genes were located on 11q13. When we used an improved gene-set-enrichment analysis approach, we found that the Fas signaling pathway and the antigen processing and presentation pathway were most significant (nominal P < 0.001; false discovery rate < 0.05) among 250 pathways containing 17 572 genes. We believe that our analysis is a pilot study that first describes the gene–asbestos interaction in lung cancer risk at levels of SNPs, genes and pathways. Our findings suggest that immune function regulation-related pathways may be mechanistically involved in asbestos-associated lung cancer risk. Abbreviations: CI, confidence interval; E, environment; FDR, false discovery rate; G, gene; GSEA, gene-set-enrichment analysis; GWAS, genome-wide association studies; i-GSEA, improved gene-set-enrichment analysis approach; OR, odds ratio; SNP, single nucleotide polymorphism. PMID:22637743
Biological interpretation of genome-wide association studies using predicted gene functions.
Pers, Tune H; Karjalainen, Juha M; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R; Yang, Jian; Lui, Julian C; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S N; Hirschhorn, Joel N; Franke, Lude
2015-01-19
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.
Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.
Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter
2014-01-01
A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
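To illustrate why dependence between data sets matters when combining p-values, the sketch below uses a correlation-adjusted version of Stouffer's z-score method. It is not the covariance estimation framework of the paper above: the correlation matrix is simply taken as given, the p-values are assumed to be one-sided and strictly between 0 and 1, and the function name `stouffer_correlated` is hypothetical.

```python
import math
from statistics import NormalDist


def stouffer_correlated(pvalues, corr):
    """Combine one-sided p-values whose z-scores have correlation matrix `corr`.

    The variance of the summed z-scores is the sum of all entries of `corr`,
    which reduces to len(pvalues) when the data sets are independent.
    """
    nd = NormalDist()
    k = len(pvalues)
    z_scores = [nd.inv_cdf(1.0 - p) for p in pvalues]
    variance_of_sum = sum(corr[i][j] for i in range(k) for j in range(k))
    if variance_of_sum <= 0:
        raise ValueError("correlation matrix must give a positive variance")
    combined_z = sum(z_scores) / math.sqrt(variance_of_sum)
    return 1.0 - nd.cdf(combined_z)


if __name__ == "__main__":
    pvals = [0.01, 0.01]
    independent = [[1.0, 0.0], [0.0, 1.0]]
    fully_dependent = [[1.0, 1.0], [1.0, 1.0]]
    print(stouffer_correlated(pvals, independent))
    print(stouffer_correlated(pvals, fully_dependent))
```

With fully dependent inputs the second p-value adds no information, so the combined value stays at the individual level; treating the same inputs as independent yields a smaller combined p-value.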
Chatterjee, Shatakshee; Verma, Srikant Prasad; Pandey, Priyanka
2017-09-05
Initiation and progression of fluid-filled cysts mark Autosomal Dominant Polycystic Kidney Disease (ADPKD). Thus, improved therapeutics targeting cystogenesis remains a constant challenge. Microarray studies in a single ADPKD animal model species with limited sample sizes tend to provide scattered views on underlying ADPKD pathogenesis. Thus we aim to perform a cross-species meta-analysis to profile conserved biological pathways that might be key targets for therapy. Nine ADPKD microarray datasets on rat, mouse and human fulfilled our study criteria and were chosen. Intra-species combined analysis was performed after considering removal of batch effect. Significantly enriched GO biological processes and KEGG pathways were computed and their overlap was observed. For the conserved pathways, biological modules and gene regulatory networks were observed. Additionally, Gene Set Enrichment Analysis (GSEA) using the Molecular Signatures Database (MSigDB) was performed for genes found in conserved pathways. We obtained 28 modules of significantly enriched GO processes and 5 major functional categories from significantly enriched KEGG pathways conserved in human, mouse and rat that in turn suggest a global transcriptomic perturbation affecting cyst formation, growth and progression. Significantly enriched pathways obtained from up-regulated genes such as Genomic instability, Protein localization in ER and Insulin Resistance were found to regulate cyst formation and growth whereas cyst progression due to increased cell adhesion and inflammation was suggested by perturbations in Angiogenesis, TGF-beta, CAMs, and Infection related pathways. Additionally, networks revealed shared genes among pathways e.g. SMAD2 and SMAD7 in Endocytosis and TGF-beta. Our study suggests cyst formation and progression to be an outcome of interplay between a set of several key deregulated pathways. 
Thus, further translational research is warranted focusing on developing a combinatorial therapeutic approach for ADPKD redressal. Copyright © 2017 Elsevier B.V. All rights reserved.
Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai
2018-06-20
The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence for patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between the early recurrence (ER) group and the non-early recurrence (non-ER) group of HCC. Least absolute shrinkage and selection operator (LASSO) logistic regression models were used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validate the predictive value of this classifier. Furthermore, a co-expression network based on these lncRNAs and their highly related genes was constructed, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of genes in the network were performed. We identified 10 differentially expressed lncRNAs, including 3 that were upregulated and 7 that were downregulated in the ER group. The lncRNA-based classifier was constructed based on 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084), and its accuracy was 0.83 in the training set, 0.87 in the test set and 0.84 in the total set. ROC curve analysis showed that the AUROC was 0.741 in the training set, 0.824 in the test set and 0.765 in the total set. A functional enrichment analysis suggested that the genes highly related to 4 of the lncRNAs were involved in the immune system. This 7-lncRNA expression profile can effectively predict early recurrence after surgical resection for HCC. This article is protected by copyright. All rights reserved.
Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie
2011-09-12
Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
2011-01-01
Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis
Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia
2015-01-01
Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116
Kar, Siddhartha P; Tyrer, Jonathan P; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S; Cramer, Daniel; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K; Kelemen, Linda E; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Iain A; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B; Narod, Steven A; Nedergaard, Lotte; Ness, Roberta B; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; 
Runnebaum, Ingo B; Rzepecka, Iwona K; Salvesen, Helga B; Schildkraut, Joellen M; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Sucheston-Campbell, Lara E; Tangen, Ingvild L; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S; van Altena, Anne M; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A; Monteiro, Alvaro N A; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P
2015-10-01
Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by coexpression may also be enriched for additional EOC risk associations. We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly coexpressed with each selected TF gene in the unified microarray dataset of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this dataset were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P < 0.05 and FDR < 0.05). These results were replicated (P < 0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Network analysis integrating large, context-specific datasets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. ©2015 American Association for Cancer Research.
Crosley, E J; Elliot, M G; Christians, J K; Crespi, B J
2013-02-01
Recent evidence from chimpanzees and gorillas has raised doubts that preeclampsia is a uniquely human disease. The deep extravillous trophoblast (EVT) invasion and spiral artery remodeling that characterizes our placenta (and is abnormal in preeclampsia) is shared within great apes, setting Homininae apart from Hylobatidae and Old World Monkeys, which show much shallower trophoblast invasion and limited spiral artery remodeling. We hypothesize that the evolution of a more invasive placenta in the lineage ancestral to the great apes involved positive selection on genes crucial to EVT invasion and spiral artery remodeling. Furthermore, identification of placentally-expressed genes under selection in this lineage may identify novel genes involved in placental development. We tested for positive selection in approximately 18,000 genes using the ratio of non-synonymous to synonymous amino acid substitution for protein-coding DNA. DAVID Bioinformatics Resources identified biological processes enriched in positively selected genes, including processes related to EVT invasion and spiral artery remodeling. Analyses revealed 295 and 264 genes under significant positive selection on the branches ancestral to Hominidae (Human, Chimp, Gorilla, Orangutan) and Homininae (Human, Chimp, Gorilla), respectively. Gene ontology analysis of these gene sets demonstrated significant enrichments for several functional gene clusters relevant to preeclampsia risk, and sets of placentally-expressed genes that have been linked with preeclampsia and/or trophoblast invasion in other studies. Our study represents a novel approach to the identification of candidate genes and amino acid residues involved in placental pathologies by implicating them in the evolution of highly-invasive placenta. Copyright © 2012 Elsevier Ltd. All rights reserved.
Blevins, Tana; Aliev, Fazil; Adkins, Amy; Hack, Laura; Bigdeli, Tim; D. van der Vaart, Andrew; Web, Bradley Todd; Bacanu, Silviu-Alin; Kalsi, Gursharan; Kendler, Kenneth S.; Miles, Michael F.; Dick, Danielle; Riley, Brien P.; Dumur, Catherine; Vladimirov, Vladimir I.
2015-01-01
Alcohol consumption is known to lead to gene expression changes in the brain. After performing weighted gene co-expression network analyses (WGCNA) on genome-wide mRNA and microRNA (miRNA) expression in Nucleus Accumbens (NAc) of subjects with alcohol dependence (AD; N = 18) and of matched controls (N = 18), six mRNA and three miRNA modules significantly correlated with AD were identified (Bonferroni-adj. p ≤ 0.05). Cell-type-specific transcriptome analyses revealed two of the mRNA modules to be enriched for neuronal specific marker genes and downregulated in AD, whereas the remaining four mRNA modules were enriched for astrocyte and microglial specific marker genes and upregulated in AD. Gene set enrichment analysis demonstrated that neuronal specific modules were enriched for genes involved in oxidative phosphorylation, mitochondrial dysfunction and MAPK signaling. Glial-specific modules were predominantly enriched for genes involved in processes related to immune functions, i.e. cytokine signaling (all adj. p ≤ 0.05). In mRNA and miRNA modules, 461 and 25 candidate hub genes were identified, respectively. In contrast to the expected biological functions of miRNAs, correlation analyses between mRNA and miRNA hub genes revealed a higher number of positive than negative correlations (χ2 test p ≤ 0.0001). Integration of hub gene expression with genome-wide genotypic data resulted in 591 mRNA cis-eQTLs and 62 miRNA cis-eQTLs. mRNA cis-eQTLs were significantly enriched for AD diagnosis and AD symptom counts (adj. p = 0.014 and p = 0.024, respectively) in AD GWAS signals in a large, independent genetic sample from the Collaborative Study on Genetics of Alcohol (COGA). In conclusion, our study identified putative gene network hubs coordinating mRNA and miRNA co-expression changes in the NAc of AD subjects, and our genetic (cis-eQTL) analysis provides novel insights into the etiological mechanisms of AD. PMID:26381263
Ciampi de Andrade, Daniel; Maschietto, Mariana; Galhardoni, Ricardo; Gouveia, Gisele; Chile, Thais; Victorino Krepischi, Ana C; Dale, Camila S; Brunoni, André R; Parravano, Daniella C; Cueva Moscoso, Ana S; Raicher, Irina; Kaziyama, Helena H S; Teixeira, Manoel J; Brentani, Helena P
2017-08-01
To evaluate changes in DNA methylation profiles in patients with fibromyalgia (FM) compared to matched healthy controls (HCs). All individuals underwent full clinical and neurophysiological assessment by cortical excitability (CE) parameters measured by transcranial magnetic stimulation. DNA from the peripheral blood of patients with FM (n = 24) and HC (n = 24) were assessed using the Illumina-HumanMethylation450 BeadChips. We identified 1610 differentially methylated positions (DMPs) in patients with FM displaying a nonrandom distribution in regions of the genome. Sixty-nine percent of DMP in FM were hypomethylated compared to HC. Differentially methylated positions were enriched in 5 genomic regions (1p34; 6p21; 10q26; 17q25; 19q13). The functional characterization of 960 genes related to DMPs revealed an enrichment for MAPK signaling pathway (n = 18 genes), regulation of actin cytoskeleton (n = 15 genes), and focal adhesion (n = 13 genes). A gene-gene interaction network enrichment analysis revealed the participation of DNA repair pathways, mitochondria-related processes, and synaptic signaling. Even though DNA was extracted from peripheral blood, this set of genes was enriched for disorders such as schizophrenia, mood disorders, bulimia, hyperphagia, and obesity. Remarkably, the hierarchical clusterization based on the methylation levels of the 1610 DMPs showed an association with neurophysiological measurements of CE in FM and HC. Fibromyalgia has a hypomethylation DNA pattern, which is enriched in genes implicated in stress response and DNA repair/free radical clearance. These changes occurred parallel to changes in CE parameters. New epigenetic insights into the pathophysiology of FM may provide the basis for the development of biomarkers of this disorder.
Noh, Hyun Ji; Ponting, Chris P; Boulding, Hannah C; Meader, Stephen; Betancur, Catalina; Buxbaum, Joseph D; Pinto, Dalila; Marshall, Christian R; Lionel, Anath C; Scherer, Stephen W; Webber, Caleb
2013-06-01
Autism Spectrum Disorders (ASD) are highly heritable and characterised by impairments in social interaction and communication, and restricted and repetitive behaviours. Considering four sets of de novo copy number variants (CNVs) identified in 181 individuals with autism and exploiting mouse functional genomics and known protein-protein interactions, we identified a large and significantly interconnected interaction network. This network contains 187 genes affected by CNVs drawn from 45% of the patients we considered and 22 genes previously implicated in ASD, of which 192 form a single interconnected cluster. On average, those patients with copy number changed genes from this network possess changes in 3 network genes, suggesting that epistasis mediated through the network is extensive. Correspondingly, genes that are highly connected within the network, and thus whose copy number change is predicted by the network to be more phenotypically consequential, are significantly enriched among patients that possess only a single ASD-associated network copy number changed gene (p = 0.002). Strikingly, deleted or disrupted genes from the network are significantly enriched in GO-annotated positive regulators (2.3-fold enrichment, corrected p = 2×10^-5), whereas duplicated genes are significantly enriched in GO-annotated negative regulators (2.2-fold enrichment, corrected p = 0.005). The direction of copy change is highly informative in the context of the network, providing the means through which perturbations arising from distinct deletions or duplications can yield a common outcome. These findings reveal an extensive ASD-associated molecular network, whose topology indicates ASD-relevant mutational deleteriousness and that mechanistically details how convergent aetiologies can result extensively from CNVs affecting pathways causally implicated in ASD.
Bien, Stephanie A; Auer, Paul L; Harrison, Tabitha A; Qu, Conghui; Connolly, Charles M; Greenside, Peyton G; Chen, Sai; Berndt, Sonja I; Bézieau, Stéphane; Kang, Hyun M; Huyghe, Jeroen; Brenner, Hermann; Casey, Graham; Chan, Andrew T; Hopper, John L; Banbury, Barbara L; Chang-Claude, Jenny; Chanock, Stephen J; Haile, Robert W; Hoffmeister, Michael; Fuchsberger, Christian; Jenkins, Mark A; Leal, Suzanne M; Lemire, Mathieu; Newcomb, Polly A; Gallinger, Steven; Potter, John D; Schoen, Robert E; Slattery, Martha L; Smith, Joshua D; Le Marchand, Loic; White, Emily; Zanke, Brent W; Abeçasis, Goncalo R; Carlson, Christopher S; Peters, Ulrike; Nickerson, Deborah A; Kundaje, Anshul; Hsu, Li
2017-01-01
The evaluation of less frequent genetic variants and their effect on complex disease poses new challenges for genomic research. To investigate whether epigenetic data can be used to inform aggregate rare-variant association methods (RVAM), we assessed whether variants more significantly associated with colorectal cancer (CRC) were preferentially located in non-coding regulatory regions, and whether enrichment was specific to colorectal tissues. Active regulatory elements (ARE) were mapped using data from 127 tissues and cell-types from NIH Roadmap Epigenomics and Encyclopedia of DNA Elements (ENCODE) projects. We investigated whether CRC association p-values were more significant for common variants 1) inside versus outside AREs, or 2) inside colorectal (CR) AREs versus AREs of other tissues and cell-types. We employed an integrative epigenomic RVAM for variants with allele frequency <1%. Gene sets were defined as ARE variants within 200 kilobases of a transcription start site (TSS) using either CR ARE or ARE from non-digestive tissues. CRC-set association p-values were used to evaluate enrichment of less frequent variant associations in CR ARE versus non-digestive ARE. ARE from 126/127 tissues and cell-types were significantly enriched for stronger CRC-variant associations. Strongest enrichment was observed for digestive tissues and immune cell types. CR-specific ARE were also enriched for stronger CRC-variant associations compared to ARE combined across non-digestive tissues (p-value = 9.6 × 10^-4). Additionally, we found enrichment of stronger CRC association p-values for rare variant sets of CR ARE compared to non-digestive ARE (p-value = 0.029). Integrative epigenomic RVAM may enable discovery of less frequent variants associated with CRC, and ARE of digestive and immune tissues are most informative. 
Although distance-based aggregation of less frequent variants in CR ARE surrounding TSS showed modest enrichment, future association studies would likely benefit from joint analysis of transcriptomes and epigenomes to better link regulatory variation with target genes.
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
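Of the meta-analysis options listed above, Fisher's method is the simplest to sketch. The example below is a generic illustration and not ExAtlas code: it assumes independent p-values strictly greater than 0, and the function name `fisher_combine` is hypothetical. The statistic X = -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form, so only the standard library is needed.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    half_x = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Two studies that are each borderline at 0.05 combine to a smaller value.
    print(fisher_combine([0.05, 0.05]))
```

A single p-value is returned unchanged, which is a convenient sanity check for the closed-form expression.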
Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J
2009-01-01
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
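The abstract above says that pairwise relationships between annotation gene sets were calculated using kappa statistics. One common reading of that, sketched here as an assumption and not taken from the GARNET source, is Cohen's kappa on gene membership (in or out of each set) over a shared gene universe. The function and argument names are hypothetical.

```python
def kappa_between_sets(set_a, set_b, universe):
    """Cohen's kappa for membership agreement of two gene sets over a gene universe."""
    n = len(universe)
    in_a = sum(1 for g in universe if g in set_a)
    in_b = sum(1 for g in universe if g in set_b)
    both = sum(1 for g in universe if g in set_a and g in set_b)
    neither = n - in_a - in_b + both
    observed = (both + neither) / n  # fraction of genes on which the sets agree
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)  # chance agreement
    if expected == 1.0:
        return 1.0  # degenerate case: both sets empty or both cover the universe
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 indicates two annotation terms that mark almost the same genes, a value near 0 indicates overlap no better than chance, and negative values indicate sets that avoid each other.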
Biological interpretation of genome-wide association studies using predicted gene functions
Pers, Tune H.; Karjalainen, Juha M.; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R.; Yang, Jian; Lui, Julian C.; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K.; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S.N.; Hirschhorn, Joel N.; Franke, Lude
2015-01-01
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes. PMID:25597830
Defining the optimal animal model for translational research using gene set enrichment analysis.
Weidner, Christopher; Steinfath, Matthias; Opitz, Elisa; Oelgeschläger, Michael; Schönfelder, Gilbert
2016-08-01
The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.
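For readers unfamiliar with the statistic behind gene set enrichment analysis, the sketch below shows the simplest, unweighted running-sum (Kolmogorov-Smirnov-like) enrichment score. It is an illustration only: the GSEA software normally uses a weighted variant and assesses significance by permutation, and nothing here reproduces the authors' pipeline. The function name is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the walk ends at zero.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best
```

A score close to +1 means the set members cluster at the top of the ranking, and a score close to -1 means they cluster at the bottom.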
Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter
2014-02-01
Abstract Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled about the requirement to take decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.
Jia, Zhilong; Liu, Ying; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang; Barnes, Michael R
2016-05-27
Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the "black box" nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena), for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway-driven, disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome, as an exemplar, and recover two widely used psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or at https://github.com/zhilongjia/cogena.
In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and repositioning, allowing the grouping and prioritisation of drug repositioning candidates on the basis of putative mode of action.
Yi, Jin Wook; Park, Ji Yeon; Sung, Ji-Youn; Kwak, Sang Hyuk; Yu, Jihan; Chang, Ji Hyun; Kim, Jo-Heon; Ha, Sang Yun; Paik, Eun Kyung; Lee, Woo Seung; Kim, Su-Jin; Lee, Kyu Eun; Kim, Ju Han
2015-01-01
Elevated levels of reactive oxygen species (ROS) have been proposed as a risk factor for the development of papillary thyroid carcinoma (PTC) in patients with Hashimoto thyroiditis (HT). However, it has yet to be proven that the total levels of ROS are sufficiently increased to contribute to carcinogenesis. We hypothesized that if the ROS levels were increased in HT, ROS-related genes would also be differently expressed in PTC with HT. To find differentially expressed genes (DEGs) we analyzed RNA sequencing gene expression data from The Cancer Genome Atlas: 33 samples from normal thyroid tissue, 232 from PTC without HT, and 60 from PTC with HT. We prepared 402 ROS-related genes from three gene sets by genomic database searching. We also analyzed a public microarray data set to validate our results. Thirty-three ROS-related genes were up-regulated in PTC with HT, whereas only nine were up-regulated in PTC without HT (Chi-square p-value < 0.001). The mean log2 fold change of up-regulated genes was 0.562 in the HT group and 0.252 in the PTC without HT group (t-test p-value = 0.001). In microarray data analysis, 12 of 32 ROS-related genes showed the same differential expression pattern with statistical significance. In gene ontology analysis, up-regulated ROS-related genes were related to ROS metabolism and apoptosis. Immune function-related and carcinogenesis-related gene sets were enriched only in the HT group in Gene Set Enrichment Analysis. Our results suggested that ROS levels may be increased in PTC with HT. Increased levels of ROS may contribute to PTC development in patients with HT.
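The chi-square comparison reported above (33 versus 9 up-regulated genes) can be sketched as a 2x2 test. The abstract does not say how the contingency table was built, so the layout below is an assumption: each group is taken to contribute the same 402 ROS-related genes, split into up-regulated and not up-regulated, and no continuity correction is applied. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed layout: 402 ROS-related genes per group, up-regulated vs. not.
chi2, p = chi_square_2x2(33, 402 - 33, 9, 402 - 9)
```

Under this assumed layout the statistic is roughly 14.5 on one degree of freedom, which is consistent with the reported p-value below 0.001. Because the same genes appear in both groups, a paired test could also be argued for; the sketch only mirrors the test named in the abstract.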
Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H
2014-04-11
Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.
Wang, Wenyu; Liu, Yang; Hao, Jingcan; Zheng, Shuyu; Wen, Yan; Xiao, Xiao; He, Awen; Fan, Qianrui; Zhang, Feng; Liu, Ruiyu
2016-10-10
Hip cartilage destruction is consistently observed in non-traumatic osteonecrosis of the femoral head (NOFH) and accelerates its bone necrosis. The molecular mechanism underlying the cartilage damage of NOFH remains elusive. In this study, we conducted a systematic comparative study of gene expression profiles between NOFH and osteoarthritis (OA). Hip articular cartilage specimens were collected from 12 NOFH patients and 12 controls with traumatic femoral neck fracture for microarray (n=4) and quantitative real-time PCR validation experiments (n=8). Gene expression profiling of articular cartilage was performed using the Agilent Human 4×44K Microarray chip. The accuracy of the microarray experiment was further validated by qRT-PCR. Gene expression results of OA hip cartilage were derived from a previously published study. Significance Analysis of Microarrays (SAM) software was applied for identifying differently expressed genes. Gene ontology (GO) and pathway enrichment analysis were conducted with Gene Set Enrichment Analysis software and the DAVID tool, respectively. In total, 27 differently expressed genes were identified for NOFH. Comparing the gene expression profiles of NOFH cartilage and OA cartilage detected 8 common differently expressed genes, including COL5A1, OGN, ANGPTL4, CRIP1, NFIL3, METRNL, ID2 and STEAP1. GO comparative analysis identified 10 common significant GO terms, mainly implicated in apoptosis and developmental processes. Pathway comparative analysis showed that the ECM-receptor interaction pathway and the focal adhesion pathway were enriched in the differently expressed genes of both NOFH and hip OA. In conclusion, we identified a set of differently expressed genes, GO terms and pathways for NOFH articular destruction, some of which were also involved in hip OA. Our study results may help to reveal the pathogenetic similarities and differences of cartilage damage in NOFH and hip OA. Copyright © 2016 Elsevier B.V. All rights reserved.
Cho, Jin-Hyung; Huang, Ben S.; Gray, Jesse M.
2016-01-01
The stable formation of remote fear memories is thought to require neuronal gene induction in cortical ensembles that are activated during learning. However, the set of genes expressed specifically in these activated ensembles is not known; knowledge of such transcriptional profiles may offer insights into the molecular program underlying stable memory formation. Here we use RNA-Seq to identify genes whose expression is enriched in activated cortical ensembles labeled during associative fear learning. We first establish that mouse temporal association cortex (TeA) is required for remote recall of auditory fear memories. We then perform RNA-Seq in TeA neurons that are labeled by the activity reporter Arc-dVenus during learning. We identify 944 genes with enriched expression in Arc-dVenus+ neurons. These genes include markers of L2/3, L5b, and L6 excitatory neurons but not glial or inhibitory markers, confirming Arc-dVenus to be an excitatory neuron-specific but non-layer-specific activity reporter. Cross comparisons to other transcriptional profiles show that 125 of the enriched genes are also activity-regulated in vitro or induced by visual stimulus in the visual cortex, suggesting that they may be induced generally in the cortex in an experience-dependent fashion. Prominent among the enriched genes are those encoding potassium channels that down-regulate neuronal activity, suggesting the possibility that part of the molecular program induced by fear conditioning may initiate homeostatic plasticity. PMID:27557751
Loci and pathways associated with uterine capacity for pregnancy and fertility in beef cattle.
Neupane, Mahesh; Geary, Thomas W; Kiser, Jennifer N; Burns, Gregory W; Hansen, Peter J; Spencer, Thomas E; Neibergs, Holly L
2017-01-01
Infertility and subfertility negatively impact the economics and reproductive performance of cattle. Of note, significant pregnancy loss occurs in cattle during the first month of pregnancy, yet little is known about the genetic loci influencing pregnancy success and loss in cattle. To identify quantitative trait loci (QTL) with large effects associated with early pregnancy loss, Angus crossbred heifers were classified based on day 28 pregnancy outcomes to serial embryo transfer. A genome wide association analysis (GWAA) was conducted comparing 30 high fertility heifers with 100% success in establishing pregnancy to 55 subfertile heifers with 25% or less success. A gene set enrichment analysis SNP (GSEA-SNP) was performed to identify gene sets and leading edge genes influencing pregnancy loss. The GWAA identified 22 QTL (p < 1 × 10⁻⁵), and GSEA-SNP identified 9 gene sets (normalized enrichment score > 3.0) with 253 leading edge genes. Network analysis identified TNF (tumor necrosis factor), estrogen, and TP53 (tumor protein 53) as the top of 671 upstream regulators (p < 0.001), whereas the SOX2 (SRY [sex determining region Y]-box 2) and OCT4 (octamer-binding transcription factor 4) complex was the top master regulator out of 773 master regulators associated with fertility (p < 0.001). Identification of QTL and genes in pathways that improve early pregnancy success provides critical information for genomic selection to increase fertility in cattle. The identified genes and regulators also provide insight into the complex biological mechanisms underlying pregnancy establishment in cattle.
Alcohol-related Genes Show an Enrichment of Associations with a Persistent Externalizing Factor
Ashenhurst, James R.; Harden, K. Paige; Corbin, William R.; Fromme, Kim
2016-01-01
Research using twins has found that much of the variability in externalizing phenotypes – including alcohol and drug use, impulsive personality traits, risky sex and property crime – is explained by genetic factors. Nevertheless, identification of specific genes and variants associated with these traits has proven to be difficult, likely because individual differences in externalizing are explained by many genes of small individual effect. Moreover, twin research indicates that heritable variance in externalizing behaviors is mostly shared across the externalizing spectrum rather than specific to any behavior. We use a longitudinal, “deep phenotyping” approach to model a general externalizing factor reflecting persistent engagement in a variety of socially problematic behaviors measured at eleven assessment occasions spanning early adulthood (ages 18 to 28). In an ancestrally homogenous sample of non-Hispanic Whites (N = 337), we then tested for enrichment of associations between the persistent externalizing factor and a set of 3,281 polymorphisms within 104 genes that were previously identified as associated with alcohol-use behaviors. Next we tested for enrichment among domain-specific factors (e.g., property crime) composed of residual variance not accounted for by the common factor. Significance was determined relative to bootstrapped empirical thresholds derived from permutations of phenotypic data. Results indicated significant enrichment of genetic associations for persistent externalizing, but not for domain-specific factors. Consistent with twin research findings, these results suggest that genetic variants are broadly associated with externalizing behaviors rather than unique to specific behaviors. General Scientific Summary This study shows that variation in 104 genes is associated with socially problematic “externalizing” behavior, including substance misuse, property crime, risky sex, and aspects of impulsive personality. 
Importantly, this association was with the common variation across these behaviors rather than with the variation unique to any given behavior. The manuscript demonstrates a potentially advantageous technique for relating sets of hypothesized genes to complex traits or behaviors. PMID:27505405
In silico pathway analysis in cervical carcinoma reveals potential new targets for treatment
van Dam, Peter A.; van Dam, Pieter-Jan H. H.; Rolfo, Christian; Giallombardo, Marco; van Berckelaer, Christophe; Trinh, Xuan Bich; Altintas, Sevilay; Huizing, Manon; Papadimitriou, Kostas; Tjalma, Wiebren A. A.; van Laere, Steven
2016-01-01
An in silico pathway analysis was performed in order to improve current knowledge on the molecular drivers of cervical cancer and detect potential targets for treatment. Three publicly available Affymetrix gene expression data-sets (GSE5787, GSE7803, GSE9750) were retrieved, comprising a total of 9 cervical cancer cell lines (CCCLs), 39 normal cervical samples, 7 CIN3 samples and 111 cervical cancer samples (CCSs). Prediction analysis of microarrays was performed in the Affymetrix sets to identify cervical cancer biomarkers. To select cancer cell-specific genes the CCSs were compared to the CCCLs. Validated genes were submitted to a gene set enrichment analysis (GSEA) and Expression2Kinases (E2K). In the CCSs a total of 1,547 probe sets were identified that were overexpressed (FDR < 0.1). Compared with the CCCLs, 560 probe sets (481 unique genes) had a cancer cell-specific expression profile, and 315 of these genes (65%) were validated. GSEA identified 5 cancer hallmarks enriched in CCSs (P < 0.01 and FDR < 0.25), showing that deregulation of the cell cycle is a major component of cervical cancer biology. E2K identified a protein-protein interaction (PPI) network of 162 nodes (including 20 druggable kinases) and 1,626 edges. This PPI network consists of 5 signaling modules associated with MYC signaling (Module 1), cell cycle deregulation (Module 2), TGFβ signaling (Module 3), MAPK signaling (Module 4) and chromatin modeling (Module 5). Potential targets for treatment that could be identified were CDK1, CDK2, ABL1, ATM, AKT1, MAPK1 and MAPK3, among others. The present study identified important driver pathways in cervical carcinogenesis which should be assessed for their potential therapeutic druggability. PMID:26701206
Duan, Qiaonan; Flynn, Corey; Niepel, Mario; Hafner, Marc; Muhlich, Jeremy L; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Chen, Edward Y; Golub, Todd R; Sorger, Peter K; Subramanian, Aravind; Ma'ayan, Avi
2014-07-01
For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Leonenko, Ganna; Richards, Alexander L; Walters, James T; Pocklington, Andrew; Chambert, Kimberly; Al Eissa, Mariam M; Sharp, Sally I; O'Brien, Niamh L; Curtis, David; Bass, Nicholas J; McQuillin, Andrew; Hultman, Christina; Moran, Jennifer L; McCarroll, Steven A; Sklar, Pamela; Neale, Benjamin M; Holmans, Peter A; Owen, Michael J; Sullivan, Patrick F; O'Donovan, Michael C
2017-10-01
Risk of schizophrenia is conferred by alleles occurring across the full spectrum of frequencies from common SNPs of weak effect through to ultra rare alleles, some of which may be moderately to highly penetrant. Previous studies have suggested that some of the risk of schizophrenia is attributable to uncommon alleles represented on Illumina exome arrays. Here, we present the largest study of exomic variation in schizophrenia to date, using samples from the United Kingdom and Sweden (10,011 schizophrenia cases and 13,791 controls). Single variants, genes, and gene sets were analyzed for association with schizophrenia. No single variant or gene reached genome-wide significance. Among candidate gene sets, we found significant enrichment for rare alleles (minor allele frequency [MAF] < 0.001) in genes intolerant of loss-of-function (LoF) variation and in genes whose messenger RNAs bind to fragile X mental retardation protein (FMRP). We further delineate the genetic architecture of schizophrenia by excluding a role for uncommon exomic variants (0.01 ≥ MAF ≥ 0.001) that confer a relatively large effect (odds ratio [OR] > 4). We also show that risk alleles within this frequency range exist, but confer smaller effects and should be identified by larger studies. © 2017 Wiley Periodicals, Inc.
A systematic analysis of genomic changes in Tg2576 mice.
Tan, Lu; Wang, Xiong; Ni, Zhong-Fei; Zhu, Xiuming; Wu, Wei; Zhu, Ling-Qiang; Liu, Dan
2013-06-01
Alzheimer's disease (AD) is an age-related neurodegenerative disorder characterized by intelligence decline, behavioral disorders and cognitive disability. The purpose of this study was to investigate gene expression in AD, based on published microarray data on Tg2576 mice. Hierarchical Cluster Analysis and Gene Ontology were employed to group genes together on the basis of their product characteristics and annotation data. Genes with prominent alterations were clustered into apoptosis and axon guidance pathways. Based on our findings and those of previous studies, we propose that the mitochondria-mediated apoptotic pathway plays a crucial role in the neuronal loss and synaptic dysfunction associated with AD. Furthermore, based on the findings of Positional Gene Enrichment analysis and Gene Set Enrichment analysis, we propose that the regulation of transcription of AD genes may be an important pathogenic factor in this neurodegenerative disease. Our results highlight the importance of genes that could subsequently be examined for their potential as prognostic markers for AD.
Tumor-stroma interactions a trademark for metastasis.
Morales, Monica; Planet, Evarist; Arnal-Estape, Anna; Pavlovic, Milica; Tarragona, Maria; Gomis, Roger R
2011-10-01
We aimed to unravel genes that are significantly associated with metastasis in order to identify functions that support disseminated disease. We identify genes associated with metastasis and verify their clinical correlations using publicly available primary tumor expression profile data sets. We used facilities in R and Bioconductor (GSEA). Specific data structures and functions were imported. Our results show that genes associated with metastasis in primary tumors are enriched for pathways associated with immune infiltration or cytokine-cytokine receptor interaction. As an example, we focus on the enrichment of TGFBR2 and TGFβ, a set of communication tools central to tumor-stroma interactions that define metastasis to the lung and support bone colonization. We showed that tumor-stroma communication through the cytokine-cytokine receptor interaction pathway is selected in primary tumors with high risk of relapse. High levels of these factors support systemic instigation of the far metastatic nest as well as local metastatic-specific functions that provide solid ground for metastatic development. Copyright © 2011 Elsevier Ltd. All rights reserved.
Arkas: Rapid reproducible RNAseq analysis
Colombo, Anthony R.; J. Triche Jr, Timothy; Ramsingh, Giridharan
2017-01-01
The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer cloud-scale RNAseq pipelines, Arkas-Quantification and Arkas-Analysis, available within Illumina's BaseSpace cloud application platform, which expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. Due to inherent inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributive environment improving data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application titled SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, and calculates differential expression and gene-set enrichment analysis on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from SRA Import, facilitating raw sequencing importing, SRA FASTQ conversion, RNA quantification and analysis steps. PMID:28868134
Kim, Kyungmun; Kim, Ju Hyeon; Kim, Young Ho; Hong, Seong-Eui; Lee, Si Hyeock
2018-01-01
Perturbation of normal behaviors in honey bee colonies by any external factor can immediately reduce the colony's capacity for brood rearing, which can eventually lead to colony collapse. To investigate the effects of brood-rearing suppression on the biology of honey bee workers, gene-set enrichment analysis of the transcriptomes of worker bees with or without suppressed brood rearing was performed. When brood rearing was suppressed, pathways associated with both protein degradation and synthesis were simultaneously over-represented in both nurses and foragers, and their overall pathway representation profiles resembled those of normal foragers and nurses, respectively. Thus, obstruction of normal labor induced over-representation in pathways related with reshaping of worker bee physiology, suggesting that transition of labor is physiologically reversible. In addition, some genes associated with the regulation of neuronal excitability, cellular and nutritional stress and aggressiveness were over-expressed under brood rearing suppression perhaps to manage in-hive stress under unfavorable conditions. Copyright © 2017 Elsevier Inc. All rights reserved.
An integrated map of structural variation in 2,504 human genomes.
Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Fritz, Markus Hsi-Yang; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Casale, Francesco Paolo; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Mu, Xinmeng Jasmine; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki; Muzny, Donna M; Nelson, Bradley J; Noor, Amina; Parrish, Nicholas F; Pendleton, Matthew; Quitadamo, Andrew; Raeder, Benjamin; Schadt, Eric E; Romanovitch, Mallory; Schlattl, Andreas; Sebra, Robert; Shabalin, Andrey A; Untergasser, Andreas; Walker, Jerilyn A; Wang, Min; Yu, Fuli; Zhang, Chengsheng; Zhang, Jing; Zheng-Bradley, Xiangqun; Zhou, Wanding; Zichner, Thomas; Sebat, Jonathan; Batzer, Mark A; McCarroll, Steven A; Mills, Ryan E; Gerstein, Mark B; Bashir, Ali; Stegle, Oliver; Devine, Scott E; Lee, Charles; Eichler, Evan E; Korbel, Jan O
2015-10-01
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Muziasari, Windi I; Pärnänen, Katariina; Johnson, Timothy A; Lyra, Christina; Karkman, Antti; Stedtfeld, Robert D; Tamminen, Manu; Tiedje, James M; Virta, Marko
2016-04-01
Antibiotics are commonly used in aquaculture and they can change the environmental resistome by increasing antibiotic resistance genes (ARGs). Sediment samples were collected from two fish farms located in the Northern Baltic Sea, Finland, and from a site outside the farms (control). The sediment resistome was assessed by using a highly parallel qPCR array containing 295 primer sets to detect ARGs, mobile genetic elements and the 16S rRNA gene. The fish farm resistomes were enriched in transposon and integron associated genes and in ARGs encoding resistance to antibiotics which had been used to treat fish at the farms. Aminoglycoside resistance genes were also enriched in the farm sediments despite the farms not having used aminoglycosides. In contrast, the total relative abundance values of ARGs were higher in the control sediment resistome and they were mainly genes encoding efflux pumps followed by beta-lactam resistance genes, which are found intrinsically in many bacteria. This suggests that there is a natural Baltic sediment resistome. The resistome associated with fish farms can be from native ARGs enriched by antibiotic use at the farms and/or from ARGs and mobile elements that have been introduced by fish farming. © FEMS 2016. All rights reserved.
Grote, Steffi; Prüfer, Kay; Kelso, Janet; Dannemann, Michael
2016-10-15
We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). Contact: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Osato, Naoki
2018-01-19
Transcriptional target genes show functional enrichment of genes. However, how many and how significantly transcriptional target genes include functional enrichments are still unclear. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined functional enrichment and gene expression level of putative transcriptional target genes. Gene Ontology annotations showed four times larger numbers of functional enrichments in putative transcriptional target genes than gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichment by calculating its ratios in the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5-60% of randomly selected genes. The normalized number of functional enrichments was changed according to the criteria of enhancer-promoter interactions such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed significantly higher normalized number of functional enrichments than the other orientations. Journal papers showed that the top five frequent functional enrichments were related to the cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes of the normalized number of functional enrichments of transcriptional target genes. Human putative transcriptional target genes showed significant functional enrichments. 
Functional enrichments were related to the cellular functions. The normalized number of functional enrichments of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments and correlated with the median expression level of the target genes. These analyses and characters of human putative transcriptional target genes would be useful to examine the criteria of enhancer-promoter assignments and to predict the novel mechanisms and factors such as DNA binding proteins and DNA sequences of enhancer-promoter interactions.
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases arise jointly from alterations of numerous genes. Genes often act together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects that assumes a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
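The permutation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the TEGS statistic: the stand-in statistic here (sum of squared group-mean differences) ignores gene-gene correlation, and the function names `set_statistic` and `permutation_pvalue` are hypothetical. What it does show is that shuffling whole-sample labels keeps each sample's expression vector intact, so the correlation among genes is preserved in the null distribution.

```python
import random


def set_statistic(expr, labels):
    """Stand-in gene-set statistic (an assumption, not TEGS):
    sum over genes of the squared difference in group means.

    expr:   list of genes, each a list of expression values (one per sample)
    labels: list of 0/1 group labels, one per sample
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    total = 0.0
    for gene in expr:
        mean1 = sum(v for v, g in zip(gene, labels) if g == 1) / n1
        mean0 = sum(v for v, g in zip(gene, labels) if g == 0) / n0
        total += (mean1 - mean0) ** 2
    return total


def permutation_pvalue(expr, labels, n_perm=2000, seed=0):
    """Permutation p-value obtained by shuffling sample labels.

    Only the labels move; every sample keeps its full expression vector,
    so gene-gene correlation is carried into the null distribution.
    """
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(expr, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_pvalue(expr, labels)` returns a value in (0, 1]; with few samples the number of distinct labelings limits how small it can get.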
Chang, Lun-Ching; Jamain, Stephane; Lin, Chien-Wei; Rujescu, Dan; Tseng, George C; Sibille, Etienne
2014-01-01
Large scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies from postmortem brains of human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next sought enrichment in the top 50 meta-analyzed coexpression modules for genes otherwise identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules using gene expression data from non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including glutamate metabotropic receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses. 
Collectively, this study demonstrates the importance of integrating transcriptome data, gene coexpression modules and GWAS results for providing novel and complementary approaches to investigate the molecular pathology of MDD and other complex brain disorders.
Chen, Rui; Davis, Lea K; Guter, Stephen; Wei, Qiang; Jacob, Suma; Potter, Melissa H; Cox, Nancy J; Cook, Edwin H; Sutcliffe, James S; Li, Bingshan
2017-01-01
Autism spectrum disorder (ASD) is one of the most highly heritable neuropsychiatric disorders, but underlying molecular mechanisms are still unresolved due to extreme locus heterogeneity. Leveraging meaningful endophenotypes or biomarkers may be an effective strategy to reduce heterogeneity to identify novel ASD genes. Numerous lines of evidence suggest a link between hyperserotonemia, i.e., elevated serotonin (5-hydroxytryptamine or 5-HT) in whole blood, and ASD. However, the genetic determinants of blood 5-HT level and their relationship to ASD are largely unknown. In this study, pursuing the hypothesis that de novo variants (DNVs) and rare risk alleles acting in a recessive mode may play an important role in predisposition of hyperserotonemia in people with ASD, we carried out whole exome sequencing (WES) in 116 ASD parent-proband trios with most (107) probands having 5-HT measurements. Combined with published ASD DNVs, we identified USP15 as having recurrent de novo loss of function mutations and discovered evidence supporting two other known genes with recurrent DNVs (FOXP1 and KDM5B). Genes harboring functional DNVs significantly overlap with functional/disease gene sets known to be involved in ASD etiology, including FMRP targets and synaptic formation and transcriptional regulation genes. We grouped the probands into High-5HT and Normal-5HT groups based on normalized serotonin levels, and used network-based gene set enrichment analysis (NGSEA) to identify novel hyperserotonemia-related ASD genes based on LoF and missense DNVs. We found enrichment in the High-5HT group for a gene network module (DAWN-1) previously implicated in ASD, and this points to the TGF-β pathway and cell junction processes. Through analysis of rare recessively acting variants (RAVs), we also found that rare compound heterozygotes (CHs) in the High-5HT group were enriched for loci in an ASD-associated gene set. 
Finally, we carried out rare variant group-wise transmission disequilibrium tests (gTDT) and observed significant association of rare variants in genes encoding a subset of the serotonin pathway with ASD. Our study identified USP15 as a novel gene implicated in ASD based on recurrent DNVs. It also demonstrates the potential value of 5-HT as an effective endophenotype for gene discovery in ASD, and the effectiveness of this strategy needs to be further explored in studies of larger sample sizes.
Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin
2011-04-14
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
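As a rough illustration of how a cohesion p-value could be obtained from pairwise similarities with a Fisher's exact test, the sketch below builds a 2×2 table of gene pairs (within the set or not, similarity above a threshold or not) and computes a one-sided p-value from the hypergeometric distribution. The thresholding, the table layout, and the names `fisher_exact_greater` and `cohesion_pvalue` are assumptions made for illustration; the abstract does not specify how GCAT constructs its table.

```python
from itertools import combinations
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]]:
    probability of seeing at least `a` in the top-left cell under the
    hypergeometric null with fixed margins."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


def cohesion_pvalue(similarity, gene_set, background, threshold):
    """Hypothetical cohesion test: are strongly similar pairs
    over-represented among pairs drawn from within `gene_set`?

    similarity: dict mapping a sorted (gene1, gene2) tuple to a score;
                missing pairs are treated as score 0.0 (an assumption).
    """
    in_set = set(gene_set)
    a = b = c = d = 0
    for g1, g2 in combinations(sorted(background), 2):
        strong = similarity.get((g1, g2), 0.0) >= threshold
        within = g1 in in_set and g2 in in_set
        if within and strong:
            a += 1
        elif within:
            b += 1
        elif strong:
            c += 1
        else:
            d += 1
    return fisher_exact_greater(a, b, c, d)
```

With four background genes and a three-gene set whose three internal pairs are the only strongly similar ones, the table is [[3, 0], [0, 3]] and the p-value is 1/20.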
Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes
Kurokawa, Ken; Itoh, Takehiko; Kuwahara, Tomomi; Oshima, Kenshiro; Toh, Hidehiro; Toyoda, Atsushi; Takami, Hideto; Morita, Hidetoshi; Sharma, Vineet K.; Srivastava, Tulika P.; Taylor, Todd D.; Noguchi, Hideki; Mori, Hiroshi; Ogura, Yoshitoshi; Ehrlich, Dusko S.; Itoh, Kikuji; Takagi, Toshihisa; Sakaki, Yoshiyuki; Hayashi, Tetsuya; Hattori, Masahira
2007-01-01
Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes. PMID:17916580
BRAIN NETWORKS. Correlated gene expression supports synchronous activity in brain networks.
Richiardi, Jonas; Altmann, Andre; Milazzo, Anna-Clare; Chang, Catie; Chakravarty, M Mallar; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Flor, Herta; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Lemaître, Hervé; Mann, Karl F; Martinot, Jean-Luc; Nees, Frauke; Paus, Tomáš; Pausova, Zdenka; Rietschel, Marcella; Robbins, Trevor W; Smolka, Michael N; Spanagel, Rainer; Ströhle, Andreas; Schumann, Gunter; Hawrylycz, Mike; Poline, Jean-Baptiste; Greicius, Michael D
2015-06-12
During rest, brain activity is synchronized between different regions widely distributed throughout the brain, forming functional networks. However, the molecular mechanisms supporting functional connectivity remain undefined. We show that functional brain networks defined with resting-state functional magnetic resonance imaging can be recapitulated by using measures of correlated gene expression in a post mortem brain tissue data set. The set of 136 genes we identify is significantly enriched for ion channels. Polymorphisms in this set of genes significantly affect resting-state functional connectivity in a large sample of healthy adolescents. Expression levels of these genes are also significantly associated with axonal connectivity in the mouse. The results provide convergent, multimodal evidence that resting-state functional networks correlate with the orchestrated activity of dozens of genes linked to ion channel activity and synaptic function. Copyright © 2015, American Association for the Advancement of Science.
BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.
Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph
2015-02-21
Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.
Subcutaneous and gonadal adipose tissue transcriptome differences in lean and obese female dogs.
Grant, Ryan W; Vester Boler, Brittany M; Ridge, Tonya K; Graves, Thomas K; Swanson, Kelly S
2013-12-01
Canine obesity leads to shortened life span and increased disease incidence. Adipose tissue depots are known to have unique metabolic and gene expression profiles in rodents and humans, but few comparisons of depot gene expression have been performed in the dog. Using microarray technology, our objective was to identify differentially expressed genes and enriched functional pathways between subcutaneous and gonadal adipose of lean and obese dogs to better understand the pathogenesis of obesity in the dog. Because no depot × body weight status interactions were identified in the microarray data, depot differences were the primary focus. A total of 946 and 703 transcripts were differentially expressed (FDR P < 0.05) between gonadal and subcutaneous adipose tissue in obese and lean dogs respectively. Of the adipose depot-specific differences in gene expression, 162 were present in both lean and obese dogs, with the majority (85%) expressed in the same direction. Both lean and obese dog gene lists had enrichment of the complement and coagulation cascade and systemic lupus erythematosus pathways. Obese dogs had enrichment of lysosome, extracellular matrix-receptor interaction, renin-angiotensin system and hematopoietic cell lineage pathways. Lean dogs had enrichment of glutathione metabolism and synthesis and degradation of ketone bodies. We have identified a core set of genes differentially expressed between subcutaneous and gonadal adipose tissue in dogs regardless of body weight. These genes contribute to depot-specific differences in immune function, extracellular matrix remodeling and lysosomal function and may contribute to the physiological differences noted between depots. © 2013 The Authors, Animal Genetics © 2013 Stichting International Foundation for Animal Genetics.
Yang, Yujia; Wang, Xiaozhu; Liu, Yang; Fu, Qiang; Tian, Changxu; Wu, Chenglong; Shi, Huitong; Yuan, Zihao; Tan, Suxu; Liu, Shikai; Gao, Dongya; Dunham, Rex; Liu, Zhanjiang
2018-04-30
In aquatic organisms, hearing is an important sense for acoustic communication and detection of sound-emitting predators and prey. Channel catfish is a dominant aquaculture species in the United States. As channel catfish can hear sounds of relatively high frequency, it serves as a good model for studying auditory mechanisms. In catfishes, Weberian ossicles connect the swimbladder to the inner ear to transfer the forced vibrations and improve hearing ability. In this study, we examined the transcriptional profiles of channel catfish swimbladder and four other tissues (gill, liver, skin, and intestine). We identified a total of 1777 genes that exhibited a preferential expression pattern in the swimbladder of channel catfish. Based on Gene Ontology enrichment analysis, many of the swimbladder-enriched genes were categorized into sensory perception of sound, auditory behavior, response to auditory stimulus, or detection of mechanical stimulus involved in sensory perception of sound, such as coch, kcnq4, sptbn1, sptbn4, dnm1, ush2a, and col11a1. Six signaling pathways associated with hearing (Glutamatergic synapse, GABAergic synapse pathways, Axon guidance, cAMP signaling pathway, Ionotropic glutamate receptor pathway, and Metabotropic glutamate receptor group III pathway) were over-represented in KEGG and PANTHER databases. Protein interaction prediction revealed an interactive relationship among the swimbladder-enriched genes and genes involved in sensory perception of sound. This study identified a set of genes and signaling pathways associated with the auditory system in the swimbladder of channel catfish and provides resources for further study of their biological and physiological roles in the catfish swimbladder. Copyright © 2018 Elsevier Inc. All rights reserved.
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression with an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant across diabetes studies, while the IRS1 and MPST genes were significant across insulin response studies, and joint analysis showed that the HADH and MPST genes were significant across all combined data sets. The pathway analysis identified six significant gene sets across all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared with gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.
Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K
2015-01-01
Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic location of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ochsner, Scott A.; Tsimelzon, Anna; Dong, Jianrong; Coarfa, Cristian
2016-01-01
The pregnane X receptor (PXR; NR1I2) and constitutive androstane receptor (CAR; NR1I3) members of the nuclear receptor (NR) superfamily of ligand-regulated transcription factors are well-characterized mediators of xenobiotic and endocrine-disrupting chemical signaling. The Nuclear Receptor Signaling Atlas maintains a growing library of transcriptomic datasets involving perturbations of NR signaling pathways, many of which involve perturbations relevant to PXR and CAR xenobiotic signaling. Here, we generated a reference transcriptome based on the frequency of differential expression of genes across 159 experiments compiled from 22 datasets involving perturbations of CAR and PXR signaling pathways. In addition to the anticipated overrepresentation in the reference transcriptome of genes encoding components of the xenobiotic stress response, the ranking of genes involved in carbohydrate metabolism and gonadotropin action sheds mechanistic light on the suspected role of xenobiotics in metabolic syndrome and reproductive disorders. Gene Set Enrichment Analysis showed that although acetaminophen, chlorpromazine, and phenobarbital impacted many similar gene sets, differences in direction of regulation were evident in a variety of processes. Strikingly, gene sets representing genes linked to Parkinson's, Huntington's, and Alzheimer's diseases were enriched in all 3 transcriptomes. The reference xenobiotic transcriptome will be supplemented with additional future datasets to provide the community with a continually updated reference transcriptomic dataset for CAR- and PXR-mediated xenobiotic signaling. Our study demonstrates how aggregating and annotating transcriptomic datasets, and making them available for routine data mining, facilitates research into the mechanisms by which xenobiotics and endocrine-disrupting chemicals subvert conventional NR signaling modalities. PMID:27409825
Hackett, Justin B; Lu, Yan
2017-05-04
In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed 2 plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the β subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the β subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly upregulated in both orrm6 mutants. The upregulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.
Loci and pathways associated with uterine capacity for pregnancy and fertility in beef cattle
Geary, Thomas W.; Kiser, Jennifer N.; Burns, Gregory W.; Hansen, Peter J.; Spencer, Thomas E.; Neibergs, Holly L.
2017-01-01
Infertility and subfertility negatively impact the economics and reproductive performance of cattle. Of note, significant pregnancy loss occurs in cattle during the first month of pregnancy, yet little is known about the genetic loci influencing pregnancy success and loss in cattle. To identify quantitative trait loci (QTL) with large effects associated with early pregnancy loss, Angus crossbred heifers were classified based on day 28 pregnancy outcomes to serial embryo transfer. A genome wide association analysis (GWAA) was conducted comparing 30 high fertility heifers with 100% success in establishing pregnancy to 55 subfertile heifers with 25% or less success. A gene set enrichment analysis SNP (GSEA-SNP) was performed to identify gene sets and leading edge genes influencing pregnancy loss. The GWAA identified 22 QTL (p < 1 × 10⁻⁵), and GSEA-SNP identified 9 gene sets (normalized enrichment score > 3.0) with 253 leading edge genes. Network analysis identified TNF (tumor necrosis factor), estrogen, and TP53 (tumor protein 53) as the top of 671 upstream regulators (p < 0.001), whereas the SOX2 (SRY [sex determining region Y]-box 2) and OCT4 (octamer-binding transcription factor 4) complex was the top master regulator out of 773 master regulators associated with fertility (p < 0.001). Identification of QTL and genes in pathways that improve early pregnancy success provides critical information for genomic selection to increase fertility in cattle. The identified genes and regulators also provide insight into the complex biological mechanisms underlying pregnancy establishment in cattle. PMID:29228019
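The enrichment score and leading edge genes referred to in the abstract above can be illustrated with a minimal sketch of the unweighted, Kolmogorov-Smirnov-style running sum used in classic GSEA. This is an assumption-laden simplification rather than the GSEA-SNP procedure: it works on a ranked list of unique gene names, applies no weighting, and omits the permutation-based normalization that yields a normalized enrichment score. The function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score and leading edge.

    ranked_genes: list of unique gene names, most to least associated.
    gene_set:     iterable of gene names in the set being tested.
    Returns (score, leading_edge), where score is the running-sum value
    with the largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    peak = -1
    for i, gene in enumerate(ranked_genes):
        # Step up on a set member, down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
            peak = i

    if best >= 0:
        # Positive score: members ranked at or before the peak.
        leading_edge = [g for g in ranked_genes[:peak + 1] if g in members]
    else:
        # Negative score: members ranked after the trough.
        leading_edge = [g for g in ranked_genes[peak + 1:] if g in members]
    return best, leading_edge
```

In practice the raw score would be compared with scores from permuted data to obtain a normalized score and a significance estimate, a step left out here.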
Ochsner, Scott A; Tsimelzon, Anna; Dong, Jianrong; Coarfa, Cristian; McKenna, Neil J
2016-08-01
The pregnane X receptor (PXR/NR1I2) and constitutive androstane receptor (CAR/NR1I3) members of the nuclear receptor (NR) superfamily of ligand-regulated transcription factors are well-characterized mediators of xenobiotic and endocrine-disrupting chemical signaling. The Nuclear Receptor Signaling Atlas maintains a growing library of transcriptomic datasets involving perturbations of NR signaling pathways, many of which involve perturbations relevant to PXR and CAR xenobiotic signaling. Here, we generated a reference transcriptome based on the frequency of differential expression of genes across 159 experiments compiled from 22 datasets involving perturbations of CAR and PXR signaling pathways. In addition to the anticipated overrepresentation in the reference transcriptome of genes encoding components of the xenobiotic stress response, the ranking of genes involved in carbohydrate metabolism and gonadotropin action sheds mechanistic light on the suspected role of xenobiotics in metabolic syndrome and reproductive disorders. Gene Set Enrichment Analysis showed that although acetaminophen, chlorpromazine, and phenobarbital impacted many similar gene sets, differences in direction of regulation were evident in a variety of processes. Strikingly, gene sets representing genes linked to Parkinson's, Huntington's, and Alzheimer's diseases were enriched in all 3 transcriptomes. The reference xenobiotic transcriptome will be supplemented with additional future datasets to provide the community with a continually updated reference transcriptomic dataset for CAR- and PXR-mediated xenobiotic signaling. Our study demonstrates how aggregating and annotating transcriptomic datasets, and making them available for routine data mining, facilitates research into the mechanisms by which xenobiotics and endocrine-disrupting chemicals subvert conventional NR signaling modalities.
2010-01-01
Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. 
The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots. Conclusions We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays. Annotation of the sequence information and collaboration was further enhanced through a web-based SSHdb database, and we illustrated this through identification of drought responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen. PMID:20359330
Woo, Sangsoon; Gao, Hong; Henderson, David; Zacharias, Wolfgang; Liu, Gang; Tran, Quynh T; Prasad, G L
2017-05-03
Smoking has been established as a major risk factor for developing oral squamous cell carcinoma (OSCC), but less attention has been paid to the effects of smokeless tobacco products. Our objective is to identify potential biomarkers to distinguish the biological effects of combustible tobacco products from those of non-combustible ones using oral cell lines. Normal human gingival epithelial cells (HGEC), non-metastatic (101A) and metastatic (101B) OSCC cell lines were exposed to different tobacco product preparations (TPPs) including cigarette smoke total particulate matter (TPM), whole-smoke conditioned media (WS-CM), smokeless tobacco extract in complete artificial saliva (STE), or nicotine (NIC) alone. We performed microarray-based gene expression profiling and found 3456 probe sets from 101A, 1432 probe sets from 101B, and 2717 probe sets from HGEC to be differentially expressed. Gene Set Enrichment Analysis (GSEA) revealed xenobiotic metabolism and steroid biosynthesis were the top two pathways that were upregulated by combustible but not by non-combustible TPPs. Notably, aldo-keto reductase genes, AKR1C1 and AKR1C2, were the core genes in the top enriched pathways and were statistically upregulated more than eight-fold by combustible TPPs. Quantitative real time polymerase chain reaction (qRT-PCR) results statistically support AKR1C1 as a potential biomarker for differentiating the biological effects of combustible from non-combustible tobacco products. PMID:28467356
Tian, Honglai; Guan, Donghui; Li, Jianmin
2018-06-01
Osteosarcoma (OS), the most common malignant bone tumor, poses a serious health threat to children and adolescents. OS is usually associated with early metastasis and a high death rate. This study aimed to better understand the mechanism of OS metastasis. From the Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis and selected differentially expressed genes. The weighted gene co-expression network analysis (WGCNA) approach allowed us to identify the module most strongly correlated with OS metastasis. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to annotate the selected OS metastasis-associated genes. We selected 897 differentially expressed genes between the OS metastasis and OS non-metastasis groups. Based on these selected genes, WGCNA further identified 142 genes in the module most strongly correlated with OS metastasis. Gene Ontology functional and KEGG pathway enrichment analyses showed that genes significantly associated with OS metastasis were involved in pathways related to insulin-like growth factor binding. Our research identified several molecules that potentially participate in the metastasis process and factors that may act as biomarkers. This study helps clarify the mechanism of OS metastasis and may aid the discovery of further therapeutic targets.
Beinke, C; Port, M; Ullmann, R; Gilbertz, K; Majewski, M; Abend, M
2018-06-01
Dicentric chromosome analysis (DCA) is the gold standard for individual radiation dose assessment. However, DCA is limited by the time-consuming phytohemagglutinin (PHA)-mediated lymphocyte activation. In this study using human peripheral blood lymphocytes, we investigated PHA-associated whole genome gene expression changes to elucidate this process and sought to identify suitable gene targets as a means of meeting our long-term objective of accelerating cell cycle kinetics to reduce DCA culture time. Human peripheral whole blood from three healthy donors was separately cultured in RPMI/FCS/antibiotics with BrdU and PHA-M. Diluted whole blood samples were transferred into PAXgene tubes at 0, 12, 24 and 36 h culture time. RNA was isolated and aliquots were used for whole genome gene expression screening. Microarray results were validated using qRT-PCR and differentially expressed genes [significantly (FDR corrected) twofold different from the 0 h value reference] were analyzed using several bioinformatic tools. The cell cycle positions and DNA-synthetic activities of lymphocytes were determined by analyzing the correlated total DNA content and incorporated BrdU level with flow cytometry after continued BrdU incubation. From 42,545 transcripts of the whole genome microarray 47.6%, on average, appeared expressed. The number of differentially expressed genes increased linearly from 855 to 2,858 and 4,607 at 12, 24 and 36 h after PHA addition, respectively. Approximately 2-3 times more up- than downregulated genes were observed with several hundred genes differentially expressed at each time point. Earliest enrichment was observed for gene sets related to the nucleus (12 h) followed by genes assigned to intracellular structures such as organelles (24 h) and finally genes related to the membrane and the extracellular matrix were enriched (36 h). 
Early gene expression changes at 12 h, in particular, were associated with protein classes such as chemokines/cytokines (e.g., CXCL1, CXCL2) and chaperones. Genes coding for biological processes involved in cell cycle control (e.g., MYBL2, RBL1, CCNA, CCNE) and DNA replication (e.g., POLA, POLE, MCM) appeared enriched at 24 h and later, but many more biological processes (42 altogether) showed enrichment as well. Flow cytometry data fit together with gene expression and bioinformatic analyses as cell cycle transition into S phase was observed with interindividual differences from 12 h onward, whereas progression into G2 as well as into the second G1 occurred from 36 h onward after activation. Gene set enrichment analysis over time identifies, in particular, two molecular categories of PHA-responsive gene targets (cytokine and cell cycle control genes). Based on that analysis target genes for cell cycle acceleration in lymphocytes have been identified (CDKN1A/B/C, RBL-1/RBL-2, E2F2, Deaf-1), and it remains undetermined whether the time expenditure for DCA can be reduced by influencing gene expression involved in the regulatory circuits controlling PHA-associated cell cycle entry and/or progression at a specific early cell cycle phase.
We have previously developed a statistical method to identify gene sets enriched with condition-specific genetic dependencies. The method constructs gene dependency networks from bootstrapped samples in one condition and computes the divergence between distributions of network likelihood scores from different conditions. It was shown to be capable of sensitive and specific identification of pathways with phenotype-specific dysregulation, i.e., rewiring of dependencies between genes in different conditions.
Identification of candidate genes in osteoporosis by integrated microarray analysis.
Li, J J; Wang, B Q; Fei, Q; Yang, Y; Li, D
2016-12-01
In order to screen the altered gene expression profile in peripheral blood mononuclear cells of patients with osteoporosis, we performed an integrated analysis of the online microarray studies of osteoporosis. We searched the Gene Expression Omnibus (GEO) database for microarray studies of peripheral blood mononuclear cells in patients with osteoporosis. Subsequently, we integrated gene expression data sets from multiple microarray studies to obtain differentially expressed genes (DEGs) between patients with osteoporosis and normal controls. Gene function analysis was performed to uncover the functions of identified DEGs. A total of three microarray studies were selected for integrated analysis. In all, 1125 genes were found to be significantly differentially expressed between osteoporosis patients and normal controls, with 373 upregulated and 752 downregulated genes. Positive regulation of the cellular amino metabolic process (gene ontology (GO): 0033240, false discovery rate (FDR) = 1.00E + 00) was significantly enriched under the GO category for biological processes, while for molecular functions, flavin adenine dinucleotide binding (GO: 0050660, FDR = 3.66E-01) and androgen receptor binding (GO: 0050681, FDR = 6.35E-01) were significantly enriched. DEGs were enriched in many osteoporosis-related signalling pathways, including those of mitogen-activated protein kinase (MAPK) and calcium. Protein-protein interaction (PPI) network analysis showed that the significant hub proteins contained ubiquitin specific peptidase 9, X-linked (Degree = 99), ubiquitin specific peptidase 19 (Degree = 57) and ubiquitin conjugating enzyme E2 B (Degree = 57). Analysis of gene function of identified differentially expressed genes may expand our understanding of fundamental mechanisms leading to osteoporosis. Moreover, significantly enriched pathways, such as MAPK and calcium, may be involved in osteoporosis through osteoblastic differentiation and bone formation.
Cite this article: J. J. Li, B. Q. Wang, Q. Fei, Y. Yang, D. Li. Identification of candidate genes in osteoporosis by integrated microarray analysis. Bone Joint Res 2016;5:594-601. DOI: 10.1302/2046-3758.512.BJR-2016-0073.R1. © 2016 Fei et al.
Lavallée-Adam, Mathieu
2017-01-01
PSEA-Quant analyzes quantitative mass spectrometry-based proteomics datasets to identify enrichments of annotations contained in repositories such as the Gene Ontology and Molecular Signature databases. It allows users to identify the annotations that are significantly enriched for reproducibly quantified high abundance proteins. PSEA-Quant is available on the web and as a command-line tool. It is compatible with all label-free and isotopic labeling-based quantitative proteomics methods. This protocol describes how to use PSEA-Quant and interpret its output. The importance of each parameter as well as troubleshooting approaches are also discussed. PMID:27010334
Uddin, Raihan; Singh, Shiva M.
2017-01-01
As humans age many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in “learning and memory” related functions and pathways. Subsequent differential network analysis of this “learning and memory” module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known function of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. 
Taken together, they provide a new insight and generate new hypotheses into the molecular mechanisms responsible for age associated learning impairment, including spatial learning. PMID:29066959
PathwaySplice: An R package for unbiased pathway analysis of alternative splicing in RNA-Seq data.
Yan, Aimin; Ban, Yuguang; Gao, Zhen; Chen, Xi; Wang, Lily
2018-04-24
Pathway analysis of alternative splicing would be biased without accounting for the different number of exons or junctions associated with each gene, because genes with a higher number of exons or junctions are more likely to be included in the "significant" gene list in alternative splicing. We present PathwaySplice, an R package that (1) Performs pathway analysis that explicitly adjusts for the number of exons or junctions associated with each gene; (2) Visualizes selection bias due to different number of exons or junctions for each gene and formally tests for presence of bias using logistic regression; (3) Supports gene sets based on the Gene Ontology terms, as well as more broadly defined gene sets (e.g. MSigDB) or user defined gene sets; (4) Identifies the significant genes driving pathway significance and (5) Organizes significant pathways with an enrichment map, where pathways with large number of overlapping genes are grouped together in a network graph. Availability: https://bioconductor.org/packages/release/bioc/html/PathwaySplice.html. Contact: lily.wangg@gmail.com, xi.steven.chen@gmail.com.
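The bias check described in point (2) can be illustrated with a minimal Python sketch. It is not PathwaySplice's implementation (the package is written in R); it only shows the idea of regressing a gene's significance indicator on its exon count with logistic regression, where a clearly positive slope suggests that genes with more exons are more likely to be called significant. The function name `fit_logistic` and the toy exon counts and labels are assumptions made up for illustration.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X^T W X (negative Hessian)
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: beta += (X^T W X)^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: exon count per gene and whether the gene was
# called "significant" (1) or not (0). These values are invented.
exon_counts = [2, 3, 4, 5, 6, 8, 10, 12, 15, 20]
significant = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_logistic(exon_counts, significant)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A formal test would also need the standard error of the slope (for example from the inverse of the matrix accumulated above); the sketch stops at the point estimate.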
Protein and Genetic Composition of Four Chromatin Types in Drosophila melanogaster Cell Lines.
Boldyreva, Lidiya V; Goncharov, Fyodor P; Demakova, Olga V; Zykova, Tatyana Yu; Levitsky, Victor G; Kolesnikov, Nikolay N; Pindyurin, Alexey V; Semeshin, Valeriy F; Zhimulev, Igor F
2017-04-01
Recently, we analyzed genome-wide protein binding data for the Drosophila cell lines S2, Kc, BG3 and Cl.8 (modENCODE Consortium) and identified a set of 12 proteins enriched in the regions corresponding to interbands of salivary gland polytene chromosomes. Using these data, we developed a bioinformatic pipeline that partitioned the Drosophila genome into four chromatin types that we hereby refer to as aquamarine, lazurite, malachite and ruby. Here, we describe the properties of these chromatin types across different cell lines. We show that aquamarine chromatin tends to harbor transcription start sites (TSSs) and 5' untranslated regions (5'UTRs) of the genes, is enriched in diverse "open" chromatin proteins, histone modifications, nucleosome remodeling complexes and transcription factors. It encompasses most of the tRNA genes and shows enrichment for non-coding RNAs and miRNA genes. Lazurite chromatin typically encompasses gene bodies. It is rich in proteins involved in transcription elongation. Frequency of both point mutations and natural deletion breakpoints is elevated within lazurite chromatin. Malachite chromatin shows higher frequency of insertions of natural transposons. Finally, ruby chromatin is enriched for proteins and histone modifications typical for the "closed" chromatin. Ruby chromatin has a relatively low frequency of point mutations and is essentially devoid of miRNA and tRNA genes. Aquamarine and ruby chromatin types are highly stable across cell lines and have contrasting properties. Lazurite and malachite chromatin types also display characteristic protein composition, as well as enrichment for specific genomic features. We found that two types of chromatin, aquamarine and ruby, retain their complementary protein patterns in four Drosophila cell lines.
Enriched pathways for major depressive disorder identified from a genome-wide association study.
Kao, Chung-Feng; Jia, Peilin; Zhao, Zhongming; Kuo, Po-Hsiu
2012-11-01
Major depressive disorder (MDD) has caused a substantial burden of disease worldwide with moderate heritability. Despite efforts through conducting numerous association studies and now, genome-wide association (GWA) studies, the success of identifying susceptibility loci for MDD has been limited, which is partially attributed to the complex nature of depression pathogenesis. A pathway-based analytic strategy to investigate the joint effects of various genes within specific biological pathways has emerged as a powerful tool for complex traits. The present study aimed to identify enriched pathways for depression using a GWA dataset for MDD. For each gene, we estimated its gene-wise p value using combined and minimum p value, separately. Canonical pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCarta were used. We employed four pathway-based analytic approaches (gene set enrichment analysis, hypergeometric test, sum-square statistic, sum-statistic). We adjusted for multiple testing using Benjamini & Hochberg's method to report significant pathways. We found 17 significantly enriched pathways for depression, which presented low-to-intermediate crosstalk. The top four pathways were long-term depression (p⩽1×10^-5), calcium signalling (p⩽6×10^-5), arrhythmogenic right ventricular cardiomyopathy (p⩽1.6×10^-4) and cell adhesion molecules (p⩽2.2×10^-4). In conclusion, our comprehensive pathway analyses identified promising pathways for depression that are related to neurotransmitter and neuronal systems, immune system and inflammatory response, which may be involved in the pathophysiological mechanisms underlying depression. We demonstrated that pathway enrichment analysis is promising to facilitate our understanding of complex traits through a deeper interpretation of GWA data. Application of this comprehensive analytic strategy in upcoming GWA data for depression could validate the findings reported in this study.
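Two of the steps named in this abstract, the hypergeometric test and the Benjamini & Hochberg adjustment, can be sketched in a few lines of standard-library Python. This is a generic illustration rather than the authors' code: the function names, the gene counts in the usage lines, and the choice of a one-sided upper-tail test are assumptions.

```python
from math import comb

def hypergeom_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a pathway of n_pathway genes
    when n_selected genes are drawn without replacement from n_universe genes."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 20,000 genes in total, a 100-gene pathway,
# 500 selected genes, 10 of which fall in the pathway.
p = hypergeom_pvalue(20000, 100, 500, 10)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice the raw p-values from all tested pathways would be collected first and adjusted together, since the correction depends on the total number of tests.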
2012-01-01
Background Early liver development and the transcriptional transitions during hepatogenesis are well characterized. However, gene expression changes during the late postnatal/pre-pubertal to young adulthood period are less well understood, especially with regards to sex-specific gene expression. Methods Microarray analysis of male and female mouse liver was carried out at 3, 4, and 8 wk of age to elucidate developmental changes in gene expression from the late postnatal/pre-pubertal period to young adulthood. Results A large number of sex-biased and sex-independent genes showed significant changes during this developmental period. Notably, sex-independent genes involved in cell cycle, chromosome condensation, and DNA replication were down regulated from 3 wk to 8 wk, while genes associated with metal ion binding, ion transport and kinase activity were up regulated. A majority of genes showing sex differential expression in adult liver did not display sex differences prior to puberty, at which time extensive changes in sex-specific gene expression were seen, primarily in males. Thus, in male liver, 76% of male-specific genes were up regulated and 47% of female-specific genes were down regulated from 3 to 8 wk of age, whereas in female liver 67% of sex-specific genes showed no significant change in expression. In both sexes, genes up regulated from 3 to 8 wk were significantly enriched (p < E-76) in the set of genes positively regulated by the liver transcription factor HNF4α, as determined in a liver-specific HNF4α knockout mouse model, while genes down regulated during this developmental period showed significant enrichment (p < E-65) for negative regulation by HNF4α. Significant enrichment of the developmentally regulated genes in the set of genes subject to positive and negative regulation by pituitary hormone was also observed. 
Five sex-specific transcriptional regulators showed sex-specific expression at 4 wk (male-specific Ihh; female-specific Cdx4, Cux2, Tox, and Trim24) and may contribute to the developmental changes that lead to global acquisition of liver sex-specificity by 8 wk of age. Conclusions Overall, the observed changes in gene expression during postnatal liver development reflect the deceleration of liver growth and the induction of specialized liver functions, with widespread changes in sex-specific gene expression primarily occurring in male liver. PMID:22475005
Pathway analysis of high-throughput biological data within a Bayesian network framework.
Isci, Senol; Ozturk, Cengizhan; Jones, Jon; Otu, Hasan H
2011-06-15
Most current approaches to high-throughput biological data (HTBD) analysis either perform individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies pathways that best explain given HTBD by scoring fitness of each network. The proposed method takes into account the connectivity and relatedness between nodes of the pathway through factoring pathway topology in its model. Our simulations using synthetic data demonstrated the robustness of our approach. We tested the proposed method, Bayesian Pathway Analysis (BPA), on human microarray data regarding renal cell carcinoma (RCC) and compared our results with gene set enrichment analysis. BPA was able to find broader and more specific pathways related to RCC. The accompanying BPA software (BPAS) package is freely available for academic use at http://bumil.boun.edu.tr/bpa.
Blatti, Charles; Sinha, Saurabh
2014-07-01
The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zang, Hongyan; Li, Ning; Pan, Yuling; Hao, Jingguang
2017-03-01
Breast cancer is a common malignancy among women with a rising incidence. Our intention was to detect transcription factors (TFs) for deeper understanding of the underlying mechanisms of breast cancer. Integrated analysis of gene expression datasets of breast cancer was performed. Then, functional annotation of differentially expressed genes (DEGs) was conducted, including Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Furthermore, TFs were identified and a global transcriptional regulatory network was constructed. Seven publicly available GEO datasets were obtained, and a set of 1196 DEGs was identified (460 up-regulated and 736 down-regulated). Functional annotation results showed that cell cycle was the most significantly enriched pathway, which was consistent with the fact that cell cycle is closely related to various tumors. Fifty-three differentially expressed TFs were identified, and the regulatory networks consisted of 817 TF-target interactions between 46 TFs and 602 DEGs in the context of breast cancer. The top 10 TFs covering the most downstream DEGs were SOX10, NFATC2, ZNF354C, ARID3A, BRCA1, FOXO3, GATA3, ZEB1, HOXA5 and EGR1. The transcriptional regulatory networks could enable a better understanding of regulatory mechanisms of breast cancer pathology and provide an opportunity for the development of potential therapy.
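Ranking transcription factors by how many downstream DEGs they cover amounts to counting distinct targets per TF in the list of TF-target interactions. The Python sketch below shows that counting step only; the function name `rank_regulators` and the placeholder edge list are hypothetical and carry no biological meaning.

```python
from collections import defaultdict

def rank_regulators(interactions, top_n=10):
    """Rank TFs by the number of distinct target genes they regulate.

    interactions: iterable of (tf, target_gene) pairs; duplicates are ignored.
    Ties are broken alphabetically by TF name so the order is deterministic.
    """
    targets = defaultdict(set)
    for tf, gene in interactions:
        targets[tf].add(gene)
    ranked = sorted(targets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(tf, len(genes)) for tf, genes in ranked[:top_n]]

# Placeholder edges, invented purely to exercise the function.
toy_edges = [
    ("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_A", "gene3"),
    ("TF_B", "gene1"), ("TF_B", "gene4"),
    ("TF_C", "gene2"),
    ("TF_A", "gene1"),  # repeated edge, counted once
]
print(rank_regulators(toy_edges, top_n=2))
```

Using a set per TF keeps repeated edges from inflating the count, which matters when interactions are merged from several sources.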
Li, Zheng; Srivastava, Shireesh; Yang, Xuerui; Mittal, Sheenu; Norton, Paul; Resau, James; Haab, Brian; Chan, Christina
2007-01-01
Background: Free fatty acids (FFA) and tumor necrosis factor alpha (TNF-α) have been implicated in the pathogenesis of many obesity-related metabolic disorders. When human hepatoblastoma cells (HepG2) were exposed to different types of FFA and TNF-α, saturated fatty acid was found to be cytotoxic and its toxicity was exacerbated by TNF-α. In order to identify the processes associated with the toxicity of saturated FFA and TNF-α, the metabolic and gene expression profiles were measured to characterize the cellular states. A computational model was developed to integrate these disparate data to reveal the underlying pathways and mechanisms involved in saturated fatty acid toxicity. Results: A hierarchical framework consisting of three stages was developed to identify the processes and genes that regulate the toxicity. First, discriminant analysis identified that fatty acid oxidation and intracellular triglyceride accumulation were the most relevant in differentiating the cytotoxic phenotype. Second, gene set enrichment analysis (GSEA) was applied to the cDNA microarray data to identify the transcriptionally altered pathways and processes. Finally, the genes and gene sets that regulate the metabolic responses identified in step 1 were identified by integrating the expression of the enriched gene sets and the metabolic profiles with a multi-block partial least squares (MBPLS) regression model. Conclusion: The hierarchical approach suggested potential mechanisms involved in mediating the cytotoxic and cytoprotective pathways, and identified novel targets, such as NADH dehydrogenases, aldehyde dehydrogenase 1A1 (ALDH1A1) and endothelial membrane protein 3 (EMP3), as modulators of the toxic phenotypes. These predictions, as well as some specific targets suggested by the analysis, were experimentally validated. PMID:17498300
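The core idea behind the GSEA step mentioned above can be sketched as a running-sum statistic over a ranked gene list. This is a simplified, unweighted (Kolmogorov-Smirnov-like) variant written for illustration only; the published GSEA method weights the steps by the ranking metric and estimates significance by permutation, neither of which is shown here. The function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up at genes in the set and down
    at genes outside it; return the signed maximum deviation from zero.
    """
    gene_set = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += 1.0 / n_hits if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Illustrative ranking: set members at the top give a positive score.
ranked = ["a", "b", "c", "d"]
score_top = enrichment_score(ranked, {"a", "b"})
score_bottom = enrichment_score(ranked, {"c", "d"})
```

A set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom a score near -1.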
Berke, Lidija; Snel, Berend
2014-01-01
The histone modification H3K27me3 is involved in repression of transcription and plays a crucial role in developmental transitions in both animals and plants. It is deposited by PRC2 (Polycomb repressive complex 2), a conserved protein complex. In Arabidopsis thaliana, H3K27me3 is found at 15% of all genes. These tend to encode transcription factors and other regulators important for development. However, it is not known how PRC2 is recruited to target loci nor how this set of target genes arose during Arabidopsis evolution. To resolve the latter, we integrated A. thaliana gene families with five independent genome-wide H3K27me3 data sets. Gene families were either significantly enriched or depleted of H3K27me3, showing a strong impact of shared ancestry on H3K27me3 distribution. To quantify this, we performed ancestral state reconstruction of H3K27me3 on phylogenetic trees of gene families. The set of H3K27me3-marked genes changed less than expected by chance, suggesting that H3K27me3 was retained after gene duplication. This retention suggests that the PRC2-recruiting signal could be encoded in the DNA and also conserved among certain duplicated genes. Indeed, H3K27me3-marked genes were overrepresented among paralogs sharing conserved noncoding sequences (CNSs) that are enriched with transcription factor binding sites. The association of upstream CNSs with H3K27me3-marked genes represents the first genome-wide connection between H3K27me3 and potential regulatory elements in plants. Thus, we propose that CNSs likely function as part of the PRC2 recruitment in plants. PMID:24567304
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.
Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C
2018-08-01
Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
Time-Course Gene Set Analysis for Longitudinal Gene Expression Data
Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe
2015-01-01
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets that were found to be linked with the influenza vaccine as well, although previous analyses had associated them with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374
Omony, Jimmy; de Jong, Anne; Krawczyk, Antonina O.; Eijlander, Robyn T.; Kuipers, Oscar P.
2018-01-01
Sporulation is a survival strategy adopted by bacterial cells in response to harsh environmental adversities. The adaptation potential differs between strains and the variations may arise from differences in gene regulation. Gene networks are a valuable way of studying such regulation processes and establishing associations between genes. We reconstructed and compared sporulation gene co-expression networks (GCNs) of the model laboratory strain Bacillus subtilis 168 and the food-borne industrial isolate Bacillus amyloliquefaciens. Transcriptome data obtained from samples of six stages during the sporulation process were used for network inference. Subsequently, a gene set enrichment analysis was performed to compare the reconstructed GCNs of B. subtilis 168 and B. amyloliquefaciens with respect to biological functions, which showed the enriched modules with coherent functional groups associated with sporulation. On the basis of the GCNs and time-evolution of differentially expressed genes, we could identify novel candidate genes strongly associated with sporulation in B. subtilis 168 and B. amyloliquefaciens. The GCNs offer a framework for exploring transcription factors, their targets, and co-expressed genes during sporulation. Furthermore, the methodology described here can conveniently be applied to other species or biological processes. PMID:29424683
Lavallée-Adam, Mathieu; Rauniyar, Navin; McClatchy, Daniel B; Yates, John R
2014-12-05
The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights. PMID:25177766
Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David
2009-01-01
Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. A two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate were all 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
Maratou, Klio; Wallace, Victoria C.J.; Hasnie, Fauzia S.; Okuse, Kenji; Hosseini, Ramine; Jina, Nipurna; Blackbeard, Julie; Pheby, Timothy; Orengo, Christine; Dickenson, Anthony H.; McMahon, Stephen B.; Rice, Andrew S.C.
2009-01-01
To elucidate the mechanisms underlying peripheral neuropathic pain in the context of HIV infection and antiretroviral therapy, we measured gene expression in dorsal root ganglia (DRG) of rats subjected to systemic treatment with the anti-retroviral agent, ddC (Zalcitabine) and concomitant delivery of HIV-gp120 to the rat sciatic nerve. L4 and L5 DRGs were collected at day 14 (time of peak behavioural change) and changes in gene expression were measured using Affymetrix whole genome rat arrays. Conventional analysis of this data set and Gene Set Enrichment Analysis (GSEA) were performed to discover biological processes altered in this model. Transcripts associated with G protein coupled receptor signalling and cell adhesion were enriched in the treated animals, while ribosomal proteins and proteasome pathways were associated with gene down-regulation. To identify genes that are directly relevant to neuropathic mechanical hypersensitivity, as opposed to epiphenomena associated with other aspects of the response to a sciatic nerve lesion, we compared the gp120 + ddC-evoked gene expression with that observed in a model of traumatic neuropathic pain (L5 spinal nerve transection), where hypersensitivity to a static mechanical stimulus is also observed. We identified 39 genes/expressed sequence tags that are differentially expressed in the same direction in both models. Most of these have not previously been implicated in mechanical hypersensitivity and may represent novel targets for therapeutic intervention. As an external control, the RNA expression of three genes was examined by RT-PCR, while the protein levels of two were studied using western blot analysis. PMID:18606552
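The comparison of two models for genes changed in the same direction can be sketched as follows. The sketch assumes each model is summarized as a mapping from gene identifier to log fold change for its differentially expressed genes; that representation, the function name and the example values are assumptions for illustration and are not the authors' pipeline.

```python
def concordant_genes(model_a, model_b):
    """Return genes differentially expressed in the same direction in both models.

    model_a, model_b: dict mapping gene id -> log fold change, restricted
    to genes already called differentially expressed (assumed input).
    """
    shared = model_a.keys() & model_b.keys()
    # Same sign of the log fold change means the same direction of change.
    return sorted(g for g in shared if model_a[g] * model_b[g] > 0)

# Hypothetical values, purely illustrative.
gp120_ddc = {"gene1": 1.4, "gene2": -0.9, "gene3": 0.7}
nerve_transection = {"gene1": 2.1, "gene2": 0.5, "gene4": -1.2}
same_direction = concordant_genes(gp120_ddc, nerve_transection)
```

Here only gene1 is retained: gene2 changes in opposite directions, and gene3 and gene4 each appear in one model only.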
Li, Qike; Schissler, A Grant; Gardeux, Vincent; Achour, Ikbel; Kenost, Colleen; Berghout, Joanne; Li, Haiquan; Zhang, Hao Helen; Lussier, Yves A
2017-05-24
Transcriptome analytic tools are commonly used across patient cohorts to develop drugs and predict clinical outcomes. However, as precision medicine pursues more accurate and individualized treatment decisions, these methods are not designed to address single-patient transcriptome analyses. We previously developed and validated the N-of-1-pathways framework using two methods, Wilcoxon and Mahalanobis Distance (MD), for personal transcriptome analysis derived from a pair of samples of a single patient. Although both methods uncover concordantly dysregulated pathways, they are not designed to detect dysregulated pathways with up- and down-regulated genes (bidirectional dysregulation) that are ubiquitous in biological systems. We developed N-of-1-pathways MixEnrich, a mixture model followed by a gene set enrichment test, to uncover bidirectional and concordantly dysregulated pathways one patient at a time. We assess its accuracy in a comprehensive simulation study and in an RNA-Seq data analysis of head and neck squamous cell carcinomas (HNSCCs). In the presence of bidirectionally dysregulated genes in the pathway or in the presence of high background noise, MixEnrich substantially outperforms previous single-subject transcriptome analysis methods, both in the simulation study and the HNSCCs data analysis (ROC Curves; higher true positive rates; lower false positive rates). Bidirectional and concordant dysregulated pathways uncovered by MixEnrich in each patient largely overlapped with the quasi-gold standard compared to other single-subject and cohort-based transcriptome analyses. The greater performance of MixEnrich presents an advantage over previous methods to meet the promise of providing accurate personal transcriptome analysis to support precision medicine at point of care.
Challenges of the information age: the impact of false discovery on pathway identification.
Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E
2012-11-21
Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five- and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening step between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.
Evaluation of genome-wide association study results through development of ontology fingerprints
Tsoi, Lam C.; Boehnke, Michael; Klein, Richard L.; Zheng, W. Jim
2009-01-01
Motivation: Genome-wide association (GWA) studies may identify multiple variants that are associated with a disease or trait. To narrow down candidates for further validation, quantitatively assessing how identified genes relate to a phenotype of interest is important. Results: We describe an approach to characterize genes or biological concepts (phenotypes, pathways, diseases, etc.) by ontology fingerprint—the set of Gene Ontology (GO) terms that are overrepresented among the PubMed abstracts discussing the gene or biological concept together with the enrichment p-value of these terms generated from a hypergeometric enrichment test. We then quantify the relevance of genes to the trait from a GWA study by calculating similarity scores between their ontology fingerprints using enrichment p-values. We validate this approach by correctly identifying corresponding genes for biological pathways with a 90% average area under the ROC curve (AUC). We applied this approach to rank genes identified through a GWA study that are associated with the lipid concentrations in plasma as well as to prioritize genes within linkage disequilibrium (LD) block. We found that the genes with highest scores were: ABCA1, lipoprotein lipase (LPL) and cholesterol ester transfer protein, plasma for high-density lipoprotein; low-density lipoprotein receptor, APOE and APOB for low-density lipoprotein; and LPL, APOA1 and APOB for triglyceride. In addition, we identified genes relevant to lipid metabolism from the literature even in cases where such knowledge was not reflected in current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation. Contact: zhengw@musc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19349285
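The hypergeometric enrichment test named in the abstract above can be written out with the standard library alone. This is a minimal sketch of the general upper-tail test; the parameter names and the example counts are assumptions for illustration, not figures from the study.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: total number of items (e.g. all abstracts considered)
    annotated:  items in the population carrying the term of interest
    sample:     items drawn (e.g. abstracts discussing one gene)
    hits:       drawn items that carry the term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

# Hypothetical counts: 1000 abstracts, 50 mention the term,
# 20 discuss the gene, 5 of those mention the term.
p_value = hypergeom_enrichment_pvalue(1000, 50, 20, 5)
```

A small p-value indicates that the term appears among the sampled abstracts more often than random drawing would explain, which is the sense in which a term is called overrepresented.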
2010-01-01
One of the important challenges to post-genomic biology is relating observed phenotypic alterations to the underlying collective alterations in genes. Current inferential methods, however, invariably omit large bodies of information on the relationships between genes. We present a method that takes account of such information - expressed in terms of the topology of a correlation network - and we apply the method in the context of current procedures for gene set enrichment analysis. PMID:20187943
Molecular profiles to biology and pathways: a systems biology approach.
Van Laere, Steven; Dirix, Luc; Vermeulen, Peter
2016-06-16
Interpreting molecular profiles in a biological context requires specialized analysis strategies. Initially, lists of relevant genes were screened to identify enriched concepts associated with pathways or specific molecular processes. However, the shortcoming of interpreting gene lists by using predefined sets of genes has resulted in the development of novel methods that heavily rely on network-based concepts. These algorithms have the advantage that they allow a more holistic view of the signaling properties of the condition under study as well as that they are suitable for integrating different data types like gene expression, gene mutation, and even histological parameters.
Watanabe, Kazuhide; Biesinger, Jacob; Salmans, Michael L.; Roberts, Brian S.; Arthur, William T.; Cleary, Michele; Andersen, Bogi; Xie, Xiaohui; Dai, Xing
2014-01-01
Background: Deregulation of canonical Wnt/CTNNB1 (beta-catenin) pathway is one of the earliest events in the pathogenesis of colon cancer. Mutations in APC or CTNNB1 are highly frequent in colon cancer and cause aberrant stabilization of CTNNB1, which activates the transcription of Wnt target genes by binding to chromatin via the TCF/LEF transcription factors. Here we report an integrative analysis of genome-wide chromatin occupancy of CTNNB1 by chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) and gene expression profiling by microarray analysis upon RNAi-mediated knockdown of CTNNB1 in colon cancer cells. Results: We observed 3629 CTNNB1 binding peaks across the genome and a significant correlation between CTNNB1 binding and knockdown-induced gene expression change. Our integrative analysis led to the discovery of a direct Wnt target signature composed of 162 genes. Gene ontology analysis of this signature revealed a significant enrichment of Wnt pathway genes, suggesting multiple feedback regulations of the pathway. We provide evidence that this gene signature partially overlaps with the Lgr5+ intestinal stem cell signature, and is significantly enriched in normal intestinal stem cells as well as in clinical colorectal cancer samples. Interestingly, while the expression of the CTNNB1 target gene set does not correlate with survival, elevated expression of negative feedback regulators within the signature predicts better prognosis. Conclusion: Our data provide a genome-wide view of chromatin occupancy and gene regulation of Wnt/CTNNB1 signaling in colon cancer cells. PMID:24651522
RNAseq Analysis of the Drosophila Response to the Entomopathogenic Nematode Steinernema
Yadav, Shruti; Daugherty, Sean; Shetty, Amol Carl; Eleftherianos, Ioannis
2017-01-01
Drosophila melanogaster is an outstanding model to study the molecular and functional basis of host–pathogen interactions. Currently, microbial infections in D. melanogaster are well understood; however, our understanding of the response of flies to nematode infections is still in its infancy. Here, we have used the potent parasitic nematode Steinernema carpocapsae, which lives in mutualism with its endosymbiotic bacteria Xenorhabdus nematophila, to examine the transcriptomic basis of the interaction between D. melanogaster and entomopathogenic nematodes. We have employed next-generation RNA sequencing (RNAseq) to investigate the transcriptomic profile of D. melanogaster larvae in response to infection by S. carpocapsae symbiotic (carrying X. nematophila) or axenic (lacking X. nematophila) nematodes. Bioinformatic analyses have identified the strong induction of genes that are associated with the peritrophic membrane and the stress response, as well as several genes that participate in developmental processes. We have also found that genes with different biological functions are enriched in D. melanogaster larvae responding to either symbiotic or axenic nematodes. We further show that while symbiotic nematode infection enriched certain known immune-related genes, axenic nematode infection enriched several genes associated with chitin binding, lipid metabolic functions, and neuroactive ligand receptors. In addition, we have identified genes with a potential role in nematode recognition and genes with potential antinematode activity. Findings from this study will undoubtedly set the stage for the identification of key regulators of antinematode immune mechanisms in D. melanogaster, as well as in other insects of socioeconomic importance. PMID:28450373
Wilson, Paul; Larminie, Christopher; Smith, Rona
2016-01-01
To use literature mining to catalogue Behçet's associated genes, and advanced computational methods to improve the understanding of the pathways and signalling mechanisms that lead to the typical clinical characteristics of Behçet's patients. To extend this technique to identify potential treatment targets for further experimental validation. Text mining methods combined with gene enrichment tools, pathway analysis and causal analysis algorithms. This approach identified 247 human genes associated with Behçet's disease and the resulting disease map, comprising 644 nodes and 19220 edges, captured important details of the relationships between these genes and their associated pathways, as described in diverse data repositories. Pathway analysis has identified how Behçet's associated genes are likely to participate in innate and adaptive immune responses. Causal analysis algorithms have identified a number of potential therapeutic strategies for further investigation. Computational methods have captured pertinent features of the prominent disease characteristics presented in Behçet's disease and have highlighted NOD2, ICOS and IL18 signalling as potential therapeutic strategies.
Upregulation of miR-146a by YY1 depletion correlates with delayed progression of prostate cancer
Huang, Yeqing; Tao, Tao; Liu, Chunhui; Guan, Han; Zhang, Guangyuan; Ling, Zhixin; Zhang, Lei; Lu, Kai; Chen, Shuqiu; Xu, Bin; Chen, Ming
2017-01-01
Previously published studies reported that excessive expression of miR-146a influences prostate cancer (PCa) cells in terms of apoptosis, progression, and viability. Although miR-146a acts as a tumor suppressor, current knowledge of the molecular mechanisms that control its expression in PCa is limited. In this study, gene set enrichment analysis (GSEA) showed negatively enriched expression of miR-146a target gene sets and positively enriched expression of gene sets suppressed by the enhancer of zeste homolog 2 (EZH2) after YY1 depletion in PCa cells. The current results demonstrated that the miR-146a levels in PCa tissues with high Gleason scores (>7) are significantly lower than those in PCa tissues with low Gleason scores (≤7), which were initially observed in the clinical specimens. An inverse relationship between YY1 and miR-146a expression was also observed. Experiments indicated decreased cell viability and proliferation and increased apoptosis after YY1 depletion, whereas inhibiting miR-146a could alleviate the negative effects of YY1 depletion. We detected the reversed adjustment of YY1 to accommodate miR-146a transcriptions. On the basis of YY1 depletion, we determined that the expression of miR-146a increased after EZH2 knockdown. We validated the combination of YY1 and its interaction with EZH2 at the miR-146a promoter binding site, thereby prohibiting the transcriptional activity of miR-146a in PCa cells. Our results suggested that YY1 depletion repressed PCa cell viability and proliferation and induced apoptosis at least in a miR-146a-assisted manner. PMID:28101571
Liu, Hsi-Che; Shih, Lee-Yung; May Chen, Mei-Ju; Wang, Chien-Chih; Yeh, Ting-Chi; Lin, Tung-Huei; Chen, Chien-Yu; Lin, Chih-Jen; Liang, Der-Cherng
2011-05-01
In acute myeloid leukemia (AML), the mixed lineage leukemia (MLL) gene may be rearranged to generate a partial tandem duplication (PTD), or fused to partner genes through a chromosomal translocation (tMLL). In this study, we first explored the differentially expressed genes between MLL-PTD and tMLL using gene expression profiling of our cohort (15 MLL-PTD and 10 tMLL) and one published data set. The top 250 probes were chosen from each set, resulting in 29 common probes (21 unique genes) to both sets. The selected genes include four HOXB genes, HOXB2, B3, B5, and B6. The expression values of these HOXB genes significantly differ between MLL-PTD and tMLL cases. Clustering and classification analyses were thoroughly conducted to support our gene selection results. Second, as MLL-PTD, FLT3-ITD, and NPM1 mutations are identified in AML with normal karyotypes, we briefly studied their impact on the HOXB genes. Another contribution of this study is to demonstrate that using public data from other studies enriches samples for analysis and yields more conclusive results. 2011 Elsevier Inc. All rights reserved.
Kim, Jun-Mo; Lim, Kyu-Sang; Byun, Mijeong; Lee, Kyung-Tai; Yang, Young-Rok; Park, Mina; Lim, Dajeong; Chai, Han-Ha; Bang, Han-Tae; Hwangbo, Jong; Choi, Yang-Ho; Cho, Yong-Min; Park, Jong-Eun
2017-11-01
White Pekin duck is an important meat resource in the livestock industries. However, the temperature increase due to global warming has become a serious environmental factor in duck production, because of hyperthermia. Therefore, identifying the gene regulations and understanding the molecular mechanism for adaptation to the warmer environment will provide insightful information on the acclimation system of ducks. This study examined transcriptomic responses to heat stress treatments (3 and 6 h at 35 °C) and control (C, 25 °C) using RNA-sequencing analysis of genes from the breast muscle tissue. Based on three distinct differentially expressed gene (DEG) sets (3H/C, 6H/C, and 6H/3H), the expression patterns of significant DEGs (absolute log2 > 1.0 and false discovery rate < 0.05) were clustered into three responsive gene groups divided into upregulated and downregulated genes. Next, we analyzed the clusters that showed relatively higher expression levels in 3H/C and lower levels in 6H/C with much lower or opposite levels in 6H/3H; we referred to these clusters as the adaptable responsive gene group. These genes were significantly enriched in the ErbB signaling pathway, neuroactive ligand-receptor interaction and type II diabetes mellitus in the KEGG pathways (P < 0.01). From the functional enrichment analysis and significantly regulated genes observed in the enriched pathways, we think that the adaptable responsive genes are responsible for the acclimation mechanism of ducks and suggest that the regulation of phosphoinositide 3-kinase genes including PIK3R6, PIK3R5, and PIK3C2B has an important relationship with the mechanisms of adaptation to heat stress in ducks.
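The DEG criterion quoted in this abstract (absolute log2 fold change > 1.0 and false discovery rate < 0.05) can be illustrated with a minimal Python sketch. It assumes Benjamini-Hochberg adjustment, which the abstract does not specify, and the gene names, fold changes, and p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, lfc_cut=1.0, fdr_cut=0.05):
    """Keep genes passing both the fold-change and the FDR cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, log2fc, fdr)
            if abs(lfc) > lfc_cut and q < fdr_cut]


# Hypothetical input: four genes from one contrast (for example 3H/C).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.1, -1.6, 0.4, 1.2]
pvalues = [0.001, 0.01, 0.02, 0.5]
degs = select_degs(genes, log2fc, pvalues)
print(degs)
```

Both cutoffs are applied jointly, so a gene with a large fold change but a weak adjusted p-value (geneD here) is excluded, as is a gene with a small fold change regardless of significance (geneC).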
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
The ideal toxicity biomarker combines the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. 
Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Jiang, Jiyang; Thalamuthu, Anbupalam; Ho, Jennifer E.; Mahajan, Anubha; Ek, Weronica E.; Brown, David A.; Breit, Samuel N.; Wang, Thomas J.; Gyllensten, Ulf; Chen, Ming-Huei; Enroth, Stefan; Januzzi, James L.; Lind, Lars; Armstrong, Nicola J.; Kwok, John B.; Schofield, Peter R.; Wen, Wei; Trollor, Julian N.; Johansson, Åsa; Morris, Andrew P.; Vasan, Ramachandran S.; Sachdev, Perminder S.; Mather, Karen A.
2018-01-01
Blood levels of growth differentiation factor-15 (GDF-15), also known as macrophage inhibitory cytokine-1 (MIC-1), have been associated with various pathological processes and diseases, including cardiovascular disease and cancer. Prior studies suggest genetic factors play a role in regulating blood MIC-1/GDF-15 concentration. In the current study, we conducted the largest genome-wide association study (GWAS) to date using a sample of ∼5,400 community-based Caucasian participants, to determine the genetic variants associated with MIC-1/GDF-15 blood concentration. Conditional and joint (COJO), gene-based association, and gene-set enrichment analyses were also carried out to identify novel loci, genes, and pathways. Consistent with prior results, a locus on chromosome 19, which includes nine single nucleotide polymorphisms (SNPs) (top SNP, rs888663, p = 1.690 × 10^-35), was significantly associated with blood MIC-1/GDF-15 concentration, and explained 21.47% of its variance. COJO analysis showed evidence for two independent signals within this locus. Gene-based analysis confirmed the chromosome 19 locus association and in addition, a putative locus on chromosome 1. Gene-set enrichment analyses showed that the "COPI-mediated anterograde transport" gene-set was associated with MIC-1/GDF-15 blood concentration with marginal significance after FDR correction (p = 0.067). In conclusion, a locus on chromosome 19 was associated with MIC-1/GDF-15 blood concentration with genome-wide significance, with evidence for a new locus (chromosome 1). Future studies using independent cohorts are needed to confirm the observed associations, especially for the chromosome 1 locus, and to further investigate and identify the causal SNPs that contribute to MIC-1/GDF-15 levels. PMID:29628937
Model-based gene set analysis for Bioconductor.
Bauer, Sebastian; Robinson, Peter N; Gagneur, Julien
2011-07-01
Gene Ontology and other forms of gene-category analysis play a major role in the evaluation of high-throughput experiments in molecular biology. Single-category enrichment analysis procedures such as Fisher's exact test tend to flag large numbers of redundant categories as significant, which can complicate interpretation. We have recently developed an approach called model-based gene set analysis (MGSA), that substantially reduces the number of redundant categories returned by the gene-category analysis. In this work, we present the Bioconductor package mgsa, which makes the MGSA algorithm available to users of the R language. Our package provides a simple and flexible application programming interface for applying the approach. The mgsa package has been made available as part of Bioconductor 2.8. It is released under the conditions of the Artistic license 2.0. peter.robinson@charite.de; julien.gagneur@embl.de.
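For context, the single-category baseline that this abstract contrasts MGSA with can be sketched in a few lines of Python. This is not the mgsa package or the MGSA model; it is only a one-sided over-representation test computed as a hypergeometric tail probability (equivalent to a one-sided Fisher's exact test), and the function name and the counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_category, n_study, n_overlap):
    """P(X >= n_overlap) when n_study genes are drawn without replacement
    from a universe of n_universe genes, n_category of which are annotated
    to the category being tested."""
    upper = min(n_category, n_study)
    total = comb(n_universe, n_study)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_study - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 annotated to the category,
# 4 genes in the study set, 3 of which fall in the category.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p_value, 4))
```

Because each category is tested on its own, overlapping or nested categories tend to be flagged together, which is the redundancy problem the abstract describes.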
Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won
2014-01-01
Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. 
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways. PMID:24497971
Liedtke, Wolfgang B.; McKinley, Michael J.; Walker, Lesley L.; Zhang, Hao; Pfenning, Andreas R.; Drago, John; Hochendoner, Sarah J.; Hilton, Donald L.; Lawrence, Andrew J.; Denton, Derek A.
2011-01-01
Sodium appetite is an instinct that involves avid specific intention. It is elicited by sodium deficiency, stress-evoked adrenocorticotropic hormone (ACTH), and reproduction. Genome-wide microarrays in sodium-deficient mice or after ACTH infusion showed up-regulation of hypothalamic genes, including dopamine- and cAMP-regulated neuronal phosphoprotein 32 kDa (DARPP-32), dopamine receptors-1 and -2, α-2C- adrenoceptor, and striatally enriched protein tyrosine phosphatase (STEP). Both DARPP-32 and neural plasticity regulator activity-regulated cytoskeleton associated protein (ARC) were up-regulated in lateral hypothalamic orexinergic neurons by sodium deficiency. Administration of dopamine D1 (SCH23390) and D2 receptor (raclopride) antagonists reduced gratification of sodium appetite triggered by sodium deficiency. SCH23390 was specific, having no effect on osmotic-induced water drinking, whereas raclopride also reduced water intake. D1 receptor KO mice had normal sodium appetite, indicating compensatory regulation. Appetite was insensitive to SCH23390, confirming the absence of off-target effects. Bilateral microinjection of SCH23390 (100 nM in 200 nL) into rats’ lateral hypothalamus greatly reduced sodium appetite. Gene set enrichment analysis in hypothalami of mice with sodium appetite showed significant enrichment of gene sets previously linked to addiction (opiates and cocaine). This finding of concerted gene regulation was attenuated on gratification with perplexingly rapid kinetics of only 10 min, anteceding significant absorption of salt from the gut. Salt appetite and hedonic liking of salt taste have evolved over >100 million y (e.g., being present in Metatheria). Drugs causing pleasure and addiction are comparatively recent and likely reflect usurping of evolutionary ancient systems with high survival value by the gratification of contemporary hedonic indulgences. 
Our findings outline a molecular logic for instinctive behavior encoded by the brain with possible important translational–medical implications. PMID:21746918
Hatt, Lotte; Aagaard, Mads M; Bach, Cathrine; Graakjaer, Jesper; Sommer, Steffen; Agerholm, Inge E; Kølvraa, Steen; Bojesen, Anders
2016-01-01
Methylation-based non-invasive prenatal testing of fetal aneuploidies is an alternative method that could possibly improve fetal aneuploidy diagnosis, especially for trisomy 13 (T13) and trisomy 18 (T18). Our aim was to study the methylation landscape in placenta DNA from trisomy 13, 18 and 21 pregnancies in an attempt to find trisomy-specific methylation differences better suited for non-invasive prenatal diagnosis. We have conducted high-resolution methylation specific bead chip microarray analyses assessing more than 450,000 CpGs analyzing placentas from 12 T21 pregnancies, 12 T18 pregnancies and 6 T13 pregnancies. We have compared the methylation landscape of the trisomic placentas to the methylation landscape from normal placental DNA and to maternal blood cell DNA. Comparing trisomic placentas to normal placentas we identified 217 and 219 differentially methylated CpGs for CVS T18 and CVS T13, respectively (delta β>0.2, FDR<0.05), but only three differentially methylated CpGs for T21. However, the methylation differences were only modest (delta β<0.4), making them less suitable as diagnostic markers. Gene ontology enrichment analysis revealed that the gene set connected to the T18 differentially methylated CpGs was highly enriched for GO terms related to "DNA binding" and "transcription factor binding" coupled to the RNA polymerase II transcription. In the gene set connected to the T13 differentially methylated CpGs we found no significant enrichments.
Ingham, Victoria A; Jones, Christopher M; Pignatelli, Patricia; Balabanidou, Vasileia; Vontas, John; Wagstaff, Simon C; Moore, Jonathan D; Ranson, Hilary
2014-11-25
The elevated expression of enzymes with insecticide metabolism activity can lead to high levels of insecticide resistance in the malaria vector, Anopheles gambiae. In this study, adult female mosquitoes from an insecticide susceptible and resistant strain were dissected into four different body parts. RNA from each of these samples was used in microarray analysis to determine the enrichment patterns of the key detoxification gene families within the mosquito and to identify additional candidate insecticide resistance genes that may have been overlooked in previous experiments on whole organisms. A general enrichment in the transcription of genes from the four major detoxification gene families (carboxylesterases, glutathione transferases, UDP glucuronosyltransferases and cytochrome P450s) was observed in the midgut and malpighian tubules. Yet the subset of P450 genes that have previously been implicated in insecticide resistance in An. gambiae show a surprisingly varied profile of tissue enrichment, confirmed by qPCR and, for three candidates, by immunostaining. A stringent selection process was used to define a list of 105 genes that are significantly (p ≤ 0.001) overexpressed in body parts from the resistant versus susceptible strain. Over half of these, including all the cytochrome P450s on this list, were identified in previous whole organism comparisons between the strains, but several new candidates were detected, notably from comparisons of the transcriptomes from dissected abdomen integuments. The use of RNA extracted from the whole organism to identify candidate insecticide resistance genes has a risk of missing candidates if key genes responsible for the phenotype have restricted expression within the body and/or are overexpressed only in certain tissues. 
However, as transcription of genes implicated in metabolic resistance to insecticides is not enriched in any one single organ, comparison of the transcriptome of individual dissected body parts cannot be recommended as a preferred means to identify new candidate insecticide resistant genes. Instead the rich data set on in vivo sites of transcription should be consulted when designing follow up qPCR validation steps, or for screening known candidates in field populations.
2013-01-01
Background Ginger (Zingiber officinale) and turmeric (Curcuma longa) accumulate important pharmacologically active metabolites at high levels in their rhizomes. Despite their importance, relatively little is known regarding gene expression in the rhizomes of ginger and turmeric. Results In order to identify rhizome-enriched genes and genes encoding specialized metabolism enzymes and pathway regulators, we evaluated an assembled collection of expressed sequence tags (ESTs) from eight different ginger and turmeric tissues. Comparisons to publicly available sorghum rhizome ESTs revealed a total of 777 gene transcripts expressed in ginger/turmeric and sorghum rhizomes but apparently absent from other tissues. The list of rhizome-specific transcripts was enriched for genes associated with regulation of tissue growth, development, and transcription. In particular, transcripts for ethylene response factors and AUX/IAA proteins appeared to accumulate in patterns mirroring results from previous studies regarding rhizome growth responses to exogenous applications of auxin and ethylene. Thus, these genes may play important roles in defining rhizome growth and development. Additional associations were made for ginger and turmeric rhizome-enriched MADS box transcription factors, their putative rhizome-enriched homologs in sorghum, and rhizomatous QTLs in rice. Additionally, analysis of both primary and specialized metabolism genes indicates that ginger and turmeric rhizomes are primarily devoted to the utilization of leaf supplied sucrose for the production and/or storage of specialized metabolites associated with the phenylpropanoid pathway and putative type III polyketide synthase gene products. This finding reinforces earlier hypotheses predicting roles of this enzyme class in the production of curcuminoids and gingerols. Conclusion A significant set of genes were found to be exclusively or preferentially expressed in the rhizome of ginger and turmeric. 
Specific transcription factors and other regulatory genes were found that were common to the two species and that are excellent candidates for involvement in rhizome growth, differentiation and development. Large classes of enzymes involved in specialized metabolism were also found to have apparent tissue-specific expression, suggesting that gene expression itself may play an important role in regulating metabolite production in these plants. PMID:23410187
Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy
2013-08-01
Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. 
Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
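The idea of controlling pathway redundancy with a user-defined overlap threshold can be sketched as follows. The abstract does not describe ReCiPa's actual procedure, so everything here is an assumption made for illustration: the overlap measure (shared genes as a fraction of the smaller pathway), the greedy merge strategy, the 0.75 threshold, and the pathway names and gene labels.

```python
from itertools import combinations


def overlap_fraction(a, b):
    """Shared genes as a fraction of the smaller set (an assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways, threshold=0.75):
    """Greedily merge pathway pairs whose overlap exceeds the threshold."""
    merged = {name: set(genes) for name, genes in pathways.items()}
    changed = True
    while changed:
        changed = False
        # sorted() returns a list, so the dict can be modified before break.
        for x, y in combinations(sorted(merged), 2):
            if overlap_fraction(merged[x], merged[y]) > threshold:
                merged[x + "+" + y] = merged.pop(x) | merged.pop(y)
                changed = True
                break
    return merged


# Hypothetical pathway database with one nested pair (P2 inside P1).
pathways = {
    "P1": {"a", "b", "c", "d"},
    "P2": {"a", "b", "c"},
    "P3": {"x", "y", "d"},
}
controlled = merge_redundant(pathways)
print(sorted(controlled))
```

Each merge reduces the number of pathways by one, so the loop terminates. Normalising by the smaller set treats a fully nested pathway as completely redundant; a Jaccard index would be a stricter alternative.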
Huang, Lei; Zhao, Shuangping; Frasor, Jonna M.; Dai, Yang
2011-01-01
Approximately half of estrogen receptor (ER) positive breast tumors will fail to respond to endocrine therapy. Here we used an integrative bioinformatics approach to analyze three gene expression profiling data sets from breast tumors in an attempt to uncover underlying mechanisms contributing to the development of resistance and potential therapeutic strategies to counteract these mechanisms. Genes that are differentially expressed in tamoxifen resistant vs. sensitive breast tumors were identified from three different publicly available microarray datasets. These differentially expressed (DE) genes were analyzed using gene function and gene set enrichment and examined in intrinsic subtypes of breast tumors. The Connectivity Map analysis was utilized to link gene expression profiles of tamoxifen resistant tumors to small molecules and validation studies were carried out in a tamoxifen resistant cell line. Despite little overlap in genes that are differentially expressed in tamoxifen resistant vs. sensitive tumors, a high degree of functional similarity was observed among the three datasets. Tamoxifen resistant tumors displayed enriched expression of genes related to cell cycle and proliferation, as well as elevated activity of E2F transcription factors, and were highly correlated with a Luminal intrinsic subtype. A number of small molecules, including phenothiazines, were found that induced a gene signature in breast cancer cell lines opposite to that found in tamoxifen resistant vs. sensitive tumors, and the ability of phenothiazines to down-regulate cyclin E2 and inhibit proliferation of tamoxifen resistant breast cancer cells was validated. Our findings demonstrate that an integrated bioinformatics approach to analyze gene expression profiles from multiple breast tumor datasets can identify important biological pathways and potentially novel therapeutic options for tamoxifen-resistant breast cancers. PMID:21789246
Weigt, S Samuel; Wang, Xiaoyan; Palchevskiy, Vyacheslav; Patel, Naman; Derhovanessian, Ariss; Shino, Michael Y; Sayah, David M; Lynch, Joseph P; Saggar, Rajan; Ross, David J; Kubak, Bernie M; Ardehali, Abbas; Palmer, Scott; Husain, Shahid; Belperio, John A
2018-06-01
Aspergillus colonization after lung transplant is associated with an increased risk of chronic lung allograft dysfunction (CLAD). We hypothesized that gene expression during Aspergillus colonization could provide clues to CLAD pathogenesis. We examined transcriptional profiles in 3- or 6-month surveillance bronchoalveolar lavage fluid cell pellets from recipients with Aspergillus fumigatus colonization (n = 12) and without colonization (n = 10). Among the Aspergillus colonized, we also explored profiles in those who developed CLAD (n = 6) or remained CLAD-free (n = 6). Transcription profiles were assayed with the HG-U133 Plus 2.0 microarray (Affymetrix). Differential gene expression was based on an absolute fold difference of 2.0 or greater and unadjusted P value less than 0.05. We used NIH Database for Annotation, Visualization and Integrated Discovery for functional analyses, with false discovery rates less than 5% considered significant. Aspergillus colonization was associated with differential expression of 489 probe sets, representing 404 unique genes. "Defense response" genes and genes in the "cytokine-cytokine receptor" Kyoto Encyclopedia of Genes and Genomes pathway were notably enriched in this list. Among Aspergillus colonized patients, CLAD development was associated with differential expression of 69 probe sets, representing 64 unique genes. This list was enriched for genes involved in "immune response" and "response to wounding", among others. Notably, both chitinase 3-like-1 and chitotriosidase were associated with progression to CLAD. Aspergillus colonization is associated with gene expression profiles related to defense responses including cytokine signaling. Epithelial wounding, as well as the innate immune response to chitin that is present in the fungal cell wall, may be key in the link between Aspergillus colonization and CLAD.
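The differential-expression rule stated in the abstract above (absolute fold difference of 2.0 or greater and unadjusted P value less than 0.05) can be illustrated with a minimal sketch. The function name, the probe-set names, and the expression values are hypothetical and are not taken from the study; the sketch assumes group means on a linear (not log) scale.

```python
def passes_de_filter(mean_case, mean_control, p_value, min_fold=2.0, max_p=0.05):
    """Return True when a probe set meets both thresholds from the abstract.

    mean_case, mean_control: linear-scale group means (hypothetical inputs).
    The fold difference is taken in whichever direction is larger, so both
    up- and down-regulated probe sets can pass.
    """
    if mean_case <= 0 or mean_control <= 0:
        raise ValueError("expression means must be positive on the linear scale")
    fold = max(mean_case, mean_control) / min(mean_case, mean_control)
    return fold >= min_fold and p_value < max_p


# Hypothetical probe sets: (mean in colonized group, mean in control group, P value)
probe_sets = {
    "probe_a": (250.0, 100.0, 0.010),  # 2.5-fold up, significant
    "probe_b": (150.0, 100.0, 0.001),  # significant but only 1.5-fold
    "probe_c": (40.0, 100.0, 0.200),   # 2.5-fold down, not significant
    "probe_d": (50.0, 100.0, 0.049),   # exactly 2-fold down, significant
}

selected = sorted(
    name for name, (case, ctrl, p) in probe_sets.items()
    if passes_de_filter(case, ctrl, p)
)
```

Because the P value is unadjusted, a filter of this kind controls neither the family-wise error rate nor the false discovery rate by itself; in the abstract, the 5% false discovery rate applies to the downstream functional analysis.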
Identifying prognostic signature in ovarian cancer using DirGenerank
Wang, Jian-Yong; Chen, Ling-Ling; Zhou, Xiong-Hui
2017-01-01
Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, because of tumor heterogeneity, it remains a big challenge to select prognostic genes that can distinguish the risk of cancer patients across various data sets. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed the DirGenerank (Generank in direct network) algorithm, which concerns both the gene dependency network and genes’ correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as the training data set, 40 genes with the highest importance were selected as the prognostic signature. Survival analysis of patients divided by the prognostic signature in the testing data set and four independent data sets showed that the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed that the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients. PMID:28615526
Variations in the Intragene Methylation Profiles Hallmark Induced Pluripotency
Druzhkov, Pavel; Zolotykh, Nikolay; Meyerov, Iosif; Alsaedi, Ahmed; Shutova, Maria; Ivanchenko, Mikhail; Zaikin, Alexey
2015-01-01
We demonstrate the potential of differentiating embryonic and induced pluripotent stem cells by the regularized linear and decision tree machine learning classification algorithms, based on a number of intragene methylation measures. The resulting average classification accuracy is above 95%, which exceeds earlier results. We propose a constructive and transparent method of feature selection based on classifier accuracy. Enrichment analysis reveals statistically meaningful presence of stemness group and cancer discriminating genes among the selected best classifying features. These findings stimulate further research on the functional consequences of these differences in methylation patterns. The presented approach can be broadly used to discriminate cells of different phenotypes or in different states by their methylation profiles, identify groups of genes constituting multifeature classifiers, and assess enrichment of these groups by the sets of genes with a functionality of interest. PMID:26618180
A polygenic burden of rare disruptive mutations in schizophrenia
Purcell, Shaun M.; Moran, Jennifer L.; Fromer, Menachem; Ruderfer, Douglas; Solovieff, Nadia; Roussos, Panos; O’Dushlaine, Colm; Chambert, Kimberly; Bergen, Sarah E.; Kähler, Anna; Duncan, Laramie; Stahl, Eli; Genovese, Giulio; Fernández, Esperanza; Collins, Mark O; Komiyama, Noboru H.; Choudhary, Jyoti S.; Magnusson, Patrik K. E.; Banks, Eric; Shakir, Khalid; Garimella, Kiran; Fennell, Tim; de Pristo, Mark; Grant, Seth G.N.; Haggarty, Stephen; Gabriel, Stacey; Scolnick, Edward M.; Lander, Eric S.; Hultman, Christina; Sullivan, Patrick F.; McCarroll, Steven A.; Sklar, Pamela
2014-01-01
By analyzing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we have demonstrated a polygenic burden primarily arising from rare (<1/10,000), disruptive mutations distributed across many genes. Especially enriched genesets included the voltage-gated calcium ion channel and the signaling complex formed by the activity-regulated cytoskeleton-associated (ARC) scaffold protein of the postsynaptic density (PSD), sets previously implicated by genome-wide association studies (GWAS) and copy-number variation (CNV) studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) were enriched for case mutations. No individual gene-based test achieved significance after correction for multiple testing and we did not detect any alleles of moderately low frequency (~0.5-1%) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene mapping paradigms in neuropsychiatric disease. PMID:24463508
Pimentel, Harold; Parra, Marilyn; Gee, Sherry L.; ...
2015-11-03
Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentally dynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ~50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclear-localized. Splice site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. Finally, we conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.
Taylor, Brandie D; Zheng, Xiaojing; Darville, Toni; Zhong, Wujuan; Konganti, Kranti; Abiodun-Ojo, Olayinka; Ness, Roberta B; O'Connell, Catherine M; Haggerty, Catherine L
2017-01-01
Ideal management of sexually transmitted infections (STI) may require risk markers for pathology or vaccine development. Previously, we identified common genetic variants associated with chlamydial pelvic inflammatory disease (PID) and reduced fecundity. As this explains only a proportion of the long-term morbidity risk, we used whole-exome sequencing to identify biological pathways that may be associated with STI-related infertility. We obtained stored DNA from 43 non-Hispanic black women with PID from the PID Evaluation and Clinical Health Study. Infertility was assessed at a mean of 84 months. Principal component analysis revealed no population stratification. Potential covariates did not significantly differ between groups. Sequencing kernel association test was used to examine associations between aggregates of variants on a single gene and infertility. The results from the sequencing kernel association test were used to choose "focus genes" (P < 0.01; n = 150) for subsequent Ingenuity Pathway Analysis to identify "gene sets" that are enriched in biologically relevant pathways. Pathway analysis revealed that focus genes were enriched in canonical pathways including IL-1 signaling, P2Y purinergic receptor signaling, and bone morphogenic protein signaling. Focus genes were enriched in pathways that impact innate and adaptive immunity, protein kinase A activity, cellular growth, and DNA repair. These may alter host resistance or immunopathology after infection. Targeted sequencing of biological pathways identified in this study may provide insight into STI-related infertility.
The Transcriptional Response to Nonself in the Fungus Podospora anserina
Bidard, Frédérique; Clavé, Corinne; Saupe, Sven J.
2013-01-01
In fungi, heterokaryon incompatibility is a nonself recognition process occurring when filaments of different isolates of the same species fuse. Compatibility is controlled by so-called het loci and fusion of strains of unlike het genotype triggers a complex incompatibility reaction that leads to the death of the fusion cell. Herein, we analyze the transcriptional changes during the incompatibility reaction in Podospora anserina. The incompatibility response was found to be associated with a massive transcriptional reprogramming: 2231 genes were up-regulated by a factor of 2 or more during incompatibility. In turn, 2441 genes were down-regulated. HET, NACHT, and HeLo domains previously found to be involved in the control of heterokaryon incompatibility were enriched in the up-regulated gene set. In addition, incompatibility was characterized by an up-regulation of proteolytic and other hydrolytic activities, of secondary metabolism clusters, and of toxins and effector-like proteins. The up-regulated set was found to be enriched for proteins lacking orthologs in other species, and the chromosomal distribution of the up-regulated genes was uneven, with up-regulated genes residing preferentially in genomic islands and on chromosomes IV and V. There was a significant overlap between regulated genes during incompatibility in P. anserina and Neurospora crassa, indicating similarities in the incompatibility responses in these two species. Globally, this study illustrates that the expression changes occurring during cell fusion incompatibility in P. anserina are in several aspects reminiscent of those described in host-pathogen or symbiotic interactions in other fungal species. PMID:23589521
Provenzano, Paolo P; Inman, David R; Eliceiri, Kevin W; Beggs, Hilary E; Keely, Patricia J
2008-11-01
Focal adhesion kinase (FAK) is a central regulator of the focal adhesion, influencing cell proliferation, survival, and migration. Despite evidence demonstrating FAK overexpression in human cancer, its role in tumor initiation and progression is not well understood. Using Cre/LoxP technology to specifically knock out FAK in the mammary epithelium, we showed that FAK is not required for tumor initiation but is required for tumor progression. The mechanistic underpinnings of these results suggested that FAK regulates clinically relevant gene signatures and multiple signaling complexes associated with tumor progression and metastasis, such as Src, ERK, and p130Cas. Furthermore, a systems-level analysis identified FAK as a major regulator of the tumor transcriptome, influencing genes associated with adhesion and growth factor signaling pathways, and their cross talk. Additionally, FAK was shown to down-regulate the expression of clinically relevant proliferation- and metastasis-associated gene signatures, as well as an enriched group of genes associated with the G(2) and G(2)/M phases of the cell cycle. Computational analysis of transcription factor-binding sites within ontology-enriched or clustered gene sets suggested that the differentially expressed proliferation- and metastasis-associated genes in FAK-null cells were regulated through a common set of transcription factors, including p53. Therefore, FAK acts as a primary node in the activated signaling network in transformed motile cells and is a prime candidate for novel therapeutic interventions to treat aggressive human breast cancers.
Rager, Julia E.; Miller, Sloane; Tulenko, Samantha E.; Smeester, Lisa; Ray, Paul D.; Yosim, Andrew; Currier, Jenna M.; Ishida, María C.; González-Horta, Maria del Carmen; Sánchez-Ramírez, Blanca; Ballinas-Casarrubias, Lourdes; Gutiérrez-Torres, Daniela S.; Drobná, Zuzana; Del Razo, Luz M.; García-Vargas, Gonzalo G.; Kim, William Y.; Zhou, Yi-Hui; Wright, Fred A.; Stýblo, Miroslav; Fry, Rebecca C.
2016-01-01
There is strong epidemiologic evidence linking chronic exposure to inorganic arsenic (iAs) to a myriad of adverse health effects, including cancer of the bladder. The present study set out to identify DNA methylation patterns associated with iAs and its metabolites in exfoliated urothelial cells (EUCs) that originate primarily from the urinary bladder, one of the targets of arsenic (As)-induced carcinogenesis. Genome-wide, gene-specific promoter DNA methylation levels were assessed in EUCs from 46 residents of Chihuahua, Mexico, and the relationship was examined between promoter methylation profiles and the intracellular concentrations of total As (tAs) and As species. A set of 49 differentially methylated genes was identified with increased promoter methylation associated with EUC tAs, iAs, and/or monomethylated As (MMAs) enriched for their roles in metabolic disease and cancer. Notably, no genes had differential methylation associated with EUC dimethylated As (DMAs), suggesting that DMAs may influence DNA methylation-mediated urothelial cell responses to a lesser extent than iAs or MMAs. Further analysis showed that 22 of the 49 As-associated genes (45%) are also differentially methylated in bladder cancer tissue identified using The Cancer Genome Atlas repository. Both the As- and cancer-associated genes are enriched for the binding sites of common transcription factors known to play roles in carcinogenesis, demonstrating a novel potential mechanistic link between iAs exposure and bladder cancer. PMID:26039340
Blatti, Charles; Sinha, Saurabh
2016-07-15
Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
Contact: blatti@illinois.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
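The random walk with restarts that the abstract above builds on can be sketched as a single-stage walk over a small weighted network. This is an illustration of the general technique only, not the DRaWR implementation: the function name, the restart probability of 0.3, the toy node names, and the edge weights are all assumptions, and the two-stage subnetwork extraction described in the abstract is not reproduced.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution of a random walk with restarts.

    adj:   dict mapping node -> dict of neighbor -> edge weight; every
           neighbor must also appear as a key (list undirected edges both ways).
    seeds: set of query nodes; the walker restarts uniformly over them.
    """
    nodes = sorted(adj)
    restart_dist = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_dist)
    out_weight = {n: sum(adj[n].values()) for n in nodes}

    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the query set.
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node without outgoing edges sends its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * restart_dist[n]
                continue
            # Otherwise the walker follows an edge chosen in proportion to its weight.
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Toy heterogeneous network: three genes linked through two property nodes.
toy_network = {
    "gene1": {"propA": 1.0},
    "gene2": {"propA": 1.0, "propB": 1.0},
    "gene3": {"propB": 1.0},
    "propA": {"gene1": 1.0, "gene2": 1.0},
    "propB": {"gene2": 1.0, "gene3": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds={"gene1"})
```

With `gene1` as the only query gene, nodes closer to it in the network receive more probability mass, which is the basis for ranking both properties and genes by relevance to a query set.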
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
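The abstract above reports classifier performance as AUC together with the standard error of the Wilcoxon statistic, SE(W). One common way to obtain both from classifier scores is the pairwise (Mann-Whitney) estimate of AUC combined with the Hanley and McNeil (1982) approximation for its standard error. Whether the authors computed SE(W) exactly this way is not stated in the abstract, so the sketch below is an assumption, and the function name and scores are hypothetical.

```python
import math


def auc_and_se(pos_scores, neg_scores):
    """Return (AUC, SE) from scores of positive and negative cases.

    AUC is the fraction of positive/negative pairs ranked correctly
    (ties count one half). SE uses the Hanley-McNeil approximation.
    """
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (n_pos * n_neg)

    q1 = auc / (2.0 - auc)               # P(two positives both outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)    # P(one positive outranks two negatives)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return auc, math.sqrt(max(variance, 0.0))


# Hypothetical decision scores for high-fatigue (positive) and low-fatigue (negative) patients.
auc, se = auc_and_se([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

In this toy example 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9; the standard error shrinks as either group grows.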
The evolution of CpG density and lifespan in conserved primate and mammalian promoters
McLain, Adam T.
2018-01-01
Gene promoters are evolutionarily conserved across holozoans and enriched in CpG sites, the target for DNA methylation. As animals age, the epigenetic pattern of DNA methylation degrades, with highly methylated CpG sites gradually becoming demethylated while CpG islands increase in methylation. Across vertebrates, aging is a trait that varies among species. We used this variation to determine whether promoter CpG density correlates with species’ maximum lifespan. Human promoter sequences were used to identify conserved regions in 131 mammals and a subset of 28 primate genomes. We identified approximately 1000 gene promoters (5% of the total), that significantly correlated CpG density with lifespan. The correlations were performed via the phylogenetic least squares method to account for trait similarity by common descent using phylogenetic branch lengths. Gene set enrichment analysis revealed no significantly enriched pathways or processes, consistent with the hypothesis that aging is not under positive selection. However, within both mammals and primates, 95% of the promoters showed a positive correlation between increasing CpG density and species lifespan, and two thirds were shared between the primate subset and mammalian datasets. Thus, these genes may require greater buffering capacity against age-related dysregulation of DNA methylation in longer-lived species. PMID:29661983
Veyrieras, Jean-Baptiste; Gaffney, Daniel J.; Pickrell, Joseph K.; Gilad, Yoav; Stephens, Matthew; Pritchard, Jonathan K.
2012-01-01
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation. PMID:22359548
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Lucca, Liliana E.; Lerner, Benjamin A.; Gunel, Murat; Raddassi, Khadir; Coric, Vlad; Hafler, David A.; Love, J. Christopher
2017-01-01
Immune checkpoint inhibitors targeting programmed cell death protein 1 (PD-1) have been highly successful in the treatment of cancer. While PD-1 expression has been widely investigated, its role in CD4+ effector T cells in the setting of health and cancer remains unclear, particularly in the setting of glioblastoma multiforme (GBM), the most aggressive and common form of brain cancer. We examined the functional and molecular features of PD-1+CD4+CD25−CD127+Foxp3− effector cells in healthy subjects and in patients with GBM. In healthy subjects, we found that PD-1+CD4+ effector cells are dysfunctional: they do not proliferate but can secrete large quantities of IFNγ. Strikingly, blocking antibodies against PD-1 did not rescue proliferation. RNA-sequencing revealed features of exhaustion in PD-1+ CD4 effectors. In the context of GBM, tumors were enriched in PD-1+ CD4+ effectors that were similarly dysfunctional and unable to proliferate. Furthermore, we found enrichment of PD-1+TIM-3+ CD4+ effectors in tumors, suggesting that co-blockade of PD-1 and TIM-3 in GBM may be therapeutically beneficial. RNA-sequencing of blood and tumors from GBM patients revealed distinct differences between CD4+ effectors from both compartments, with enrichment in multiple gene sets from tumor-infiltrating PD-1−CD4+ effector cells. Enrichment of these gene sets in tumor suggests a more metabolically active cell state with signaling through other co-receptors. PD-1 expression on CD4 cells identifies a dysfunctional subset refractory to rescue with PD-1 blocking antibodies, suggesting that the influence of immune checkpoint inhibitors may involve recovery of function in the PD-1−CD4+ T cell compartment. Additionally, co-blockade of PD-1 and TIM-3 in GBM may be therapeutically beneficial. PMID:28880903
Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang; Demissie, Serkalem; Cupples, L Adrienne; Kiel, Douglas P; Karasik, David
2011-06-01
Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes--SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. 
This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk. Copyright © 2011 American Society for Bone and Mineral Research.
Lira-Albarrán, Saúl; Durand, Marta; Barrera, David; Vega, Claudia; Becerra, Rocio García; Díaz, Lorenza; García-Quiroz, Janice; Rangel, Claudia; Larrea, Fernando
2018-04-27
In order to get further information on the effects of ulipristal acetate (UPA) on endometrial decidualization, a functional analysis of the differentially expressed genes (DEG) in endometrium from UPA-treated versus control cycles of normal ovulatory women was performed. A list of 1183 endometrial DEG, from a previously published study by our group, was submitted to gene ontology, gene enrichment and Ingenuity Pathway Analysis (IPA). This functional analysis showed that decidualization was an overrepresented biological process. Gene set enrichment analysis identified LIF, PRL, IL15 and STAT3 among the most down-regulated genes within the JAK STAT canonical pathway. IPA showed that decidualization of uterus was a bio-function predicted as inhibited by UPA. The results demonstrated that this selective progesterone receptor modulator, when administered during the periovulatory phase of the menstrual cycle, may affect the molecular mechanisms leading to endometrial decidualization in response to progesterone during the period of maximum embryo receptivity. Copyright © 2018 Elsevier B.V. All rights reserved.
Characterization of embryo-specific genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sung, Z.R.
1988-01-01
The objective of the proposed research is to characterize the structure and function of a set of genes whose expression is regulated in embryo development, and that are not expressed in mature tissues -- the embryogenic genes. In order to isolate these genes, we immunized a rabbit with total extracts of somatic embryos of carrot, and enriched the anti-embryo antiserum for antibodies reacting with extracts of carrot somatic embryos. Using this enriched antiserum, we screened a lambda gt11 cDNA library constructed from embryo poly A+ RNA, and isolated 10 cDNA clones that detect embryogenic mRNAs. Monospecific antibodies have been purified for proteins corresponding to each cDNA sequence. Four cDNA clones were further characterized in terms of the expression of their corresponding mRNA and protein in somatic embryos of carrot. In some cases, comparable gene sequences or products have been detected in somatic and zygotic embryos of other plant species. The characteristics of these 4 cDNA clones -- clone Nos. 8, 59, and 66 -- are described in this report. 3 figs.
Pendse, Salil N; Maertens, Alexandra; Rosenberg, Michael; Roy, Dipanwita; Fasani, Rick A; Vantangoli, Marguerite M; Madnick, Samantha J; Boekelheide, Kim; Fornace, Albert J; Odwin, Shelly-Ann; Yager, James D; Hartung, Thomas; Andersen, Melvin E; McMullen, Patrick D
2017-04-01
The twenty-first century vision for toxicology involves a transition away from high-dose animal studies to in vitro and computational models (NRC in Toxicity testing in the 21st century: a vision and a strategy, The National Academies Press, Washington, DC, 2007). This transition requires mapping pathways of toxicity by understanding how in vitro systems respond to chemical perturbation. Uncovering transcription factors/signaling networks responsible for gene expression patterns is essential for defining pathways of toxicity, and ultimately, for determining the chemical modes of action through which a toxicant acts. Traditionally, transcription factor identification is achieved via chromatin immunoprecipitation studies and summarized by calculating which transcription factors are statistically associated with up- and downregulated genes. These lists are commonly determined via statistical or fold-change cutoffs, a procedure that is sensitive to statistical power and may not be as useful for determining transcription factor associations. To move away from an arbitrary statistical or fold-change-based cutoff, we developed, in the context of the Mapping the Human Toxome project, an enrichment paradigm called information-dependent enrichment analysis (IDEA) to guide identification of the transcription factor network. We used a test case of activation in MCF-7 cells by 17β estradiol (E2). Using this new approach, we established a time course for transcriptional and functional responses to E2. ERα and ERβ were associated with short-term transcriptional changes in response to E2. Sustained exposure led to recruitment of additional transcription factors and alteration of cell cycle machinery. TFAP2C and SOX2 were the transcription factors most highly correlated with dose. E2F7, E2F1, and Foxm1, which are involved in cell proliferation, were enriched only at 24 h. IDEA should be useful for identifying candidate pathways of toxicity. 
IDEA outperforms gene set enrichment analysis (GSEA) and provides similar results to weighted gene correlation network analysis, a platform that helps to identify genes not annotated to pathways.
Lavallée-Adam, Mathieu; Yates, John R
2016-03-24
PSEA-Quant analyzes quantitative mass spectrometry-based proteomics datasets to identify enrichments of annotations contained in repositories such as the Gene Ontology and Molecular Signature databases. It allows users to identify the annotations that are significantly enriched for reproducibly quantified high abundance proteins. PSEA-Quant is available on the Web and as a command-line tool. It is compatible with all label-free and isotopic labeling-based quantitative proteomics methods. This protocol describes how to use PSEA-Quant and interpret its output. The importance of each parameter as well as troubleshooting approaches are also discussed. Copyright © 2016 John Wiley & Sons, Inc.
PathScore: a web tool for identifying altered pathways in cancer data.
Gaffney, Stephen G; Townsend, Jeffrey P
2016-12-01
PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at: github.com/sggaffney/pathscore with a GPLv3 license. Contact: stephen.gaffney@yale.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Moreno, Marta; Fernández, Virginia; Monllau, Josep M.; Borrell, Víctor; Lerin, Carles; de la Iglesia, Núria
2015-01-01
Summary: Neural stem cells (NSCs) reside in a hypoxic microenvironment within the brain. However, the crucial transcription factors (TFs) that regulate NSC biology under physiologic hypoxia are poorly understood. Here we have performed gene set enrichment analysis (GSEA) of microarray datasets from hypoxic versus normoxic NSCs with the aim of identifying pathways and TFs that are activated under oxygen concentrations mimicking normal brain tissue microenvironment. Integration of TF target (TFT) and pathway enrichment analysis identified the calcium-regulated TF NFATc4 as a major candidate to regulate hypoxic NSC functions. Nfatc4 expression was coordinately upregulated by top hypoxia-activated TFs, while NFATc4 target genes were enriched in hypoxic NSCs. Loss-of-function analyses further revealed that the calcineurin-NFATc4 signaling axis acts as a major regulator of NSC self-renewal and proliferation in vitro and in vivo by promoting the expression of TFs, including Id2, that contribute to the maintenance of the NSC state. PMID:26235896
Spaceflight Activates Autophagy Programs and the Proteasome in Mouse Liver
Blaber, Elizabeth A.; Pecaut, Michael J.
2017-01-01
Increased oxidative stress is an unavoidable consequence of exposure to the space environment. Our previous studies showed that mice exposed to space for 13.5 days had decreased glutathione levels, suggesting impairments in oxidative defense. Here we performed unbiased, unsupervised and integrated multi-‘omic analyses of metabolomic and transcriptomic datasets from mice flown aboard the Space Shuttle Atlantis. Enrichment analyses of metabolite and gene sets showed significant changes in osmolyte concentrations and pathways related to glycerophospholipid and sphingolipid metabolism, likely consequences of relative dehydration of the spaceflight mice. However, we also found increased enrichment of aminoacyl-tRNA biosynthesis and purine metabolic pathways, concomitant with enrichment of genes associated with autophagy and the ubiquitin-proteasome. When taken together with a downregulation in nuclear factor (erythroid-derived 2)-like 2-mediated signaling, our analyses suggest that decreased hepatic oxidative defense may lead to aberrant tRNA post-translational processing, induction of degradation programs and senescence-associated mitochondrial dysfunction in response to the spaceflight environment. PMID:28953266
Spaceflight Activates Autophagy Programs and the Proteasome in Mouse Liver.
Blaber, Elizabeth A; Pecaut, Michael J; Jonscher, Karen R
2017-09-27
Increased oxidative stress is an unavoidable consequence of exposure to the space environment. Our previous studies showed that mice exposed to space for 13.5 days had decreased glutathione levels, suggesting impairments in oxidative defense. Here we performed unbiased, unsupervised and integrated multi-'omic analyses of metabolomic and transcriptomic datasets from mice flown aboard the Space Shuttle Atlantis. Enrichment analyses of metabolite and gene sets showed significant changes in osmolyte concentrations and pathways related to glycerophospholipid and sphingolipid metabolism, likely consequences of relative dehydration of the spaceflight mice. However, we also found increased enrichment of aminoacyl-tRNA biosynthesis and purine metabolic pathways, concomitant with enrichment of genes associated with autophagy and the ubiquitin-proteasome. When taken together with a downregulation in nuclear factor (erythroid-derived 2)-like 2-mediated signaling, our analyses suggest that decreased hepatic oxidative defense may lead to aberrant tRNA post-translational processing, induction of degradation programs and senescence-associated mitochondrial dysfunction in response to the spaceflight environment.
High resolution array CGH and gene expression profiling of alveolar soft part sarcoma
Selvarajah, Shamini; Pyne, Saumyadipta; Chen, Eleanor; Sompallae, Ramakrishna; Ligon, Azra H.; Nielsen, Gunnlaugur P.; Dranoff, Glenn; Stack, Edward; Loda, Massimo; Flavin, Richard
2014-01-01
Purpose: Alveolar soft part sarcoma (ASPS) is a soft tissue sarcoma with poor prognosis, and little molecular evidence for its origin, initiation and progression. The aim of this study was to elucidate candidate molecular pathways involved in tumor pathogenesis. Experimental Design: We employed high-throughput array comparative genomic hybridization (aCGH) and cDNA-Mediated Annealing, Selection, Ligation, and Extension Assay to profile the genomic and expression signatures of primary and metastatic ASPS from 17 tumors derived from 11 patients. We used an integrative bioinformatics approach to elucidate the molecular pathways associated with ASPS progression. Fluorescence in situ hybridization (FISH) was performed to validate the presence of the t(X;17)(p11.2;q25) ASPL-TFE3 fusion and hence confirm the aCGH observations. Results: FISH analysis identified the ASPL-TFE3 fusion in all cases. aCGH revealed a higher number of numerical aberrations in metastatic tumors relative to primaries, but failed to identify consistent alterations in either group. Gene expression analysis highlighted 1,063 genes which were differentially expressed between the two groups. Gene set enrichment analysis identified 16 enriched gene sets (p < 0.1) associated with differentially expressed genes. Notable among these were several stem cell gene expression signatures and pathways related to differentiation. In particular, the paired box transcription factor PAX6 was up-regulated in the primary tumors, along with several genes whose mouse orthologs have previously been implicated in Pax6-DNA binding during neural stem cell differentiation. Conclusion: In addition to suggesting a tentative neural line of differentiation for ASPS, these results implicate transcriptional deregulation from fusion genes in the pathogenesis of ASPS. PMID:24493828
PyPathway: Python Package for Biological Network Analysis and Visualization.
Xu, Yang; Luo, Xiao-Chun
2018-05-01
Life science studies represent one of the biggest generators of large data sets, mainly because of rapid sequencing technological advances. Biological networks including interactive networks and human curated pathways are essential to understand these high-throughput data sets. Biological network analysis offers a method to explore systematically not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Currently, several packages for the Python community have been developed, such as BioPython and Goatools. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible free and open source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction network and pathway databases such as Reactome, WikiPathway, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis by using an easy web application. For package availability, see the first Reference.
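Illustrative note: the overrepresentation analysis named in the abstract above is commonly computed as a one-sided hypergeometric test. The sketch below is a minimal standard-library version of that general idea and does not use or reproduce PyPathway's API; the function names `hypergeom_sf` and `overrepresentation` and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the upper tail of the hypergeometric distribution exactly.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def overrepresentation(gene_list, gene_set, background):
    """Return (overlap size, one-sided p-value) for gene_list against gene_set.

    Genes outside the background are ignored, which is an assumption of this
    sketch rather than a rule taken from any particular package.
    """
    background = set(background)
    hits = set(gene_list) & background
    members = set(gene_set) & background
    k = len(hits & members)
    return k, hypergeom_sf(k, len(background), len(members), len(hits))


# Toy data (hypothetical): 20 background genes, a 5-gene pathway,
# and a 4-gene hit list that shares 3 genes with the pathway.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
hit_list = ["g0", "g1", "g2", "g10"]
overlap, p_value = overrepresentation(hit_list, pathway, background)
```

In practice the p-values from many gene sets would also need a multiple-testing correction, which this sketch leaves out.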
Lee, Min-Young; Yu, Ji Hea; Kim, Ji Yeon; Seo, Jung Hwa; Park, Eun Sook; Kim, Chul Hoon; Kim, Hyongbum; Cho, Sung-Rae
2013-01-01
Housing animals in an enriched environment (EE) enhances behavioral function. However, the mechanism underlying this EE-mediated functional improvement and the resultant changes in gene expression have yet to be elucidated. We attempted to investigate the underlying mechanisms associated with long-term exposure to an EE by evaluating gene expression patterns. We housed 6-week-old CD-1 (ICR) mice in standard cages or an EE comprising a running wheel, novel objects, and social interaction for 2 months. Motor and cognitive performances were evaluated using the rotarod test and passive avoidance test, and gene expression profile was investigated in the cerebral hemispheres using microarray and gene set enrichment analysis (GSEA). In behavioral assessment, an EE significantly enhanced rotarod performance and short-term working memory. Microarray analysis revealed that genes associated with neuronal activity were significantly altered by an EE. GSEA showed that genes involved in synaptic transmission and postsynaptic signal transduction were globally upregulated, whereas those associated with reuptake by presynaptic neurotransmitter transporters were downregulated. In particular, both microarray and GSEA demonstrated that EE exposure increased opioid signaling, acetylcholine release cycle, and postsynaptic neurotransmitter receptors but decreased Na+ / Cl- -dependent neurotransmitter transporters, including dopamine transporter Slc6a3 in the brain. Western blotting confirmed that SLC6A3, DARPP32 (PPP1R1B), and P2RY12 were largely altered in a region-specific manner. An EE enhanced motor and cognitive function through the alteration of synaptic activity-regulating genes, improving the efficient use of neurotransmitters and synaptic plasticity by the upregulation of genes associated with postsynaptic receptor activity and downregulation of presynaptic reuptake by neurotransmitter transporters.
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
Casucci, Monica; Falcone, Laura; Camisa, Barbara; Norelli, Margherita; Porcellini, Simona; Stornaiuolo, Anna; Ciceri, Fabio; Traversari, Catia; Bordignon, Claudio; Bonini, Chiara; Bondanza, Attilio
2018-01-01
Chimeric antigen receptor (CAR)-T cell immunotherapy is at the forefront of innovative cancer therapeutics. However, lack of standardization of cellular products within the same clinical trial and lack of harmonization between different trials have hindered the clear identification of efficacy and safety determinants that should be unveiled in order to advance the field. With the aim of facilitating the isolation and in vivo tracking of CAR-T cells, we here propose the inclusion within the CAR molecule of a novel extracellular spacer based on the low-affinity nerve-growth-factor receptor (NGFR). We screened four different spacer designs using as target antigen the CD44 isoform variant 6 (CD44v6). We successfully generated NGFR-spaced CD44v6 CAR-T cells that could be efficiently enriched with clinical-grade immuno-magnetic beads without negative consequences on subsequent expansion, immuno-phenotype, in vitro antitumor reactivity, and conditional ablation when co-expressing a suicide gene. Most importantly, these cells could be tracked with anti-NGFR monoclonal antibodies in NSG mice, where they expanded, persisted, and exerted potent antitumor effects against both high leukemia and myeloma burdens. Similar results were obtained with NGFR-enriched CAR-T cells specific for CD19 or CEA, suggesting the universality of this strategy. In conclusion, we have demonstrated that the incorporation of the NGFR marker gene within the CAR sequence allows for a single molecule to simultaneously work as a therapeutic and selection/tracking gene. Looking ahead, NGFR spacer enrichment might allow good manufacturing procedures-manufacturing of standardized CAR-T cell products with high therapeutic potential, which could be harmonized in different clinical trials and used in combination with a suicide gene for future application in the allogeneic setting. PMID:29619024
Transcriptomics of cortical gray matter thickness decline during normal aging
Kochunov, P; Charlesworth, J; Winkler, A; Hong, LE; Nichols, T; Curran, JE; Sprooten, E; Jahanshad, N; Thompson, PM; Johnson, MP; Kent, JW; Landman, BA; Mitchell, B; Cole, SA; Dyer, TD; Moses, EK; Goring, HHH; Almasy, L; Duggirala, R; Olvera, RL; Glahn, DC; Blangero, J
2013-01-01
Introduction: We performed a whole-transcriptome correlation analysis, followed by the pathway enrichment and testing of innate immune response pathways analyses to evaluate the hypothesis that transcriptional activity can predict cortical gray matter thickness (GMT) variability during normal cerebral aging. Methods: Transcriptome and GMT data were available for 379 individuals (age range=28–85) community-dwelling members of large extended Mexican-American families. Collection of transcriptome data preceded that of neuroimaging data by 17 years. Genome-wide gene transcriptome data consisted of 20,413 heritable lymphocytes-based transcripts. GMT measurements were performed from high-resolution (isotropic 800µm) T1-weighted MRI. Transcriptome-wide and pathway enrichment analysis was used to classify genes correlated with GMT. Transcripts for sixty genes from seven innate immune pathways were tested as specific predictors of GMT variability. Results: Transcripts for eight genes (IGFBP3, LRRN3, CRIP2, SCD, IDS, TCF4, GATA3, HN1) passed the transcriptome-wide significance threshold. Four orthogonal factors extracted from this set predicted 31.9% of the variability in the whole-brain and between 23.4 and 35% of regional GMT measurements. Pathway enrichment analysis identified six functional categories including cellular proliferation, aggregation, differentiation, viral infection, and metabolism. The integrin signaling pathway was significantly (p<10^-6) enriched with GMT. Finally, three innate immune pathways (complement signaling, toll-receptors and scavenger and immunoglobulins) were significantly associated with GMT. Conclusion: Expression activity for the genes that regulate cellular proliferation, adhesion, differentiation and inflammation can explain a significant proportion of individual variability in cortical GMT. Our findings suggest that normal cerebral aging is the product of a progressive decline in regenerative capacity and increased neuroinflammation.
PMID:23707588
Transcriptomics of cortical gray matter thickness decline during normal aging.
Kochunov, P; Charlesworth, J; Winkler, A; Hong, L E; Nichols, T E; Curran, J E; Sprooten, E; Jahanshad, N; Thompson, P M; Johnson, M P; Kent, J W; Landman, B A; Mitchell, B; Cole, S A; Dyer, T D; Moses, E K; Goring, H H H; Almasy, L; Duggirala, R; Olvera, R L; Glahn, D C; Blangero, J
2013-11-15
We performed a whole-transcriptome correlation analysis, followed by the pathway enrichment and testing of innate immune response pathway analyses to evaluate the hypothesis that transcriptional activity can predict cortical gray matter thickness (GMT) variability during normal cerebral aging. Transcriptome and GMT data were available for 379 individuals (age range=28-85) community-dwelling members of large extended Mexican American families. Collection of transcriptome data preceded that of neuroimaging data by 17 years. Genome-wide gene transcriptome data consisted of 20,413 heritable lymphocytes-based transcripts. GMT measurements were performed from high-resolution (isotropic 800 μm) T1-weighted MRI. Transcriptome-wide and pathway enrichment analysis was used to classify genes correlated with GMT. Transcripts for sixty genes from seven innate immune pathways were tested as specific predictors of GMT variability. Transcripts for eight genes (IGFBP3, LRRN3, CRIP2, SCD, IDS, TCF4, GATA3, and HN1) passed the transcriptome-wide significance threshold. Four orthogonal factors extracted from this set predicted 31.9% of the variability in the whole-brain and between 23.4 and 35% of regional GMT measurements. Pathway enrichment analysis identified six functional categories including cellular proliferation, aggregation, differentiation, viral infection, and metabolism. The integrin signaling pathway was significantly (p<10^-6) enriched with GMT. Finally, three innate immune pathways (complement signaling, toll-receptors and scavenger and immunoglobulins) were significantly associated with GMT. Expression activity for the genes that regulate cellular proliferation, adhesion, differentiation and inflammation can explain a significant proportion of individual variability in cortical GMT. Our findings suggest that normal cerebral aging is the product of a progressive decline in regenerative capacity and increased neuroinflammation. Copyright © 2013 Elsevier Inc.
All rights reserved.
Sun, Zhengda; Wang, Chih-Yang; Lawson, Devon A; Kwek, Serena; Velozo, Hugo Gonzalez; Owyong, Mark; Lai, Ming-Derg; Fong, Lawrence; Wilson, Mark; Su, Hua; Werb, Zena; Cooke, Daniel L
2018-02-16
Tumor endothelial cells (TEC) play an indispensable role in tumor growth and metastasis, although much of the detailed mechanism still remains elusive. In this study we characterized and compared the global gene expression profiles of TECs and control ECs isolated from human breast cancerous tissues and reduction mammoplasty tissues, respectively, by single cell RNA sequencing (scRNA-seq). Based on the qualified scRNA-seq libraries that we made, we found that 1302 genes were differentially expressed between these two EC phenotypes. Both principal component analysis (PCA) and heat map-based hierarchical clustering separated the cancerous versus control ECs as two distinctive clusters, and MetaCore disease biomarker analysis indicated that these differentially expressed genes are highly correlated with breast neoplasm diseases. Gene Set Enrichment Analysis (GSEA) software showed these genes to be enriched in extracellular matrix (ECM) signal pathways and highlighted 127 ECM-associated genes. External validation verified that some of these ECM-associated genes are not only generally overexpressed in various cancer tissues but also specifically overexpressed in colorectal cancer ECs and lymphoma ECs. In conclusion, our data demonstrated that ECM-associated genes play pivotal roles in breast cancer EC biology and some of them could serve as potential TEC biomarkers for various cancers.
Transcriptomic analysis of instinctive and learned reward-related behaviors in honey bees
Naeger, Nicholas L.
2016-01-01
Abstract: We used transcriptomics to compare instinctive and learned, reward-based honey bee behaviors with similar spatio-temporal components: mating flights by males (drones) and time-trained foraging flights by females (workers), respectively. Genome-wide gene expression profiling via RNA sequencing was performed on the mushroom bodies, a region of the brain known for multi-modal sensory integration and responsive to various types of reward. Differentially expressed genes (DEGs) associated with the onset of mating (623 genes) were enriched for the gene ontology (GO) categories of Transcription, Unfolded Protein Binding, Post-embryonic Development, and Neuron Differentiation. DEGs associated with the onset of foraging (473) were enriched for Lipid Transport, Regulation of Programmed Cell Death, and Actin Cytoskeleton Organization. These results demonstrate that there are fundamental molecular differences between similar instinctive and learned behaviors. In addition, there were 166 genes with strong similarities in expression across the two behaviors – a statistically significant overlap in gene expression, also seen in Weighted Gene Co-Expression Network Analysis. This finding indicates that similar instinctive and learned behaviors also share common molecular architecture. This common set of DEGs was enriched for Regulation of RNA Metabolic Process, Transcription Factor Activity, and Response to Ecdysone. These findings provide a starting point for better understanding the relationship between instincts and learned behaviors. In addition, because bees collect food for their colony rather than for themselves, these results also support the idea that altruistic behavior relies, in part, on elements of brain reward systems associated with selfish behavior. PMID:27852762
Good, Benjamin M; Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I
2014-07-29
Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. 
In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.
Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D
2017-12-03
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae . Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae . We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae , but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. 
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
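Illustrative note: the frame comparison described in the abstract above (occurrences in the reading frame versus the two shifted frames) can be sketched as a simple per-frame count. This is a simplified illustration, not the authors' method: it counts single trinucleotides rather than X motifs of a given cardinality, and the function name `frame_counts` and the two-trinucleotide `toy_code` are hypothetical stand-ins, not the actual 20-trinucleotide set X.

```python
def frame_counts(seq, code):
    """Count trinucleotides from `code` in each of the three frames of `seq`.

    Returns [frame 0, frame 1, frame 2], where frame 0 starts at the first
    base and is assumed here to be the reading frame.
    """
    seq = seq.upper()
    counts = [0, 0, 0]
    for frame in range(3):
        # Step through non-overlapping trinucleotides in this frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in code:
                counts[frame] += 1
    return counts


# Toy code and sequence (hypothetical, for illustration only).
toy_code = {"GAA", "TTC"}
counts = frame_counts("GAAGAAGAA", toy_code)
```

A real analysis along these lines would pass the full set X and a gene's coding sequence, and would compare the counts against those obtained from random codes, as the abstract describes.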
Artificial neural network classifier predicts neuroblastoma patients' outcome.
Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi
2016-11-08
More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the tumor aggressive phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop a MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent 82 tumors' set. The NB-hypo classifier predicted the patients' outcome with the remarkable accuracy of 87%. NB-hypo classifier prediction resulted in 2% classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100% accurate in assessing the death of five low/intermediate risk patients.
GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.
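The leave-one-out procedure mentioned in this abstract can be illustrated with a minimal sketch. To keep it dependency-free, a nearest-centroid rule stands in for the multi-layer perceptron; that substitution, the function names, and the two-feature toy data are all assumptions made for illustration and are not the authors' model or data.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to `query`.

    Stand-in classifier (assumption): the abstract uses an MLP instead.
    """
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy profiles with two features per tumor (not real expression data).
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
loo_accuracy = leave_one_out_accuracy(xs, ys)
print(loo_accuracy)
```

In a design like the one described, the leave-one-out estimate would come from the 100 training tumors only, and the independent 82-tumor set would be scored once with the final classifier.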
Transcriptome analysis of trigeminal ganglia following masseter muscle inflammation in rats
Park, Jennifer; Asgar, Jamila; Ro, Jin Y.
2016-01-01
Background: Chronic pain in masticatory muscles is a major medical problem. Although mechanisms underlying persistent pain in masticatory muscles are not fully understood, sensitization of nociceptive primary afferents following muscle inflammation or injury contributes to muscle hyperalgesia. It is well known that craniofacial muscle injury or inflammation induces regulation of multiple genes in trigeminal ganglia, which is associated with muscle hyperalgesia. However, overall transcriptional profiles within trigeminal ganglia following masseter inflammation have not yet been determined. In the present study, we performed an RNA sequencing assay in rat trigeminal ganglia to identify transcriptome profiles of genes relevant to hyperalgesia following inflammation of the rat masseter muscle. Results: Masseter inflammation differentially regulated >3500 genes in trigeminal ganglia. Predominant biological pathways were predicted to be related to activation of resident non-neuronal cells within trigeminal ganglia or recruitment of immune cells. To focus our analysis on the genes more relevant to nociceptors, we selected genes implicated in pain mechanisms, genes enriched in small- to medium-sized sensory neurons, and genes enriched in TRPV1-lineage nociceptors. Among the 2320 candidate genes, 622 genes showed differential expression following masseter inflammation. When the analysis was limited to these candidate genes, pathways related to G protein-coupled signaling and synaptic plasticity were predicted to be enriched. Inspection of individual gene expression changes confirmed the transcriptional changes of multiple nociceptor genes associated with masseter hyperalgesia (e.g., Trpv1, Trpa1, P2rx3, Tac1, and Bdnf) and also suggested a number of novel probable contributors (e.g., Piezo2, Tmem100, and Hdac9).
Conclusion: These findings should further advance our understanding of peripheral mechanisms involved in persistent craniofacial muscle pain conditions and provide a rational basis for identifying novel genes or sets of genes that can be potentially targeted for treating such conditions. PMID:27702909
Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing
2018-04-23
Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapy cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built by the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. The PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracies of the SVM model for training, testing, and validation data were 94, 85, and 74%, respectively. The expression of CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 was significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which was consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP.
The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and help improve therapy.
Basu, Baidehi; Chakraborty, Joyeeta; Chandra, Aditi; Katarkar, Atul; Baldevbhai, Jadav Ritesh Kumar; Dhar Chowdhury, Debjit; Ray, Jay Gopal; Chaudhuri, Keya; Chatterjee, Raghunath
2017-01-01
Oral squamous cell carcinoma (OSCC) is one of the common malignancies in Southeast Asia. Epigenetic changes, mainly altered DNA methylation, have been implicated in many cancers. Considering the varied environmental and genotoxic exposures among the Indian population, we conducted a genome-wide DNA methylation study on paired tumor and adjacent normal tissues of ten well-differentiated OSCC patients and validated in an additional 53 well-differentiated OSCC and adjacent normal samples. Genome-wide DNA methylation analysis identified several novel differentially methylated regions associated with OSCC. Hypermethylation is primarily enriched in the CpG-rich regions, while hypomethylation is mainly in the open sea. Distinct epigenetic drifts for hypo- and hypermethylation across CpG islands suggested independent mechanisms of hypo- and hypermethylation in OSCC development. Aberrant DNA methylation in the promoter regions is concomitant with gene expression. Hypomethylation of immune genes reflects the lymphocyte infiltration into the tumor microenvironment. Comparison of methylome data with 312 TCGA HNSCC samples identified a unique set of hypomethylated promoters among the OSCC patients in India. Pathway analysis of unique hypomethylated promoters indicated that the OSCC patients in India induce an anti-tumor T cell response, with mobilization of T lymphocytes in the neoplastic environment. Survival analysis of these epigenetically regulated immune genes suggested their prominent role in OSCC progression. Our study identified a unique set of hypomethylated regions, enriched in the promoters of immune response genes, and indicated the presence of a strong immune component in the tumor microenvironment. These methylation changes may serve as potential molecular markers to define risk and to monitor the prognosis of OSCC patients in India.
Trevisi, Paolo; Priori, Davide; Motta, Vincenzo; Luise, Diana; Jansman, Alfons J M; Koopmans, Sietse-Jan; Bosi, Paolo
2017-01-01
The stomach is an underestimated key interface between the ingesta and the digestive system, affecting the digestion and playing an important role in several endocrine functions. The quality of starter microbiota and the early life feeding of medium chain triglycerides may affect porcine gastric maturation. Two trials (T1, T2) were carried out on 12 and 24 cesarean-delivered piglets (birth, d0), divided over two microbiota treatments, but slaughtered and sampled at two or three weeks of age, respectively. All piglets were fed orally: sow serum (T1) or pasteurized sow colostrum (T2) on d0; simple starter microbiota (Lactobacillus amylovorus, Clostridium glycolicum and Parabacteroides spp.) (d1-d3); complex microbiota inoculum (sow diluted feces, CA) or a placebo (simple association, SA) (d3-d4) and milk replacer ad libitum (d0-d4). The T1 piglets and half of the T2 piglets were then fed a moist diet (CTRL); the remaining half of the T2 piglets were fed the CTRL diet fortified with medium chain triglycerides and 7% coconut oil (MCT). Total mRNA from the oxyntic mucosa was analyzed using Affymetrix© Porcine Gene array strips. Exploratory functional analysis of the resulting values was carried out using Gene Set Enrichment Analysis. Complex microbiota upregulated 11 gene sets in piglets of each age group vs. SA. Of these sets, 6 were upregulated at both ages, including the set of gene markers of oxyntic mucosa. In comparison with the piglets receiving SA, the CA enriched the genes in the sets related to interferon response when the CTRL diet was given while the same sets were impoverished by CA with the MCT diet. Early colonization with a complex starter microbiota promoted the functional maturation of the oxyntic mucosa in an age-dependent manner. The dietary fatty acid source may have affected the recruitment and the maturation of the immune cells, particularly when the piglets were early associated with a simplified starter microbiota.
Heart morphogenesis gene regulatory networks revealed by temporal expression analysis.
Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph
2017-10-01
During embryogenesis the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq timecourse in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses confirmed the GRN using two mutants, Tbx5 and nkx2-5, and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.
Ebot, Ericka M; Gerke, Travis; Labbé, David P; Sinnott, Jennifer A; Zadra, Giorgia; Rider, Jennifer R; Tyekucheva, Svitlana; Wilson, Kathryn M; Kelly, Rachel S; Shui, Irene M; Loda, Massimo; Kantoff, Philip W; Finn, Stephen; Vander Heiden, Matthew G; Brown, Myles; Giovannucci, Edward L; Mucci, Lorelei A
2017-11-01
Obese men are at higher risk of advanced prostate cancer and cancer-specific mortality; however, the biology underlying this association remains unclear. This study examined gene expression profiles of prostate tissue to identify biological processes differentially expressed by obesity status and lethal prostate cancer. Gene expression profiling was performed on tumor (n = 402) and adjacent normal (n = 200) prostate tissue from participants in 2 prospective cohorts who had been diagnosed with prostate cancer from 1982 to 2005. Body mass index (BMI) was calculated from the questionnaire immediately preceding cancer diagnosis. Men were followed for metastases or prostate cancer-specific death (lethal disease) through 2011. Gene Ontology biological processes differentially expressed by BMI were identified using gene set enrichment analysis. Pathway scores were computed by averaging the signal intensities of member genes. Odds ratios (ORs) for lethal prostate cancer were estimated with logistic regression. Among 402 men, 48% were healthy weight, 31% were overweight, and 21% were very overweight/obese. Fifteen gene sets were enriched in tumor tissue, but not normal tissue, of very overweight/obese men versus healthy-weight men; 5 of these were related to chromatin modification and remodeling (false-discovery rate < 0.25). Patients with high tumor expression of chromatin-related genes had worse clinical characteristics (Gleason grade > 7, 41% vs 17%; P = 2 × 10⁻⁴) and an increased risk of lethal disease that was independent of grade and stage (OR, 5.26; 95% confidence interval, 2.37-12.25). This study improves our understanding of the biology of aggressive prostate cancer and identifies a potential mechanistic link between obesity and prostate cancer death that warrants further study. Cancer 2017;123:4130-4138. © 2017 American Cancer Society.
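The pathway-score step described in this abstract (averaging the signal intensities of member genes) can be sketched in a few lines. The function name, the gene labels, and the intensity values below are hypothetical placeholders, and skipping members that are absent from the profile is an assumption; the abstract does not say how missing genes were handled.

```python
from statistics import mean


def pathway_score(expression, gene_set):
    """Average the signal intensities of the gene-set members found in one sample.

    `expression` maps gene name -> intensity; members missing from the
    profile are skipped (an assumption, not stated in the abstract).
    """
    values = [expression[gene] for gene in gene_set if gene in expression]
    if not values:
        raise ValueError("no gene-set members found in the expression profile")
    return mean(values)


# Hypothetical sample profile and gene set (placeholder names and values).
sample = {"GENE_A": 8.0, "GENE_B": 10.0, "GENE_C": 6.0, "GENE_D": 3.0}
chromatin_set = ["GENE_A", "GENE_B", "GENE_C", "GENE_X"]
score = pathway_score(sample, chromatin_set)
print(score)
```

A per-patient score of this kind could then be dichotomized or used as a covariate in a logistic regression, which is where the reported odds ratios would come from.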
Perucca, Simone; Di Palma, Andrea; Piccaluga, Pier Paolo; Gemelli, Claudia; Zoratti, Elisa; Bassi, Giulio; Giacopuzzi, Edoardo; Lojacono, Andrea; Borsani, Giuseppe; Tagliafico, Enrico; Scupoli, Maria Teresa; Bernardi, Simona; Zanaglio, Camilla; Cattina, Federica; Cancelli, Valeria; Malagola, Michele; Krampera, Mauro; Marini, Mirella; Almici, Camillo; Ferrari, Sergio; Russo, Domenico
2017-01-01
A human bone marrow-derived mesenchymal stromal cell (MSCs) and cord blood-derived CD34+ stem cell co-culture system was set up in order to evaluate the proliferative and differentiative effects induced by MSCs on CD34+ stem cells, and the reciprocal influences on gene expression profiles. After 10 days of co-culture, non-adherent (SN-fraction) and adherent (AD-fraction) CD34+ stem cells were collected and analysed separately. In the presence of MSCs, a significant increase in CD34+ cell number was observed (fold increase = 14.68), mostly in the SN-fraction (fold increase = 13.20). This was combined with a significant increase in CD34+ cell differentiation towards the BFU-E colonies and with a decrease in the CFU-GM. These observations were confirmed by microarray analysis. Through gene set enrichment analysis (GSEA), we noted a significant enrichment in genes involved in heme metabolism (e.g. LAMP2, CLCN3, BMP2K), mitotic spindle formation and proliferation (e.g. PALLD, SOS1, CCNA1) and TGF-beta signalling (e.g. ID1) and a down-modulation of genes participating in myeloid and lymphoid differentiation (e.g. PCGF2) in the co-cultured CD34+ stem cells. On the other hand, a significant enrichment in genes involved in oxygen-level response (e.g. TNFAIP3, SLC2A3, KLF6) and angiogenesis (e.g. VEGFA, IGF1, ID1) was found in the co-cultured MSCs. Taken together, our results suggest that MSCs can exert a priming effect on CD34+ stem cells, regulating their proliferation and erythroid differentiation. In turn, CD34+ stem cells seem to be able to polarise the BM-niche towards the vascular compartment by modulating molecular pathways related to hypoxia and angiogenesis. PMID:28231331
Molecular profiling identifies prognostic markers of stage IA lung adenocarcinoma.
Zhang, Jie; Shao, Jinchen; Zhu, Lei; Zhao, Ruiying; Xing, Jie; Wang, Jun; Guo, Xiaohui; Tu, Shichun; Han, Baohui; Yu, Keke
2017-09-26
We previously showed that different pathologic subtypes were associated with different prognostic values in patients with stage IA lung adenocarcinoma (AC). We hypothesize that differential gene expression profiles of different subtypes may be valuable factors for prognosis in stage IA lung adenocarcinoma. We performed microarray gene expression profiling on tumor tissues micro-dissected from patients with acinar and solid predominant subtypes of stage IA lung adenocarcinoma. These patients had undergone a lobectomy and mediastinal lymph node dissection at the Shanghai Chest Hospital, Shanghai, China in 2012. No patient had preoperative treatment. We performed Gene Set Enrichment Analysis (GSEA) to look for gene expression signatures associated with tumor subtypes. The histologic subtypes of all patients were classified according to the 2015 WHO lung adenocarcinoma classification. We found that tumors of the solid predominant subtype are enriched for genes involved in RNA polymerase activity as well as inactivation of the p53 pathway. Further, we identified a list of genes that may serve as prognostic markers for stage IA lung adenocarcinoma. Validation in the TCGA database shows that these genes are correlated with survival, suggesting that they are novel prognostic factors for stage IA lung adenocarcinoma. In conclusion, we have uncovered novel prognostic factors for stage IA lung adenocarcinoma using gene expression profiling in combination with histopathology subtyping.
Microarray Analysis of Differential Gene Expression Profile Between Human Fetal and Adult Heart.
Geng, Zhimin; Wang, Jue; Pan, Lulu; Li, Ming; Zhang, Jitai; Cai, Xueli; Chu, Maoping
2017-04-01
Although many changes have been discovered during heart maturation, the genetic mechanisms involved in the changes between immature and mature myocardium have only been partially elucidated. Here, the change in gene expression profile between the human fetal and adult heart was characterized. A human microarray was applied to define the gene expression signatures of the fetal (13-17 weeks of gestation, n = 4) and adult hearts (30-40 years old, n = 4). Gene ontology analyses, pathway analyses, gene set enrichment analyses, and signal transduction network analyses were performed to predict the function of the differentially expressed genes. Ten mRNAs were confirmed by quantitative real-time polymerase chain reaction. 5547 mRNAs were found to be significantly differentially expressed. "Cell cycle" was the most enriched pathway in the down-regulated genes. EGFR, IGF1R, and ITGB1 play a central role in the regulation of heart development. EGFR, IGF1R, and FGFR2 were the core genes regulating cardiac cell proliferation. The quantitative real-time polymerase chain reaction results were concordant with the microarray data. Our data identified the transcriptional regulation of heart development in the second trimester and the potential regulators that play a prominent role in the regulation of heart development and cardiac cell proliferation.
Zhao, Zhongming; Guo, An-Yuan; van den Oord, Edwin J C G; Aliev, Fazil; Jia, Peilin; Edenberg, Howard J; Riley, Brien P; Dick, Danielle M; Bettinger, Jill C; Davies, Andrew G; Grotewiel, Michael S; Schuckit, Marc A; Agrawal, Arpana; Kramer, John; Nurnberger, John I; Kendler, Kenneth S; Webb, Bradley T; Miles, Michael F
2012-01-01
A variety of species and experimental designs have been used to study genetic influences on alcohol dependence, ethanol response, and related traits. Integration of these heterogeneous data can be used to produce a ranked target gene list for additional investigation. In this study, we performed a unique multi-species evidence-based data integration using three microarray experiments in mice or humans that generated an initial alcohol dependence (AD) related genes list, human linkage and association results, and gene sets implicated in C. elegans and Drosophila. We then used permutation and false discovery rate (FDR) analyses on the genome-wide association studies (GWAS) dataset from the Collaborative Study on the Genetics of Alcoholism (COGA) to evaluate the ranking results and weighting matrices. We found one weighting score matrix could increase FDR based q-values for a list of 47 genes with a score greater than 2. Our follow up functional enrichment tests revealed these genes were primarily involved in brain responses to ethanol and neural adaptations occurring with alcoholism. These results, along with our experimental validation of specific genes in mice, C. elegans and Drosophila, suggest that a cross-species evidence-based approach is useful to identify candidate genes contributing to alcoholism.
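The FDR-based q-values mentioned in this abstract are often obtained with the Benjamini-Hochberg step-up procedure. The sketch below shows that procedure on made-up p-values; the abstract does not state which FDR method was used, so treating it as Benjamini-Hochberg is an assumption, and the function name is illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Made-up p-values for four hypothetical genes.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = bh_qvalues(example_p)
print(example_q)
```

With permutation testing, each p-value fed into such a function would itself be an empirical estimate, for example the fraction of permuted datasets scoring at least as high as the observed one.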
Analysis of disease-associated objects at the Rat Genome Database
Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hayman, G. T.; Smith, Jennifer R.; Petri, Victoria; Lowry, Timothy F.; Nigam, Rajni; Dwinell, Melinda R.; Worthey, Elizabeth A.; Munzenmaier, Diane H.; Shimoyama, Mary; Jacob, Howard J.
2013-01-01
The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene–disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, ‘regulation of programmed cell death’ was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where ‘lipid metabolic process’ was the most enriched term. ‘Cytosol’ and ‘nucleus’ were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with ‘nucleus’ annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term–annotated gene list showed enrichment in physiologically related diseases. For example, the ‘regulation of blood pressure’ genes were enriched with cardiovascular disease annotations, and the ‘lipid metabolic process’ genes with obesity annotations. 
Furthermore, we were able to enhance enrichment of neurological diseases by combining ‘G-protein coupled receptor binding’ annotated genes with ‘protein kinase binding’ annotated genes. Database URL: http://rgd.mcw.edu PMID:23794737
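GO term enrichment of a gene list, as described in this abstract, is commonly assessed with a one-sided hypergeometric (over-representation) test. The sketch below computes that tail probability from counts; it is a generic illustration with made-up numbers, not a description of how the RatMine widgets or DAVID compute their statistics, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Made-up counts: 10 genes in the background, 4 annotated with the term,
# 3 genes in the disease list, 2 of which carry the term.
p_value = hypergeom_enrichment_p(population=10, annotated=4, drawn=3, hits=2)
print(p_value)
```

When many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing before terms are ranked.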
The Ets Transcription Factor EHF as a Regulator of Cornea Epithelial Cell Identity
Stephens, Denise N.; Klein, Rachel Herndon; Salmans, Michael L.; Gordon, William; Ho, Hsiang; Andersen, Bogi
2013-01-01
The cornea is the clear, outermost portion of the eye composed of three layers: an epithelium that provides a protective barrier while allowing transmission of light into the eye, a collagen-rich stroma, and an endothelium monolayer. How cornea development and aging are controlled is poorly understood. Here we characterize the mouse cornea transcriptome from early embryogenesis through aging and compare it with transcriptomes of other epithelial tissues, identifying cornea-enriched genes, pathways, and transcriptional regulators. Additionally, we profiled cornea epithelium and stroma, defining genes enriched in these layers. Over 10,000 genes are differentially regulated in the mouse cornea across the time course, showing dynamic expression during development and modest expression changes in fewer genes during aging. A striking transition time point for gene expression between postnatal days 14 and 28 corresponds with completion of cornea development at the transcriptional level. Clustering classifies co-expressed, and potentially co-regulated, genes into biologically informative categories, including groups that exhibit epithelial or stromal enriched expression. Based on these findings, and through loss of function studies and ChIP-seq, we show that the Ets transcription factor EHF promotes cornea epithelial fate through complementary gene activating and repressing activities. Furthermore, we identify potential interactions between EHF, KLF4, and KLF5 in promoting cornea epithelial differentiation. These data provide insights into the mechanisms underlying epithelial development and aging, identifying EHF as a regulator of cornea epithelial identity and pointing to interactions between Ets and KLF factors in promoting epithelial fate. Furthermore, this comprehensive gene expression data set for the cornea is a powerful tool for discovery of novel cornea regulators and pathways. PMID:24142692
Identification of a neuronal transcription factor network involved in medulloblastoma development.
Lastowska, Maria; Al-Afghani, Hani; Al-Balool, Haya H; Sheth, Harsh; Mercer, Emma; Coxhead, Jonathan M; Redfern, Chris P F; Peters, Heiko; Burt, Alastair D; Santibanez-Koref, Mauro; Bacon, Chris M; Chesler, Louis; Rust, Alistair G; Adams, David J; Williamson, Daniel; Clifford, Steven C; Jackson, Michael S
2013-07-11
Medulloblastomas, the most frequent malignant brain tumours affecting children, comprise at least 4 distinct clinicogenetic subgroups. Aberrant sonic hedgehog (SHH) signalling is observed in approximately 25% of tumours and defines one subgroup. Although alterations in SHH pathway genes (e.g. PTCH1, SUFU) are observed in many of these tumours, high throughput genomic analyses have identified few other recurring mutations. Here, we have mutagenised the Ptch+/- murine tumour model using the Sleeping Beauty transposon system to identify additional genes and pathways involved in SHH subgroup medulloblastoma development. Mutagenesis significantly increased medulloblastoma frequency and identified 17 candidate cancer genes, including orthologs of genes somatically mutated (PTEN, CREBBP) or associated with poor outcome (PTEN, MYT1L) in the human disease. Strikingly, these candidate genes were enriched for transcription factors (p = 2 × 10⁻⁵), the majority of which (6/7; Crebbp, Myt1L, Nfia, Nfib, Tead1 and Tgif2) were linked within a single regulatory network enriched for genes associated with a differentiated neuronal phenotype. Furthermore, activity of this network varied significantly between the human subgroups, was associated with metastatic disease, and predicted poor survival specifically within the SHH subgroup of tumours. Igf2, previously implicated in medulloblastoma, was the most differentially expressed gene in murine tumours with network perturbation, and network activity in both mouse and human tumours was characterised by enrichment for multiple gene-sets indicating increased cell proliferation, IGF signalling, MYC target upregulation, and decreased neuronal differentiation. Collectively, our data support a model of medulloblastoma development in SB-mutagenised Ptch+/- mice which involves disruption of a novel transcription factor network leading to Igf2 upregulation, proliferation of GNPs, and tumour formation.
Moreover, our results identify rational therapeutic targets for SHH subgroup tumours, alongside prognostic biomarkers for the identification of poor-risk SHH patients.
NASA Astrophysics Data System (ADS)
Bhargava, Maneesh
Rationale: In rodent model systems, the sequential changes in lung morphology resulting from hyperoxic injury are well characterized, and are similar to changes in human acute respiratory distress syndrome (ARDS). In the injured lung, alveolar type two (AT2) epithelial cells play a critical role restoring the normal alveolar structure. Thus characterizing the changes in AT2 cells will provide insights into the mechanisms underpinning the recovery from lung injury. Methods: We applied an unbiased systems level proteomics approach to elucidate molecular mechanisms contributing to lung repair in a rat hyperoxic lung injury model. AT2 cells were isolated from rat lungs at predetermined intervals during hyperoxic injury and recovery. Protein expression profiles were determined by using iTRAQ® with tandem mass spectrometry. Results: Of 959 distinct proteins identified, 183 significantly changed in abundance during the injury-recovery cycle. Gene Ontology enrichment analysis identified cell cycle, cell differentiation, cell metabolism, ion homeostasis, programmed cell death, ubiquitination, and cell migration to be significantly enriched by these proteins. Gene Set Enrichment Analysis of data acquired during lung repair revealed differential expression of gene sets that control multicellular organismal development, systems development, organ development, and chemical homeostasis. More detailed analysis identified activity in two regulatory pathways, JNK and miR 374. A Short Time-series Expression Miner (STEM) algorithm identified protein clusters with coherent changes during injury and repair. Conclusion: Coherent changes occur in the AT2 cell proteome in response to hyperoxic stress. These findings offer guidance regarding the specific molecular mechanisms governing repair of the injured lung.
Wu, Hao; Wu, Runliu; Chen, Miao; Li, Daojiang; Dai, Jing; Zhang, Yi; Gao, Kai; Yu, Jun; Hu, Gui; Guo, Yihang; Lin, Changwei; Li, Xiaorong
2017-03-28
Growing evidence suggests that long non-coding RNAs (lncRNAs) play a key role in tumorigenesis. However, the mechanism remains largely unknown. Thousands of significantly dysregulated lncRNAs and mRNAs were identified by microarray. Furthermore, a miR-133b-mediated lncRNA-mRNA ceRNA network was revealed, a subset of which was validated in 14 paired CRC patient tumor/non-tumor samples. Gene set enrichment analysis (GSEA) results demonstrated that lncRNAs ENST00000520055 and ENST00000535511 shared KEGG pathways with miR-133b target genes. We used microarrays to survey the lncRNA and mRNA expression profiles of colorectal cancer and para-cancer tissues. Gene Ontology (GO) and KEGG pathway enrichment analyses were performed to explore the functions of the significantly dysregulated genes. An innovative method was employed that combined analyses of two microarray data sets to construct a miR-133b-mediated lncRNA-mRNA competing endogenous RNA (ceRNA) network. Quantitative RT-PCR analysis was used to validate part of this network. GSEA was used to predict the potential functions of these lncRNAs. This study identifies and validates a new method to investigate the miR-133b-mediated lncRNA-mRNA ceRNA network and lays the foundation for future investigation into the role of lncRNAs in colorectal cancer.
Bellucci, Elisa; Bitocchi, Elena; Ferrarini, Alberto; Benazzo, Andrea; Biagetti, Eleonora; Klie, Sebastian; Minio, Andrea; Rau, Domenico; Rodriguez, Monica; Panziera, Alex; Venturini, Luca; Attene, Giovanna; Albertini, Emidio; Jackson, Scott A.; Nanni, Laura; Fernie, Alisdair R.; Nikoloski, Zoran; Bertorelle, Giorgio; Delledonne, Massimo; Papa, Roberto
2014-01-01
Using RNA sequencing technology and de novo transcriptome assembly, we compared representative sets of wild and domesticated accessions of common bean (Phaseolus vulgaris) from Mesoamerica. RNA was extracted at the first true-leaf stage, and de novo assembly was used to develop a reference transcriptome; the final data set consists of ∼190,000 single nucleotide polymorphisms from 27,243 contigs in expressed genomic regions. A drastic reduction in nucleotide diversity (∼60%) is evident for the domesticated form, compared with the wild form, and almost 50% of the contigs that are polymorphic were brought to fixation by domestication. In parallel, the effects of domestication decreased the diversity of gene expression (18%). While the coexpression networks for the wild and domesticated accessions demonstrate similar seminal network properties, they show distinct community structures that are enriched for different molecular functions. After simulating the demographic dynamics during domestication, we found that 9% of the genes were actively selected during domestication. We also show that selection induced a further reduction in the diversity of gene expression (26%) and was associated with 5-fold enrichment of differentially expressed genes. While there is substantial evidence of positive selection associated with domestication, in a few cases, this selection has increased the nucleotide diversity in the domesticated pool at target loci associated with abiotic stress responses, flowering time, and morphology. PMID:24850850
A comparative analysis of biclustering algorithms for gene expression data
Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.
2013-01-01
The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837
The Role of Vitamin D in the Transcriptional Program of Human Pregnancy
Al-Garawi, Amal; Carey, Vincent J.; Chhabra, Divya; Morrow, Jarrett; Lasky-Su, Jessica; Qiu, Weiliang; Laranjo, Nancy; Litonjua, Augusto A.; Weiss, Scott T.
2016-01-01
Background Patterns of gene expression of human pregnancy are poorly understood. In a trial of vitamin D supplementation in pregnant women, peripheral blood transcriptomes were measured longitudinally on 30 women and used to characterize gene co-expression networks. Objective Studies suggest that increased maternal Vitamin D levels may reduce the risk of asthma in early life, yet the underlying mechanisms have not been examined. In this study, we used a network-based approach to examine changes in gene expression profiles during the course of normal pregnancy and evaluated their association with maternal Vitamin D levels. Design The VDAART study is a randomized clinical trial of vitamin D supplementation in pregnancy for reduction of pediatric asthma risk. The trial enrolled 881 women at 10–18 weeks of gestation. Longitudinal gene expression measures were obtained on thirty pregnant women, using RNA isolated from peripheral blood samples obtained in the first and third trimesters. Differentially expressed genes were identified using significance analysis of microarrays (SAM), and clustered using a weighted gene co-expression network analysis (WGCNA). Gene-set enrichment was performed to identify major biological pathways. Results Comparison of transcriptional profiles between first and third trimesters of pregnancy identified 5839 significantly differentially expressed genes (FDR<0.05). Weighted gene co-expression network analysis clustered these transcripts into 14 co-expression modules of which two showed significant correlation with maternal vitamin D levels. Pathway analysis of these two modules revealed genes enriched in immune defense pathways and extracellular matrix reorganization as well as genes enriched in notch signaling and transcription factor networks. Conclusion Our data show that gene expression profiles of healthy pregnant women change during the course of pregnancy and suggest that maternal Vitamin D levels influence transcriptional profiles. 
These alterations of the maternal transcriptome may contribute to fetal immune imprinting and reduce allergic sensitization in early life. Trial Registration clinicaltrials.gov NCT00920621 PMID:27711190
Use of RecA protein to enrich for homologous genes in a genomic library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taidi-Laskowski, B.; Grumet, F.C.; Tyan, D.
1988-08-25
RecA protein-coated probe has been utilized to enrich genomic digests for desired genes in order to facilitate cloning from genomic libraries. Using a previously cloned HLA-B27 gene as the recA-coated enrichment probe, the authors obtained a mean 108x increase in the ratio of specific to nonspecific plaques in lambda libraries screened for B27 variant alleles of estimated 99% homology to the probe. Class I genes of lesser homology were less enriched. Loss of genomic DNA during the enrichment procedure can, however, restrict application of this technique whenever starting genomic DNA is very limited. Nevertheless, the impressive reduction in cloning effort and material makes recA enrichment a useful new tool for cloning homologous genes from genomic DNA.
Omrani, Rahma; Spini, Giulia; Puglisi, Edoardo; Saidane, Dalila
2018-04-01
Environmental microbial communities are key players in the bioremediation of hydrocarbon pollutants. Here we assessed changes in bacterial abundance and diversity during the degradation of Tunisian Zarzatine oil by four indigenous bacterial consortia enriched from a petroleum station soil, a refinery reservoir soil, a harbor sediment and seawater. The four consortia were found to efficiently degrade up to 92.0% of total petroleum hydrocarbons after 2 months of incubation. Illumina 16S rRNA gene sequencing revealed that the consortia enriched from soil and sediments were dominated by species belonging to the Pseudomonas and Acinetobacter genera, while in the seawater-derived consortia Dietzia, Fusobacterium and Mycoplana emerged as dominant genera. We identified a number of species whose relative abundances bloomed from small to high percentages: Dietzia daqingensis in the seawater microcosms, and three OTUs classified as Acinetobacter venetianus in the two soil-derived and the sediment-derived microcosms. Functional analyses on degrading genes were conducted by comparing PCR results of the degrading genes alkB, ndoB, cat23, xylA and nidA1 with inferences obtained by PICRUSt analysis of 16S amplicon data: the two data sets were partly in agreement and suggest a relationship between the catabolic genes detected and the rate of biodegradation obtained. The work provides detailed insights about the modulation of bacterial communities involved in petroleum biodegradation and can provide useful information for in situ bioremediation of oil-related pollution.
Martinez, Diego A.; Oliver, Brian G.; Gräser, Yvonne; Goldberg, Jonathan M.; Li, Wenjun; Martinez-Rossi, Nilce M.; Monod, Michel; Shelest, Ekaterina; Barton, Richard C.; Birch, Elizabeth; Brakhage, Axel A.; Chen, Zehua; Gurr, Sarah J.; Heiman, David; Heitman, Joseph; Kosti, Idit; Rossi, Antonio; Saif, Sakina; Samalova, Marketa; Saunders, Charles W.; Shea, Terrance; Summerbell, Richard C.; Xu, Jun; Young, Sarah; Zeng, Qiandong; Birren, Bruce W.; Cuomo, Christina A.; White, Theodore C.
2012-01-01
ABSTRACT The major cause of athlete’s foot is Trichophyton rubrum, a dermatophyte or fungal pathogen of human skin. To facilitate molecular analyses of the dermatophytes, we sequenced T. rubrum and four related species, Trichophyton tonsurans, Trichophyton equinum, Microsporum canis, and Microsporum gypseum. These species differ in host range, mating, and disease progression. The dermatophyte genomes are highly colinear yet contain gene family expansions not found in other human-associated fungi. Dermatophyte genomes are enriched for gene families containing the LysM domain, which binds chitin and potentially related carbohydrates. These LysM domains differ in sequence from those in other species in regions of the peptide that could affect substrate binding. The dermatophytes also encode novel sets of fungus-specific kinases with unknown specificity, including nonfunctional pseudokinases, which may inhibit phosphorylation by competing for kinase sites within substrates, acting as allosteric effectors, or acting as scaffolds for signaling. The dermatophytes are also enriched for a large number of enzymes that synthesize secondary metabolites, including dermatophyte-specific genes that could synthesize novel compounds. Finally, dermatophytes are enriched in several classes of proteases that are necessary for fungal growth and nutrient acquisition on keratinized tissues. Despite differences in mating ability, genes involved in mating and meiosis are conserved across species, suggesting the possibility of cryptic mating in species where it has not been previously detected. These genome analyses identify gene families that are important to our understanding of how dermatophytes cause chronic infections, how they interact with epithelial cells, and how they respond to the host immune response. PMID:22951933
Distinct transcriptomes define rostral and caudal serotonin neurons
Wylie, Christi J.; Hendricks, Timothy J.; Zhang, Bing; Wang, Lily; Lu, Pengcheng; Leahy, Patrick; Fox, Stephanie; Maeno, Hiroshi; Deneris, Evan S.
2012-01-01
The molecular architecture of developing serotonin (5HT) neurons is poorly understood yet its determination is likely to be essential for elucidating functional heterogeneity of these cells and the contribution of serotonergic dysfunction to disease pathogenesis. Here, we describe the purification of postmitotic embryonic 5HT neurons by flow cytometry for whole genome microarray expression profiling of this unitary monoaminergic neuron type. Our studies identified significantly enriched expression of hundreds of unique genes in 5HT neurons thus providing an abundance of new serotonergic markers. Furthermore, we identified several hundred transcripts encoding homeodomain, axon guidance, cell adhesion, intracellular signaling, ion transport, and imprinted genes associated with various neurodevelopmental disorders that were differentially enriched in developing rostral and caudal 5HT neurons. These findings suggested a homeodomain code that distinguishes rostral and caudal 5HT neurons. Indeed, verification studies demonstrated that Hmx homeodomain and Hox gene expression defined an Hmx+ rostral subtype and Hox+ caudal subtype. Expression of engrailed genes in a subset of 5HT neurons in the rostral domain further distinguished two subtypes defined as Hmx+En+ and Hmx+En-. The differential enrichment of gene sets for different canonical pathways and gene ontology categories provided additional evidence for heterogeneity between rostral and caudal 5HT neurons. These findings demonstrate a deep transcriptome and biological pathway duality for neurons that give rise to the ascending and descending serotonergic subsystems. Our databases provide a rich, clinically relevant, resource for definition of 5HT neuron subtypes and elucidation of the genetic networks required for serotonergic function. PMID:20071532
Willsey, A. Jeremy; Sanders, Stephan J.; Li, Mingfeng; Dong, Shan; Tebbenkamp, Andrew T.; Muhle, Rebecca A.; Reilly, Steven K.; Lin, Leon; Fertuzinhos, Sofia; Miller, Jeremy A.; Murtha, Michael T.; Bichsel, Candace; Niu, Wei; Cotney, Justin; Ercan-Sencicek, A. Gulhan; Gockley, Jake; Gupta, Abha; Han, Wenqi; He, Xin; Hoffman, Ellen; Klei, Lambertus; Lei, Jing; Liu, Wenzhong; Liu, Li; Lu, Cong; Xu, Xuming; Zhu, Ying; Mane, Shrikant M.; Lein, Edward S.; Wei, Liping; Noonan, James P.; Roeder, Kathryn; Devlin, Bernie; Šestan, Nenad; State, Matthew W.
2013-01-01
SUMMARY Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD “seed” genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology. PMID:24267886
Gene Expression Profiles of Sporadic Canine Hemangiosarcoma Are Uniquely Associated with Breed
Tamburini, Beth A.; Trapp, Susan; Phang, Tzu Lip; Schappa, Jill T.; Hunter, Lawrence E.; Modiano, Jaime F.
2009-01-01
The role an individual's genetic background plays on phenotype and biological behavior of sporadic tumors remains incompletely understood. We showed previously that lymphomas from Golden Retrievers harbor defined, recurrent chromosomal aberrations that occur less frequently in lymphomas from other dog breeds, suggesting spontaneous canine tumors provide suitable models to define how heritable traits influence cancer genotypes. Here, we report a complementary approach using gene expression profiling in a naturally occurring endothelial sarcoma of dogs (hemangiosarcoma). Naturally occurring hemangiosarcomas of Golden Retrievers clustered separately from those of non-Golden Retrievers, with contributions from transcription factors, survival factors, and from pro-inflammatory and angiogenic genes, and which were exclusively present in hemangiosarcoma and not in other tumors or normal cells (i.e., they were not due simply to variation in these genes among breeds). Vascular Endothelial Growth Factor Receptor 1 (VEGFR1) was among genes preferentially enriched within known pathways derived from gene set enrichment analysis when characterizing tumors from Golden Retrievers versus other breeds. Heightened VEGFR1 expression in these tumors also was apparent at the protein level and targeted inhibition of VEGFR1 increased proliferation of hemangiosarcoma cells derived from tumors of Golden Retrievers, but not from other breeds. Our results suggest heritable factors mold gene expression phenotypes, and consequently biological behavior in sporadic, naturally occurring tumors. PMID:19461996
Kakati, Tulika; Kashyap, Hirak; Bhattacharyya, Dhruba K
2016-11-30
There exist many tools and methods for construction of co-expression networks from gene expression data and for extraction of densely connected gene modules. In this paper, a method is introduced to construct a co-expression network and to extract co-expressed modules having high biological significance. The proposed method has been validated on several well known microarray datasets extracted from a diverse set of species, using statistical measures, such as p and q values. The modules obtained in these studies are found to be biologically significant based on Gene Ontology enrichment analysis, pathway analysis, and KEGG enrichment analysis. Further, the method was applied on an Alzheimer's disease dataset and some interesting genes are found, which have high semantic similarity among them, but are not significantly correlated in terms of expression similarity. Some of these interesting genes, such as MAPT, CASP2, and PSEN2, are linked with important aspects of Alzheimer's disease, such as dementia, increased cell death, and deposition of amyloid-beta proteins in Alzheimer's disease brains. The biological pathways associated with Alzheimer's disease, such as Wnt signaling, Apoptosis, p53 signaling, and Notch signaling, incorporate these interesting genes. The proposed method is evaluated in regard to existing literature.
Sun, Haimeng; Yang, Zhongchen; Wei, Caijie; Wu, Weizhong
2018-04-26
An up-flow vertical flow constructed wetland (AC-VFCW) filled with ceramsite and 5% external carbon source poly(3-hydroxybutyrate-hydroxyvalerate) (PHBV) as substrate was set for nitrogen removal with micro aeration. Simultaneous nitrification and denitrification process was observed with 90.4% NH₄⁺-N and 92.1% TN removal efficiencies. Nitrification and denitrification genes were both preferentially enriched on the surface of PHBV. Nitrogen transformation along the flow direction showed that NH₄⁺-N was oxidized to NO₃⁻-N at the lowermost 10 cm of the substrate and NO₃⁻-N gradually degraded over the depth. AmoA gene was more enriched at -10 and -50 cm layers. NirS gene was the dominant functional gene at the bottom layer with the abundance of 2.05 × 10⁷ copies g⁻¹ substrate while nosZ gene was predominantly abundant with 7.51 × 10⁶ and 2.64 × 10⁶ copies g⁻¹ substrate at the middle and top layer, respectively, indicating that functional division of dominant nitrogen functional genes forms along the flow direction in AC-VFCW. Copyright © 2018. Published by Elsevier Ltd.
Kakati, Tulika; Kashyap, Hirak; Bhattacharyya, Dhruba K.
2016-01-01
There exist many tools and methods for construction of co-expression networks from gene expression data and for extraction of densely connected gene modules. In this paper, a method is introduced to construct a co-expression network and to extract co-expressed modules having high biological significance. The proposed method has been validated on several well known microarray datasets extracted from a diverse set of species, using statistical measures, such as p and q values. The modules obtained in these studies are found to be biologically significant based on Gene Ontology enrichment analysis, pathway analysis, and KEGG enrichment analysis. Further, the method was applied on an Alzheimer’s disease dataset and some interesting genes are found, which have high semantic similarity among them, but are not significantly correlated in terms of expression similarity. Some of these interesting genes, such as MAPT, CASP2, and PSEN2, are linked with important aspects of Alzheimer’s disease, such as dementia, increased cell death, and deposition of amyloid-beta proteins in Alzheimer’s disease brains. The biological pathways associated with Alzheimer’s disease, such as Wnt signaling, Apoptosis, p53 signaling, and Notch signaling, incorporate these interesting genes. The proposed method is evaluated in regard to existing literature. PMID:27901073
Alterations of the human gut microbiome in liver cirrhosis.
Qin, Nan; Yang, Fengling; Li, Ang; Prifti, Edi; Chen, Yanfei; Shao, Li; Guo, Jing; Le Chatelier, Emmanuelle; Yao, Jian; Wu, Lingjiao; Zhou, Jiawei; Ni, Shujun; Liu, Lin; Pons, Nicolas; Batto, Jean Michel; Kennedy, Sean P; Leonard, Pierre; Yuan, Chunhui; Ding, Wenchao; Chen, Yuanting; Hu, Xinjun; Zheng, Beiwen; Qian, Guirong; Xu, Wei; Ehrlich, S Dusko; Zheng, Shusen; Li, Lanjuan
2014-09-04
Liver cirrhosis occurs as a consequence of many chronic liver diseases that are prevalent worldwide. Here we characterize the gut microbiome in liver cirrhosis by comparing 98 patients and 83 healthy control individuals. We build a reference gene set for the cohort containing 2.69 million genes, 36.1% of which are novel. Quantitative metagenomics reveals 75,245 genes that differ in abundance between the patients and healthy individuals (false discovery rate < 0.0001) and can be grouped into 66 clusters representing cognate bacterial species; 28 are enriched in patients and 38 in control individuals. Most (54%) of the patient-enriched, taxonomically assigned species are of buccal origin, suggesting an invasion of the gut from the mouth in liver cirrhosis. Biomarkers specific to liver cirrhosis at gene and function levels are revealed by a comparison with those for type 2 diabetes and inflammatory bowel disease. On the basis of only 15 biomarkers, a highly accurate patient discrimination index is created and validated on an independent cohort. Thus microbiota-targeted biomarkers may be a powerful tool for diagnosis of different diseases.
Caberlotto, Laura; Lauria, Mario; Nguyen, Thanh-Phuong; Scotti, Marco
2013-01-01
Alzheimer's disease is the most common cause of dementia worldwide, affecting the elderly population. It is characterized by the hallmark pathology of amyloid-β deposition, neurofibrillary tangle formation, and extensive neuronal degeneration in the brain. A wealth of data related to Alzheimer's disease has been generated to date; nevertheless, the molecular mechanism underlying the etiology and pathophysiology of the disease is still unknown. Here we described a method for the combined analysis of multiple types of genome-wide data aimed at revealing convergent evidence interest that would not be captured by a standard molecular approach. Lists of Alzheimer-related genes (seed genes) were obtained from different sets of data on gene expression, SNPs, and molecular targets of drugs. Network analysis was applied for identifying the regions of the human protein-protein interaction network showing a significant enrichment in seed genes, and ultimately, in genes associated with Alzheimer's disease, due to the cumulative effect of different combinations of the starting data sets. The functional properties of these enriched modules were characterized, effectively considering the role of both Alzheimer-related seed genes and genes that closely interact with them. This approach allowed us to present evidence in favor of one of the competing theories about AD underlying processes, specifically evidence supporting a predominant role of metabolism-associated biological process terms, including autophagy, insulin and fatty acid metabolic processes in Alzheimer, with a focus on AMP-activated protein kinase. This central regulator of cellular energy homeostasis regulates a series of brain functions altered in Alzheimer's disease and could link genetic perturbation with neuronal transmission and energy regulation, representing a potential candidate to be targeted by therapy.
Tian, Wenlan; Paudel, Dev
2017-01-01
Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembling and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs were fully annotated, out of which 59,903 (51.81%) were assigned with gene ontology (GO) term, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in Kyoto Encyclopedia of Genes and Genome (KEGG) database, and it contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
Gu, Liqiang; Yu, Jun; Wang, Qing; Xu, Bin; Ji, Liechen; Yu, Lin; Zhang, Xipeng; Cai, Hui
2018-05-03
The present study aimed to investigate potential prognostic long noncoding RNAs (lncRNAs) associated with colorectal cancer (CRC). An mRNA‑seq dataset obtained from The Cancer Genome Atlas was employed to identify the differentially expressed lncRNAs (DELs) between CRC patients with good and poor prognoses. Subsequently, univariate and multivariate Cox regression analyses were conducted to analyze the prognosis‑associated lncRNAs among all DELs. In addition, a risk scoring system was developed according to the expression levels of the prognostic lncRNAs, which was then applied to a training set and an independent testing set. Furthermore, the co‑expressed genes of prognostic lncRNAs were screened using a Multi‑Experiment Matrix online tool for construction of lncRNA‑gene networks. Finally, Kyoto Encyclopedia of Genes and Genomes pathway and Gene Ontology (GO) function enrichment analyses were performed on genes in the lncRNA‑gene networks using KOBAS, GOATOOLS and ClusterProfiler. The present study identified 82 DELs, of which long intergenic nonprotein coding RNA 2159, RP11‑452L6.6, RP11‑894P9.1 and RP11‑69M1.6, and whey acidic protein four‑disulfide core domain 21 (WFDC21P) were reported to be independently associated with the prognosis of patients with CRC. A 5‑lncRNA signature‑based risk scoring system was developed, which may be used to classify patients into low‑ and high‑risk groups with significantly different recurrence‑free survival times in the training and testing sets (P<0.05). Co‑expressed genes of WFDC21P or RP11‑69M1.6 were utilized to construct the lncRNA‑gene networks. Genes in the networks were significantly enriched in 'tight junction', 'focal adhesion' and 'regulation of actin cytoskeleton' pathways, and numerous GO terms associated with 'reactive oxygen species metabolism' and 'nitric oxide metabolism'. 
The present study proposed a 5‑lncRNA signature‑based risk scoring system for predicting the prognosis of patients with CRC, and revealed the associated signaling pathways and biological processes. The results of the present study may help improve prognostic evaluation in clinical practice.
Kaushik, Abhinav; Bhatia, Yashuma; Ali, Shakir; Gupta, Dinesh
2015-01-01
Metastatic melanoma patients have a poor prognosis, mainly attributable to the underlying heterogeneity in melanoma driver genes and altered gene expression profiles. These characteristics of melanoma also make the development of drugs and identification of novel drug targets for metastatic melanoma a daunting task. Systems biology offers an alternative approach to re-explore the genes or gene sets that display dysregulated behaviour without being differentially expressed. In this study, we have performed systems biology studies to enhance our knowledge about the conserved property of disease genes or gene sets among mutually exclusive datasets representing melanoma progression. We meta-analysed 642 microarray samples to generate melanoma reconstructed networks representing four different stages of melanoma progression to extract genes with altered molecular circuitry wiring as compared to a normal cellular state. Intriguingly, a majority of the melanoma network-rewired genes are not differentially expressed and the disease genes involved in melanoma progression consistently modulate its activity by rewiring network connections. We found that the shortlisted disease genes in the study show strong and abnormal network connectivity, which enhances with the disease progression. Moreover, the deviated network properties of the disease gene sets allow ranking/prioritization of different enriched, dysregulated and conserved pathway terms in metastatic melanoma, in agreement with previous findings. Our analysis also reveals presence of distinct network hubs in different stages of metastasizing tumor for the same set of pathways in the statistically conserved gene sets. The study results are also presented as a freely available database at http://bioinfo.icgeb.res.in/m3db/. The web-based database resource consists of results from the analysis presented here, integrated with cytoscape web and user-friendly tools for visualization, retrieval and further analysis. 
PMID:26558755
Dunachie, Susanna; Berthoud, Tamara; Hill, Adrian V.S.; Fletcher, Helen A.
2015-01-01
Introduction The complexity of immunity to malaria is well known, and clear correlates of protection against malaria have not been established. A better understanding of immune markers induced by candidate malaria vaccines would greatly enhance vaccine development, immunogenicity monitoring and estimation of vaccine efficacy in the field. We have previously reported complete or partial efficacy against experimental sporozoite challenge by several vaccine regimens in healthy malaria-naïve subjects in Oxford. These include a prime-boost regimen with RTS,S/AS02A and modified vaccinia virus Ankara (MVA) expressing the CSP antigen, and a DNA-prime, MVA-boost regimen expressing the ME TRAP antigens. Using samples from these trials we performed transcriptional profiling, allowing a global assessment of responses to vaccination. Methods We used Human RefSeq8 Bead Chips from Illumina to examine gene expression using PBMC (peripheral blood mononuclear cells) from 16 human volunteers. To focus on antigen-specific changes, comparisons were made between PBMC stimulated with CSP or TRAP peptide pools and unstimulated PBMC post vaccination. We then correlated gene expression with protection against malaria in a human Plasmodium falciparum malaria challenge model. Results Differentially expressed genes induced by both vaccine regimens were predominantly in the IFN-γ pathway. Gene set enrichment analysis revealed antigen-specific effects on genes associated with IFN induction and proteasome modules after vaccination. Genes associated with IFN induction and antigen presentation modules were positively enriched in subjects with complete protection from malaria challenge, while genes associated with haemopoietic stem cells, regulatory monocytes and the myeloid lineage modules were negatively enriched in protected subjects. Conclusions These results represent novel insights into the immune repertoires involved in malaria vaccination. PMID:26256523
Dunachie, Susanna; Berthoud, Tamara; Hill, Adrian V S; Fletcher, Helen A
2015-09-29
The complexity of immunity to malaria is well known, and clear correlates of protection against malaria have not been established. A better understanding of immune markers induced by candidate malaria vaccines would greatly enhance vaccine development, immunogenicity monitoring and estimation of vaccine efficacy in the field. We have previously reported complete or partial efficacy against experimental sporozoite challenge by several vaccine regimens in healthy malaria-naïve subjects in Oxford. These include a prime-boost regimen with RTS,S/AS02A and modified vaccinia virus Ankara (MVA) expressing the CSP antigen, and a DNA-prime, MVA-boost regimen expressing the ME TRAP antigens. Using samples from these trials we performed transcriptional profiling, allowing a global assessment of responses to vaccination. We used Human RefSeq8 Bead Chips from Illumina to examine gene expression using PBMC (peripheral blood mononuclear cells) from 16 human volunteers. To focus on antigen-specific changes, comparisons were made between PBMC stimulated with CSP or TRAP peptide pools and unstimulated PBMC post vaccination. We then correlated gene expression with protection against malaria in a human Plasmodium falciparum malaria challenge model. Differentially expressed genes induced by both vaccine regimens were predominantly in the IFN-γ pathway. Gene set enrichment analysis revealed antigen-specific effects on genes associated with IFN induction and proteasome modules after vaccination. Genes associated with IFN induction and antigen presentation modules were positively enriched in subjects with complete protection from malaria challenge, while genes associated with haemopoietic stem cells, regulatory monocytes and the myeloid lineage modules were negatively enriched in protected subjects. These results represent novel insights into the immune repertoires involved in malaria vaccination. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Wang, Hao; Sun, Xuming; Chou, Jeff; Lin, Marina; Ferrario, Carlos M; Zapata-Sudo, Gisele; Groban, Leanne
2017-08-01
Activation of G protein-coupled estrogen receptor (GPER) by its agonist, G1, protects the heart from stressors such as pressure-overload, ischemia, a high-salt diet, estrogen loss, and aging, in various male and female animal models. Due to nonspecific effects of G1, the exact functions of cardiac GPER cannot be concluded from studies using systemic G1 administration. Moreover, global knockdown of GPER affects glucose homeostasis, blood pressure, and many other cardiovascular-related systems, thereby confounding interpretation of its direct cardiac actions. We generated a cardiomyocyte-specific GPER knockout (KO) mouse model to specifically investigate the functions of GPER in cardiomyocytes. Compared to wild type mice, cardiomyocyte-specific GPER KO mice exhibited adverse alterations in cardiac structure and impaired systolic and diastolic function, as measured by echocardiography. Gene deletion effects on left ventricular dimensions were more profound in male KO mice compared to female KO mice. Analysis of DNA microarray data from isolated cardiomyocytes of wild type and KO mice revealed sex-based differences in gene expression profiles affecting multiple transcriptional networks. Gene Set Enrichment Analysis (GSEA) revealed that mitochondrial genes are enriched in GPER KO females, whereas inflammatory response genes are enriched in GPER KO males, compared to their wild type counterparts of the same sex. The cardiomyocyte-specific GPER KO mouse model provides us with a powerful tool to study the functions of GPER in cardiomyocytes. The gene expression profiles of the GPER KO mice provide foundational information for further study of the mechanisms underlying sex-specific cardioprotection by GPER. Copyright © 2016 Elsevier B.V. All rights reserved.
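Gene Set Enrichment Analysis, as used in the record above, rests on a running-sum statistic over a ranked gene list. The following is a minimal Python sketch of the unweighted (Kolmogorov-Smirnov-like) enrichment score only. The function name and the toy gene names are hypothetical, the ranked list is assumed to contain no duplicates, and the published GSEA method additionally weights hits by their correlation with the phenotype and assesses significance by permutation, neither of which is shown here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (KS-like).

    ranked_genes: unique gene names, ordered from most up-regulated
                  to most down-regulated.
    gene_set:     collection of gene names to test.
    Returns the running-sum value with the largest absolute deviation
    from zero (positive: set concentrated near the top of the ranking).
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; the step sizes
        # are chosen so that the walk returns to zero at the end.
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

With a toy ranking `["a", "b", "c", "d"]`, a set made of the two top-ranked genes scores positively and a set made of the two bottom-ranked genes scores negatively.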
Genome-wide screen identifies a novel prognostic signature for breast cancer survival
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mao, Xuan Y.; Lee, Matthew J.; Zhu, Jeffrey; ...
2017-01-21
Large genomic datasets in combination with clinical data can be used as an unbiased tool to identify genes important in patient survival and discover potential therapeutic targets. We used a genome-wide screen to identify 587 genes significantly and robustly deregulated across four independent breast cancer (BC) datasets compared to normal breast tissue. Gene expression of 381 genes was significantly associated with relapse-free survival (RFS) in BC patients. We used a gene co-expression network approach to visualize the genetic architecture in normal breast and BCs. In normal breast tissue, co-expression cliques were identified enriched for cell cycle, gene transcription, cell adhesion, cytoskeletal organization and metabolism. In contrast, in BC, only two major co-expression cliques were identified enriched for cell cycle-related processes or blood vessel development, cell adhesion and mammary gland development processes. Interestingly, gene expression levels of 7 genes were found to be negatively correlated with many cell cycle related genes, highlighting these genes as potential tumor suppressors and novel therapeutic targets. A forward-conditional Cox regression analysis was used to identify a 12-gene signature associated with RFS. A prognostic scoring system was created based on the 12-gene signature. This scoring system robustly predicted BC patient RFS in 60 sampling test sets and was further validated in TCGA and METABRIC BC data. Our integrated study identified a 12-gene prognostic signature that could guide adjuvant therapy for BC patients and includes novel potential molecular targets for therapy.
Pauler, Florian M.; Sloane, Mathew A.; Huang, Ru; Regha, Kakkad; Koerner, Martha V.; Tamir, Ido; Sommer, Andreas; Aszodi, Andras; Jenuwein, Thomas; Barlow, Denise P.
2009-01-01
In mammals, genome-wide chromatin maps and immunofluorescence studies show that broad domains of repressive histone modifications are present on pericentromeric and telomeric repeats and on the inactive X chromosome. However, only a few autosomal loci such as silent Hox gene clusters have been shown to lie in broad domains of repressive histone modifications. Here we present a ChIP-chip analysis of the repressive H3K27me3 histone modification along chr 17 in mouse embryonic fibroblast cells using an algorithm named broad local enrichments (BLOCs), which allows the identification of broad regions of histone modifications. Our results, confirmed by BLOC analysis of a whole genome ChIP-seq data set, show that the majority of H3K27me3 modifications form BLOCs rather than focal peaks. H3K27me3 BLOCs modify silent genes of all types, plus flanking intergenic regions and their distribution indicates a negative correlation between H3K27me3 and transcription. However, we also found that some nontranscribed gene-poor regions lack H3K27me3. We therefore performed a low-resolution analysis of whole mouse chr 17, which revealed that H3K27me3 is enriched in mega-base-pair-sized domains that are also enriched for genes, short interspersed elements (SINEs) and active histone modifications. These genic H3K27me3 domains alternate with similar-sized gene-poor domains. These are deficient in active histone modifications, as well as H3K27me3, but are enriched for long interspersed elements (LINEs) and long-terminal repeat (LTR) transposons and H3K9me3 and H4K20me3. Thus, an autosome can be seen to contain alternating chromatin bands that predominantly separate genes from one retrotransposon class, which could offer unique domains for the specific regulation of genes or the silencing of autonomous retrotransposons. PMID:19047520
Transcriptional Signatures of Sleep Duration Discordance in Monozygotic Twins.
Watson, N F; Buchwald, D; Delrow, J J; Altemeier, W A; Vitiello, M V; Pack, A I; Bamshad, M; Noonan, C; Gharib, S A
2017-01-01
Habitual short sleep duration is associated with adverse metabolic, cardiovascular, and inflammatory effects. Co-twin study methodologies account for familial (eg, genetics and shared environmental) confounding, allowing assessment of subtle environmental effects, such as the effect of habitual short sleep duration on gene expression. Therefore, we investigated gene expression in monozygotic twins discordant for actigraphically phenotyped habitual sleep duration. Eleven healthy monozygotic twin pairs (82% female; mean age 42.7 years; SD = 18.1), selected based on subjective sleep duration discordance, were objectively phenotyped for habitual sleep duration with 2 weeks of wrist actigraphy. Peripheral blood leukocyte (PBL) RNA from fasting blood samples was obtained on the final day of actigraphic measurement and hybridized to Illumina humanHT-12 microarrays. Differential gene expression was determined between paired samples and mapped to functional categories using Gene Ontology. Finally, a more comprehensive gene set enrichment analysis was performed based on the entire PBL transcriptome. The mean 24-hour sleep duration of the total sample was 439.2 minutes (SD = 46.8 minutes; range 325.4-521.6 minutes). Mean within-pair sleep duration difference per 24 hours was 64.4 minutes (SD = 21.2; range 45.9-114.6 minutes). The twin cohort displayed distinctive pathway enrichment based on sleep duration differences. Habitual short sleep was associated with up-regulation of genes involved in transcription, ribosome, translation, and oxidative phosphorylation. Unexpectedly, genes down-regulated in short sleep twins were highly enriched in immuno-inflammatory pathways such as interleukin signaling and leukocyte activation, as well as developmental programs, coagulation cascade, and cell adhesion. Objectively assessed habitual sleep duration in monozygotic twin pairs appears to be associated with distinct patterns of differential gene expression and pathway enrichment. 
By accounting for familial confounding and measuring real life sleep duration, our study shows the transcriptomic effects of habitual short sleep on dysregulated immune response and provides a potential link between sleep deprivation and adverse metabolic, cardiovascular, and inflammatory outcomes. © Sleep Research Society 2017. Published by Oxford University Press on behalf of the Sleep Research Society. All rights reserved. For permissions, please e-mail journals.permissions@oup.com.
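The co-twin design above compares expression within each twin pair rather than between unrelated groups. A minimal Python sketch of a paired t-statistic for a single gene follows; it is an illustration of the paired-comparison idea, not the authors' actual pipeline (which the abstract does not specify), and the function name and expression values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(short_sleep, long_sleep):
    """Paired t-statistic for one gene.

    short_sleep[i] and long_sleep[i] are expression values for the
    shorter- and longer-sleeping twin of pair i (hypothetical inputs).
    """
    diffs = [s - l for s, l in zip(short_sleep, long_sleep)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of within-pair differences
    if sd == 0:
        raise ValueError("all within-pair differences are identical")
    return mean(diffs) / (sd / math.sqrt(n))
```

Because each twin serves as the other's control, only the within-pair differences enter the statistic, which is how shared genetic and familial variation drops out.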
Detecting cooperative sequences in the binding of RNA Polymerase-II
NASA Astrophysics Data System (ADS)
Glass, Kimberly; Rozenberg, Julian; Girvan, Michelle; Losert, Wolfgang; Ott, Ed; Vinson, Charles
2008-03-01
Regulation of the expression level of genes is a key biological process controlled largely by the 1000 base pair (bp) sequence preceding each gene (the promoter region). Within that region transcription factor binding sites (TFBS), 5-10 bp long sequences, act individually or cooperate together in the recruitment of, and therefore subsequent gene transcription by, RNA Polymerase-II (RNAP). We have measured the binding of RNAP to promoters on a genome-wide basis using Chromatin Immunoprecipitation (ChIP-on-Chip) microarray assays. Using all 8-base pair long sequences as a test set, we have identified the DNA sequences that are enriched in promoters with high RNAP binding values. We are able to demonstrate that virtually all sequences enriched in such promoters contain a CpG dinucleotide, indicating that TFBS that contain the CpG dinucleotide are involved in RNAP binding to promoters. Further analysis shows that pairs of CpG-containing sequences cooperate to enhance the binding of RNAP to the promoter.
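The k-mer enrichment idea in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: promoters are assumed to have already been split into high- and low-binding groups, enrichment is taken as a ratio of presence frequencies with a pseudocount, and all function names and sequences are hypothetical.

```python
from collections import Counter

def kmers_present(seq, k=8):
    """Return the set of distinct k-mers occurring in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_enrichment(high, low, k=8, pseudocount=1.0):
    """For each k-mer, the fraction of high-binding promoters containing
    it divided by the fraction of low-binding promoters containing it.
    The pseudocount avoids division by zero for k-mers absent from a group."""
    high_counts = Counter(km for s in high for km in kmers_present(s, k))
    low_counts = Counter(km for s in low for km in kmers_present(s, k))
    ratios = {}
    for km in set(high_counts) | set(low_counts):
        f_high = (high_counts[km] + pseudocount) / (len(high) + pseudocount)
        f_low = (low_counts[km] + pseudocount) / (len(low) + pseudocount)
        ratios[km] = f_high / f_low
    return ratios
```

Ratios above 1 mark k-mers that are more common in the high-binding group; a real analysis would also need a significance test and a correction for testing all 4^8 possible 8-mers.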
Krzmarzick, Mark J.; Miller, Hanna R.; Yan, Tao
2014-01-01
Although the abundance and diversity of natural organochlorines are well established, much is still unknown about the degradation of these compounds. Triplicate microcosms were used to determine whether, and which, bacterial communities could dechlorinate two chlorinated xanthones (2,7-dichloroxanthone and 5,7-dichloro-1,3-dihydroxylxanthone), analogues of a diverse class of natural organochlorines. According to quantitative-PCR (qPCR) results, several known dechlorinating genera were either not present or not enriched during dechlorination of the xanthones. Denaturing gradient gel electrophoresis, however, indicated that several Firmicutes were enriched in the dechlorinating cultures compared to triplicate controls amended with nonchlorinated xanthones. One such group, herein referred to as the Gopher group, was further studied with a novel qPCR method that confirmed enrichment of Gopher group 16S rRNA genes in the dechlorinating cultures. The enrichment of the Gopher group was again tested with two new sets of triplicate microcosms. Enrichment was observed during chlorinated xanthone dechlorination in one set of these triplicate microcosms. In the other set, two microcosms showed clear enrichment while a third did not. The Gopher group is a previously unidentified group of Firmicutes, distinct from but related to the Dehalobacter and Desulfitobacterium genera; this group also contains clones from at least four unique cultures capable of dechlorinating anthropogenic organochlorines that have been previously described in the literature. This study suggests that natural chlorinated xanthones may be effective biostimulants to enhance the remediation of pollutants and highlights the idea that novel genera of dechlorinators likely exist and may be active in bioremediation and the natural cycling of chlorine. PMID:24296507
Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano
2011-04-11
In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question, to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). 
In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.
Ray, Pradipta; Torck, Andrew; Quigley, Lilyana; Wangzhou, Andi; Neiman, Matthew; Rao, Chandranshu; Lam, Tiffany; Kim, Ji-Young; Kim, Tae Hoon; Zhang, Michael Q; Dussor, Gregory; Price, Theodore J
2018-03-20
Molecular neurobiological insight into human nervous tissues is needed to generate next generation therapeutics for neurological disorders like chronic pain. We obtained human Dorsal Root Ganglia (DRG) samples from organ donors and performed RNA-sequencing (RNA-seq) to study the human DRG (hDRG) transcriptional landscape, systematically comparing it with publicly available data from a variety of human and orthologous mouse tissues, including mouse DRG (mDRG). We characterized the hDRG transcriptional profile in terms of tissue-restricted gene co-expression patterns and putative transcriptional regulators, and formulated an information-theoretic framework to quantify DRG enrichment. Relevant gene families and pathways were also analyzed, including transcription factors (TFs), G-protein coupled receptors (GPCRs) and ion channels. Our analyses reveal a hDRG-enriched protein-coding gene set (∼140), some of which have not been described in the context of DRG or pain signaling. A majority of these show conserved enrichment in mDRG, and were mined for known drug - gene product interactions. Conserved enrichment of the vast majority of TFs suggests that the mDRG is a faithful model system for studying hDRGs, due to evolutionarily conserved regulatory programs. Comparison of hDRG and tibial nerve transcriptomes suggests trafficking of neuronal mRNA to axons in adult hDRG, and is consistent with studies of axonal transport in rodent sensory neurons. We present our work as an online, searchable repository (https://www.utdallas.edu/bbs/painneurosciencelab/sensoryomics/drgtxome), creating a resource for the community. Our analyses provide insight into DRG biology for guiding development of novel therapeutics, and a blueprint for cross-species transcriptomic analyses.
Dai, Wei; Siddiq, Afshan; Walley, Andrew J; Limpaiboon, Temduang; Brown, Robert
2013-01-01
Genetic abnormalities of cholangiocarcinoma have been widely studied; however, epigenomic changes related to cholangiocarcinogenesis have been less well characterised. We have profiled the DNA methylomes of 28 primary cholangiocarcinoma and six matched adjacent normal tissues using Infinium’s HumanMethylation27 BeadChips with the aim of identifying gene sets aberrantly epigenetically regulated in this tumour type. Using a linear model for microarray data we identified 1610 differentially methylated autosomal CpG sites with 809 CpG sites (representing 603 genes) being hypermethylated and 801 CpG sites (representing 712 genes) being hypomethylated in cholangiocarcinoma versus adjacent normal tissues (false discovery rate ≤ 0.05). Gene ontology and gene set enrichment analyses identified gene sets significantly associated with hypermethylation at linked CpG sites in cholangiocarcinoma including homeobox genes and target genes of PRC2, EED, SUZ12 and histone H3 trimethylation at lysine 27. We confirmed frequent hypermethylation at the homeobox genes HOXA9 and HOXD9 by bisulfite pyrosequencing in a larger cohort of cholangiocarcinoma (n = 102). Our findings indicate a key role for hypermethylation of multiple CpG sites at genes associated with a stem cell-like phenotype as a common molecular aberration in cholangiocarcinoma. These data have implications for cholangiocarcinogenesis, as well as possible novel treatment options using histone methyltransferase inhibitors. PMID:24089088
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren’s Syndrome
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J.; Gillespie, Colin S.; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T.; Emery, Paul; Lanyon, Peter; Hunter, John A.; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E.; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Background: Fatigue is a debilitating condition with a significant impact on patients’ quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren’s Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Methods: Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren’s Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Results: Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Conclusions: Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS. PMID:26694930
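The AUC reported above is equivalent to the normalised Wilcoxon (Mann-Whitney) statistic, so both it and a standard error can be computed directly from classifier scores. The sketch below is a minimal Python illustration. The abstract does not say how SE(W) was obtained, so the Hanley and McNeil (1982) approximation used here is an assumption, and the function name and scores are hypothetical.

```python
import math

def auc_and_se(pos_scores, neg_scores):
    """AUC as the normalised Wilcoxon statistic, with the Hanley-McNeil
    approximation to its standard error (an assumed choice of method)."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)

    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (n_pos * n_neg)

    # Hanley-McNeil terms for the variance of the Wilcoxon statistic.
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return auc, math.sqrt(max(var, 0.0))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based computation would be the usual choice for larger data.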
Gao, Liyang; Chen, Bing; Li, Jinhong; Yang, Fan; Cen, Xuecheng; Liao, Zhuangbing; Long, Xiao’ao
2017-01-01
The Wnt signaling pathway is necessary for the development of the central nervous system and is associated with tumorigenesis in various cancers. However, the mechanism of the Wnt signaling pathway in glioma cells has yet to be elucidated. Small-molecule Wnt modulators such as ICG-001 and AZD2858 were used to inhibit and stimulate the Wnt/β-catenin signaling pathway. Techniques including cell proliferation assay, colony formation assay, Matrigel cell invasion assay, cell cycle assay and Genechip microarray were used. Gene Ontology enrichment analysis and Gene Set Enrichment Analysis identified many enriched biological processes and signaling pathways. Both inhibiting and stimulating the Wnt/β-catenin signaling pathway influenced the cell cycle and reduced the proliferation and survival of U87 glioma cells. However, Affymetrix expression microarray analysis indicated that the affected biological processes and signaling pathway networks differ largely between stimulation and inhibition of the Wnt/β-catenin signaling pathway. We propose that the Wnt/β-catenin signaling pathway might prove to be a valuable therapeutic target for glioma. PMID:28837560
Prognostic Power of a Tumor Differentiation Gene Signature for Bladder Urothelial Carcinomas.
Mo, Qianxing; Nikolos, Fotis; Chen, Fengju; Tramel, Zoe; Lee, Yu-Cheng; Hayashi, Kazukuni; Xiao, Jing; Shen, Jianjun; Chan, Keith Syson
2018-05-01
Muscle-invasive bladder cancers (MIBCs) cause approximately 150 000 deaths per year worldwide. Survival for MIBC patients is heterogeneous, with no clinically validated molecular markers that predict clinical outcome. Non-MIBCs (NMIBCs) generally have favorable outcome; however, a portion progress to MIBC. Hence, development of a prognostic tool that can guide decision-making is crucial for improving clinical management of bladder urothelial carcinomas. Tumor grade is defined by pathologic evaluation of tumor cell differentiation, and it often associates with clinical outcome. The current study extrapolates this conventional wisdom and combines it with molecular profiling. We developed an 18-gene signature that molecularly defines urothelial cellular differentiation, thus classifying MIBCs and NMIBCs into two subgroups: basal and differentiated. We evaluated the prognostic capability of this "tumor differentiation signature" and three other existing gene signatures including The Cancer Genome Atlas (TCGA; 2707 genes), MD Anderson Cancer Center (MDA; 2252 genes/2697 probes), and University of North Carolina at Chapel Hill (UNC; 47 genes) using five gene expression data sets derived from MIBC and NMIBC patients. All statistical tests were two-sided. The tumor differentiation signature demonstrated consistency and statistical robustness toward stratifying MIBC patients into different overall survival outcomes (TCGA cohort 1, P = .03; MDA discovery, P = .009; MDA validation, P = .01), while the other signatures were not as consistent. In addition, we analyzed the progression (Ta/T1 progressing to ≥T2) probability of NMIBCs. NMIBC patients with a basal tumor differentiation signature associated with worse progression outcome (P = .008). 
Gene functional term enrichment and gene set enrichment analyses revealed that genes involved in the biologic process of immune response and inflammatory response are among the most elevated within basal bladder cancers, implicating them as candidates for immune checkpoint therapies. These results provide definitive evidence that a biology-prioritizing clustering methodology generates meaningful insights into patient stratification and reveals targetable molecular pathways to impact future therapeutic approach.
Lin, Huapeng; Zhang, Qian; Li, Xiaocheng; Wu, Yushen; Liu, Ye; Hu, Yingchun
2018-01-01
Hepatitis B virus-associated acute liver failure (HBV-ALF) is a rare but life-threatening syndrome that carries high morbidity and mortality. Our study aimed to explore the possible molecular mechanisms of HBV-ALF by means of bioinformatics analysis. In this study, gene expression microarray datasets of HBV-ALF from Gene Expression Omnibus were collected, and differentially expressed genes (DEGs) were identified with the limma package in R. After functional enrichment analysis, we constructed the protein–protein interaction (PPI) network using the Search Tool for the Retrieval of Interacting Genes online database and a weighted gene coexpression network using the WGCNA package in R. Subsequently, we picked out the hub genes among the DEGs. A total of 423 DEGs, with 198 upregulated genes and 225 downregulated genes, were identified between HBV-ALF and normal samples. The upregulated genes were mainly enriched in immune response, and the downregulated genes were mainly enriched in complement and coagulation cascades. Orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), plasminogen (PLG), and aldehyde oxidase 1 (AOX1) were picked out as the hub genes with a high degree in both the PPI network and the weighted gene coexpression network. The weighted gene coexpression network analysis found that 3 of the 5 modules in which upregulated genes were enriched were closely related to the immune system. The downregulated genes were enriched in only one module, and the genes in this module were mainly enriched in the complement and coagulation cascades pathway. In conclusion, 4 genes (ORM1, ORM2, PLG, and AOX1), together with the immune response and the complement and coagulation cascades pathway, may take part in the pathogenesis of HBV-ALF, and these candidate genes and pathways could be therapeutic targets for HBV-ALF. PMID:29384847
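Selecting hub genes that have a high degree in two networks at once, as described above, can be sketched in Python as follows. This is a minimal illustration only: the abstract gives no degree threshold, so taking the top `top_n` nodes per network is an assumption, and the function names and the gene labels G1 to G5 are hypothetical placeholders rather than real interaction data.

```python
from collections import Counter

def degrees(edges):
    """Node degree for an undirected network given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def shared_hubs(ppi_edges, coexpr_edges, top_n=2):
    """Genes ranked among the top_n by degree in BOTH networks.
    Ties are broken alphabetically so the result is deterministic."""
    def top(edges):
        deg = degrees(edges)
        ranked = sorted(deg, key=lambda g: (-deg[g], g))
        return set(ranked[:top_n])
    return top(ppi_edges) & top(coexpr_edges)

# Toy networks with placeholder gene labels.
ppi = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
coexpr = [("G1", "G5"), ("G1", "G2"), ("G5", "G2"), ("G5", "G3")]
hubs = shared_hubs(ppi, coexpr)
```

Requiring a gene to rank highly in both networks is stricter than using either network alone, which is presumably why only a handful of genes survive in the study above.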
Lee, Hyeonjeong; Shin, Miyoung
2017-01-01
The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Although many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) observed separately in case and control samples, which can be derived from gene expression and/or methylation data. The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. 
Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
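The association rule mining step described above can be illustrated with a minimal Python sketch restricted to pairs of pathways. The support and confidence thresholds, the function name, and the sample data are all hypothetical, and the authors' actual mining algorithm and parameters are not given in the abstract.

```python
from itertools import combinations
from collections import Counter

def pathway_rules(samples, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules 'lhs active -> rhs active'.

    samples: list of sets, each holding the pathways found active
             in one sample of a single class (case or control).
    Returns sorted (lhs, rhs, support, confidence) tuples.
    """
    n = len(samples)
    single = Counter(p for s in samples for p in s)
    pair = Counter(frozenset(c) for s in samples
                   for c in combinations(sorted(s), 2))
    rules = []
    for ps, count in pair.items():
        support = count / n  # fraction of samples with both pathways active
        if support < min_support:
            continue
        a, b = sorted(ps)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical activity profiles for four samples of one class.
toy = [{"cell cycle", "p53"}, {"cell cycle", "p53"},
       {"cell cycle"}, {"apoptosis"}]
rules = pathway_rules(toy)
```

Running the same function separately on case and control samples and comparing the resulting rules gives the kind of class-specific pathway associations that a pathway activity network would then visualise.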
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...
2014-01-01
Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.
Pazos Obregón, Flavio; Papalardo, Cecilia; Castro, Sebastián; Guerberoff, Gustavo; Cantera, Rafael
2015-09-15
Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, only a few hundred of them are known so far. In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others. Using statistical and biological criteria to analyze and combine the predictions, we obtained a gene catalogue that is highly enriched in genes of relevance for Drosophila synapse assembly and function but still not recognized as such. The utility of our approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.
Wang, Pingping; Zheng, Min; Liu, Jian; Liu, Yongzhuang; Lu, Jianguo; Sun, Xiaowen
2016-08-26
In this study, we performed a comprehensive analysis of the brain transcriptomes of one- and two-year-old male and female Cynoglossus semilaevis by high-throughput Illumina sequencing. A total of 77,066 transcripts, corresponding to 21,475 unigenes, were obtained with an N50 value of 4349 bp. Of these unigenes, 33 genes showed significant differential expression and were potentially associated with growth; of these, 18 genes were down-regulated and 12 genes were up-regulated in two-year-old males. Most of these genes showed no significant differences in expression among one-year-old males and females and two-year-old females. A similar analysis was conducted to look for genes associated with reproduction; 25 genes were identified, of which five were down-regulated and 20 were up-regulated in two-year-old males. Again, most of these genes showed no significant expression differences among the other three groups. The results for up-regulated genes in Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis differed significantly between two-year-old males and females. Males showed high gene expression in genetic information processing, while the highly expressed genes of females were mainly enriched in organismal systems. Our work identified a set of sex-biased genes potentially associated with growth and reproduction that might be candidate factors affecting sexual dimorphism of tongue sole, laying the foundation for understanding the complex process of sex determination in this economically valuable species.
Gibbons, Taylor C; Metzger, David C H; Healy, Timothy M; Schulte, Patricia M
2017-05-01
Phenotypic plasticity is thought to facilitate the colonization of novel environments and shape the direction of evolution in colonizing populations. However, the relative prevalence of various predicted patterns of changes in phenotypic plasticity following colonization remains unclear. Here, we use a whole-transcriptome approach to characterize patterns of gene expression plasticity in the gills of a freshwater-adapted and a saltwater-adapted ecotype of threespine stickleback (Gasterosteus aculeatus) exposed to a range of salinities. The response of the gill transcriptome to environmental salinity had a large shared component common to both ecotypes (2159 genes) with significant enrichment of genes involved in transmembrane ion transport and the restructuring of the gill epithelium. This transcriptional response to freshwater acclimation is induced at salinities below two parts per thousand. There was also differentiation in gene expression patterns between ecotypes (2515 genes), particularly in processes important for changes in the gill structure and permeability. Only 508 genes that differed between ecotypes also responded to salinity and no specific processes were enriched among this gene set, and an even smaller number (87 genes) showed evidence of changes in the extent of the response to salinity acclimation between ecotypes. No pattern of relative expression dominated among these genes, suggesting that neither gains nor losses of plasticity dominated the changes in expression patterns between the ecotypes. These data demonstrate that multiple patterns of changes in gene expression plasticity can occur following colonization of novel habitats. © 2017 John Wiley & Sons Ltd.
Wu, Chengjiang; Zhao, Yangjing; Lin, Yu; Yang, Xinxin; Yan, Meina; Min, Yujiao; Pan, Zihui; Xia, Sheng; Shao, Qixiang
2018-01-01
DNA microarray and high-throughput sequencing have been widely used to identify the differentially expressed genes (DEGs) in systemic lupus erythematosus (SLE). However, the big data from gene microarrays are also challenging to work with in terms of analysis and processing. The present study combined data from the microarray expression profile (GSE65391) and bioinformatics analysis to identify the key genes and cellular pathways in SLE. Gene ontology (GO) and cellular pathway enrichment analyses of DEGs were performed to investigate significantly enriched pathways. A protein-protein interaction network was constructed to determine the key genes in the occurrence and development of SLE. A total of 310 DEGs were identified in SLE, including 193 upregulated genes and 117 downregulated genes. GO analysis revealed that the most significant biological process of DEGs was immune system process. Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that these DEGs were enriched in signaling pathways associated with the immune system, including the RIG-I-like receptor signaling pathway, intestinal immune network for IgA production, antigen processing and presentation and the toll-like receptor signaling pathway. The current study screened the 10 genes with the highest degrees as hub genes, which included 2′-5′-oligoadenylate synthetase 1, MX dynamin like GTPase 2, interferon induced protein with tetratricopeptide repeats 1, interferon regulatory factor 7, interferon induced with helicase C domain 1, signal transducer and activator of transcription 1, ISG15 ubiquitin-like modifier, DExD/H-box helicase 58, interferon induced protein with tetratricopeptide repeats 3 and 2′-5′-oligoadenylate synthetase 2. Module analysis revealed that these hub genes were also involved in the RIG-I-like receptor signaling, cytosolic DNA-sensing, toll-like receptor signaling and ribosome biogenesis pathways.
In addition, these hub genes, from different probe sets, exhibited a significant tendency toward co-expression in multi-experiment microarray datasets (P<0.01). In conclusion, these key genes and cellular pathways may improve the current understanding of the underlying mechanism of development of SLE. These key genes may be potential biomarkers of diagnosis, therapy and prognosis for SLE. PMID:29257335
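The abstract above does not say which statistic its GO and KEGG enrichment analyses used. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below in Python under that assumption. Only the figure of 310 DEGs comes from the abstract; the background size, pathway size, and overlap are hypothetical numbers chosen for illustration.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes that belong to the pathway
    drawn:      size of the DEG list
    overlap:    DEGs that belong to the pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # 310 DEGs as reported above; the other three numbers are hypothetical.
    p = hypergeom_enrichment_p(population=20000, annotated=400, drawn=310, overlap=25)
    print(f"enrichment p-value: {p:.3g}")
```

In practice the resulting p-values would also be corrected for multiple testing across all pathways examined.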
Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I
2014-01-01
Background Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. Objective The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player’s prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Methods We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Results Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. 
In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. Conclusions The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge. PMID:25654473
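The abstract describes aggregating game play "using a voting approach" to rank genes but gives no further detail. The Python sketch below shows one simple way such an aggregation could work, assuming each game contributes at most one vote per selected gene. The function name, the tie-breaking rule, and the placeholder gene labels are all hypothetical and are not taken from The Cure.

```python
from collections import Counter

def rank_genes_by_votes(games):
    """Rank genes by the number of games in which they were selected.

    games: iterable of gene lists, one list per played game.
    Returns (gene, votes) pairs, most-voted first; ties are broken alphabetically.
    """
    votes = Counter()
    for picks in games:
        votes.update(set(picks))  # at most one vote per gene per game
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    # Placeholder labels, not real selections from the game.
    games = [["GENE_A", "GENE_B"], ["GENE_B", "GENE_C"], ["GENE_B", "GENE_A", "GENE_A"]]
    print(rank_genes_by_votes(games))
```

The top-ranked genes from such a list could then be passed to enrichment analysis or used as classifier features, as the abstract describes.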
Wang, Yinxiao; Wang, Wensheng; Zhao, Xiuqin; Zhang, Shilai; Zhang, Jing; Hu, Fengyi; Li, Zhikang
2017-01-01
Rice (Oryza sativa) is very sensitive to chilling stress at seedling and reproductive stages, whereas wild rice, O. longistaminata, tolerates non-freezing cold temperatures and has overwintering ability. Elucidating the molecular mechanisms of chilling tolerance (CT) in O. longistaminata should thus provide a basis for rice CT improvement through molecular breeding. In this study, high-throughput RNA sequencing was performed to profile global transcriptome alterations and crucial genes involved in response to long-term low temperature in O. longistaminata shoots and rhizomes subjected to 7 days of chilling stress. A total of 605 and 403 genes were respectively identified as up- and down-regulated in O. longistaminata under 7 days of chilling stress, with 354 and 371 differentially expressed genes (DEGs) found exclusively in shoots and rhizomes, respectively. GO enrichment and KEGG pathway analyses revealed that multiple transcriptional regulatory pathways were enriched in commonly induced genes in both tissues; in contrast, only the photosynthesis pathway was prevalent in genes uniquely induced in shoots, whereas several key metabolic pathways and the programmed cell death process were enriched in genes induced only in rhizomes. Further analysis of these tissue-specific DEGs showed that the CBF/DREB1 regulon and other transcription factors (TFs), including AP2/EREBPs, MYBs, and WRKYs, were synergistically involved in transcriptional regulation of chilling stress response in shoots. Different sets of TFs, such as OsERF922, OsNAC9, OsWRKY25, and WRKY74, and eight genes encoding antioxidant enzymes were exclusively activated in rhizomes under long-term low-temperature treatment. 
Furthermore, several cis-regulatory elements, including the ICE1-binding site, the GATA element for phytochrome regulation, and the W-box for WRKY binding, were highly abundant in both tissues, confirming the involvement of multiple regulatory genes and complex networks in the transcriptional regulation of CT in O. longistaminata. Finally, most chilling-induced genes with alternative splicing exclusive to shoots were associated with photosynthesis and regulation of gene expression, while those enriched in rhizomes were primarily related to stress signal transduction; this indicates that tissue-specific transcriptional and post-transcriptional regulation mechanisms synergistically contribute to O. longistaminata long-term CT. Our findings provide an overview of the complex regulatory networks of CT in O. longistaminata. PMID:29190752
Zhang, Ting; Huang, Liyu; Wang, Yinxiao; Wang, Wensheng; Zhao, Xiuqin; Zhang, Shilai; Zhang, Jing; Hu, Fengyi; Fu, Binying; Li, Zhikang
2017-01-01
Rice (Oryza sativa) is very sensitive to chilling stress at seedling and reproductive stages, whereas wild rice, O. longistaminata, tolerates non-freezing cold temperatures and has overwintering ability. Elucidating the molecular mechanisms of chilling tolerance (CT) in O. longistaminata should thus provide a basis for rice CT improvement through molecular breeding. In this study, high-throughput RNA sequencing was performed to profile global transcriptome alterations and crucial genes involved in response to long-term low temperature in O. longistaminata shoots and rhizomes subjected to 7 days of chilling stress. A total of 605 and 403 genes were respectively identified as up- and down-regulated in O. longistaminata under 7 days of chilling stress, with 354 and 371 differentially expressed genes (DEGs) found exclusively in shoots and rhizomes, respectively. GO enrichment and KEGG pathway analyses revealed that multiple transcriptional regulatory pathways were enriched in commonly induced genes in both tissues; in contrast, only the photosynthesis pathway was prevalent in genes uniquely induced in shoots, whereas several key metabolic pathways and the programmed cell death process were enriched in genes induced only in rhizomes. Further analysis of these tissue-specific DEGs showed that the CBF/DREB1 regulon and other transcription factors (TFs), including AP2/EREBPs, MYBs, and WRKYs, were synergistically involved in transcriptional regulation of chilling stress response in shoots. Different sets of TFs, such as OsERF922, OsNAC9, OsWRKY25, and WRKY74, and eight genes encoding antioxidant enzymes were exclusively activated in rhizomes under long-term low-temperature treatment. 
Furthermore, several cis-regulatory elements, including the ICE1-binding site, the GATA element for phytochrome regulation, and the W-box for WRKY binding, were highly abundant in both tissues, confirming the involvement of multiple regulatory genes and complex networks in the transcriptional regulation of CT in O. longistaminata. Finally, most chilling-induced genes with alternative splicing exclusive to shoots were associated with photosynthesis and regulation of gene expression, while those enriched in rhizomes were primarily related to stress signal transduction; this indicates that tissue-specific transcriptional and post-transcriptional regulation mechanisms synergistically contribute to O. longistaminata long-term CT. Our findings provide an overview of the complex regulatory networks of CT in O. longistaminata.
Raghunath, Pendru; Acharya, Sadananda; Bhanumathi, Amarbahadur; Karunasagar, Iddya; Karunasagar, Indrani
2008-09-01
The levels of total and tdh(+) Vibrio parahaemolyticus were estimated in 83 seafood samples from the southwest coast of India by colony hybridization. A conventional enrichment and isolation technique was also used to study the prevalence. Polymerase chain reaction (PCR) was performed on bacterial cell lysates for detection of total and pathogenic V. parahaemolyticus by amplification of specific genes. Of 83 samples tested, V. parahaemolyticus could be detected in 74 (89.2%) samples and tdh(+) V. parahaemolyticus in 5 (6.0%) samples by colony hybridization. V. parahaemolyticus was detected in 68 (81.9%) of 83 samples after 18 h of enrichment by PCR, and isolated from 63 (75.9%) of 83 samples by conventional isolation. The virulence genes tdh and trh could be detected in 8.4% and 25.3%, respectively, of the sample enrichment broths by PCR. Use of colony hybridization following enrichment to achieve sensitive detection of tdh(+) V. parahaemolyticus in seafood was evaluated using another set of 58 seafood samples. Thirty pathogenic V. parahaemolyticus strains isolated during the study were screened by PCR for genetic markers specific for detection of the pandemic clone. Results of this study suggest that the GS-PCR may serve as a reliable genetic marker for the pandemic clone of V. parahaemolyticus.
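The detection rates quoted above follow directly from the sample counts, as the short Python check below shows. The helper name is an assumption. The counts of 7 and 21 samples for tdh and trh are not stated in the abstract; they are inferred here as the whole numbers consistent with 8.4% and 25.3% of 83.

```python
def detection_rate(positive, total):
    """Percentage of positive samples, rounded to one decimal place."""
    return round(100 * positive / total, 1)

if __name__ == "__main__":
    total = 83
    reported = {
        "colony hybridization, total": 74,
        "colony hybridization, tdh(+)": 5,
        "PCR after 18 h enrichment": 68,
        "conventional isolation": 63,
    }
    for method, positive in reported.items():
        print(f"{method}: {positive}/{total} = {detection_rate(positive, total)}%")
```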
Zhang, Wei-Dong; Zhao, Yong; Zhang, Hong-Fu; Wang, Shu-Kun; Hao, Zhi-Hui; Liu, Jing; Yuan, Yu-Qing; Zhang, Peng-Fei; Yang, Hong-Di; Shen, Wei; Li, Lan
2016-08-01
Granulosa cells (GCs) are the somatic cells closest to the female germ cell. GCs play a vital role in oocyte growth and development, and the oocyte is necessary for multiplication of a species. Zinc oxide (ZnO) nanoparticles (NPs) readily cross biologic barriers to be absorbed into biologic systems, which makes them promising candidates as food additives. The objective of the present investigation was to explore the impact of intact NPs on gene expression and the functional classification of altered genes in hen GCs in vivo, to compare the data from in vivo and in vitro studies, and finally to point out the adverse effects of ZnO NPs on the reproductive system. After a 24-week treatment, hen GCs were isolated and gene expression was quantified. Intact NPs were found in the ovary and other organs. Zn levels were similar in ZnO-NP-100 mg/kg- and ZnSO4-100 mg/kg-treated hen ovaries. ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg regulated the expression of the same sets of genes, and they also altered the expression of different sets of genes individually. The number of genes altered by the ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg treatments was different. Gene Ontology (GO) functional analysis reported different results for the two treatments and, in Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, 12 pathways (out of the top 20 pathways) in each treatment were different. These results suggested that intact NPs and Zn(2+) had different effects on gene expression in GCs in vivo. In our recent publication, we noted that intact NPs and Zn(2+) differentially altered gene expression in GCs in vitro. However, GO functional classification and KEGG pathway enrichment analyses revealed close similarities for the changed genes in vivo and in vitro after ZnO NP treatment. Furthermore, close similarities were observed for the changed genes after ZnSO4 treatments in vivo and in vitro by GO functional classification and KEGG pathway enrichment analyses.
Therefore, the effects of ZnO NPs on gene expression in vitro might represent their effects on gene expression in vivo. The results from this study and our earlier studies support previous findings indicating ZnO NPs promote adverse effects on organisms. Therefore, precautions should be taken when ZnO NPs are used as diet additives for hens because they might cause reproductive issues. Copyright © 2016 Elsevier Inc. All rights reserved.
CD25 Preselective Anti-HIV Vectors for Improved HIV Gene Therapy
Kalomoiris, Stefanos; Lawson, Je'Tai; Chen, Rachel X.; Bauer, Gerhard; Nolta, Jan A.
2012-01-01
As HIV continues to be a global public health problem with no effective vaccine available, new and innovative therapies, including HIV gene therapies, need to be developed. Due to low transduction efficiencies that lead to low in vivo gene marking, therapeutically relevant efficacy of HIV gene therapy has been difficult to achieve in a clinical setting. Methods to improve the transplantation of enriched populations of anti-HIV vector-transduced cells may greatly increase the in vivo efficacy of HIV gene therapies. Here we describe the development of preselective anti-HIV lentiviral vectors that allow for the purification of vector-transduced cells to achieve an enriched population of HIV-resistant cells. A selectable protein, human CD25, not normally found on CD34+ hematopoietic progenitor cells (HPCs), was incorporated into a triple combination anti-HIV lentiviral vector. Upon purification of cells transduced with the preselective anti-HIV vector, safety was demonstrated in CD34+ HPCs and in HPC-derived macrophages in vitro. Upon challenge with HIV-1, improved efficacy was observed in purified preselective anti-HIV vector-transduced macrophages compared to unpurified cells. These proof-of-concept results highlight the potential use of this method to improve HIV stem cell gene therapy for future clinical applications. PMID:23216020
Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C
2009-02-01
Abscisic acid (ABA), the well-known plant stress hormone, plays a key role in the regulation of a subset of stress-responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the CGMCACGTGB motif, which contains the ABA Responsive Element (ABRE) core (ACGT), as an over-represented motif among the promoters of ABA-responsive co-expressed genes in rice. A targeted gene prediction strategy using this motif led to the identification of 402 protein-coding genes potentially regulated by an ABA-dependent molecular genetic network. RT-PCR analysis of 45 arbitrarily chosen genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA-responsive genes showed enrichment of signal transduction and stress-related genes among diverse functional categories.
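The motif CGMCACGTGB is written in IUPAC notation, where M stands for A or C and B stands for C, G or T. The Python sketch below shows how a motif of this kind can be turned into a regular expression and located in a promoter sequence. It is only an illustration of the notation, not the authors' prediction pipeline; the function names and the example sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression fragments.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Translate an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC_REGEX[base] for base in motif.upper())

def find_motif(sequence, motif):
    """Return (start, match) for every occurrence, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

if __name__ == "__main__":
    promoter = "TTCGACACGTGCAA"  # made-up sequence for illustration
    print(motif_to_regex("CGMCACGTGB"))
    print(find_motif(promoter, "CGMCACGTGB"))
```

A genome-wide scan would typically also search the reverse complement of each promoter.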
Learning about Inheritance in an Out-of-School Setting
ERIC Educational Resources Information Center
Dairianathan, Anne; Subramaniam, R.
2011-01-01
The purpose of this study was to investigate primary students' learning through participation in an out-of-school enrichment programme, held in a science centre, which focused on DNA and genes and whether participation in the programme led to an increased understanding of inheritance as well as promoted interest in the topic. The sample consisted…
Evans, Joseph R.; Zhao, Shuang G.; Chang, S. Laura; Tomlins, Scott A.; Erho, Nicholas; Sboner, Andrea; Schiewer, Matthew J.; Spratt, Daniel E.; Kothari, Vishal; Klein, Eric A.; Den, Robert B.; Dicker, Adam P.; Karnes, R. Jeffrey; Yu, Xiaochun; Nguyen, Paul L.; Rubin, Mark A.; de Bono, Johann; Knudsen, Karen E.; Davicioni, Elai; Feng, Felix Y.
2017-01-01
IMPORTANCE A substantial number of patients diagnosed with high-risk prostate cancer are at risk for metastatic progression after primary treatment. Better biomarkers are needed to identify patients at the highest risk to guide therapy intensification. OBJECTIVE To create a DNA damage and repair (DDR) pathway profiling method for use as a prognostic signature biomarker in high-risk prostate cancer. DESIGN, SETTING, AND PARTICIPANTS A cohort of 1090 patients with high-risk prostate cancer who underwent prostatectomy and were treated at 3 different academic institutions were divided into a training cohort (n = 545) and 3 pooled validation cohorts (n = 232, 130, and 183) assembled for case-control or case-cohort studies. Profiling of 9 DDR pathways using 17 gene sets for GSEA (Gene Set Enrichment Analysis) of high-density microarray gene expression data from formalin-fixed paraffin-embedded prostatectomy samples with median 10.3 years follow-up was performed. Prognostic signature development from DDR pathway profiles was studied, and DDR pathway gene mutation in published cohorts was analyzed. MAIN OUTCOMES AND MEASURES Biochemical recurrence-free, metastasis-free, and overall survival. RESULTS Across the training cohort and pooled validation cohorts, 1090 men were studied; mean (SD) age at diagnosis was 65.3 (6.4) years. We found that there are distinct clusters of DDR pathways within the cohort, and DDR pathway enrichment is only weakly correlated with clinical variables such as age (Spearman ρ [ρ], range, −0.07 to 0.24), Gleason score (ρ, range, 0.03 to 0.20), prostate-specific antigen level (ρ, range, −0.07 to 0.10), while 13 of 17 DDR gene sets are strongly correlated with androgen receptor pathway enrichment (ρ, range, 0.33 to 0.82). In published cohorts, DDR pathway genes are rarely mutated. 
A DDR pathway profile prognostic signature built in the training cohort was significantly associated with biochemical recurrence-free, metastasis-free, and overall survival in the pooled validation cohorts independent of standard clinicopathological variables. The prognostic performance of the signature for metastasis-free survival appears to be stronger in the younger patients (HR, 1.67; 95%CI, 1.12–2.50) than in the older patients (HR, 0.77; 95%CI, 0.29–2.07) on multivariate Cox analysis. CONCLUSIONS AND RELEVANCE DNA damage and repair pathway profiling revealed patient-level variations and the DDR pathways are rarely affected by mutation. A DDR pathway signature showed strong prognostic performance with the long-term outcomes of metastasis-free and overall survival that may be useful for risk stratification of high-risk prostate cancer patients. PMID:26746117
Xu, Yan; Chen, Yan; Li, Daliang; Liu, Qing; Xuan, Zhenyu; Li, Wen-Hong
2017-02-01
MicroRNAs are small non-coding RNAs acting as posttranscriptional repressors of gene expression. Identifying mRNA targets of a given miRNA remains an outstanding challenge in the field. We have developed a new experimental approach, TargetLink, that uses locked nucleic acid (LNA) as the affinity probe to enrich target genes of a specific microRNA in intact cells. TargetLink also includes a rigorous and systematic data analysis pipeline to identify target genes by comparing LNA-enriched sequences between experimental and control samples. Using miR-21 as a test microRNA, we identified 12 target genes of miR-21 in a human colorectal cancer cell by this approach. The majority of the identified targets interacted with miR-21 via imperfect seed pairing. Target validation confirmed that miR-21 repressed the expression of the identified targets. The cellular abundance of the identified miR-21 target transcripts varied over a wide range, with some targets expressed at a rather low level, confirming that both abundant and rare transcripts are susceptible to regulation by microRNAs, and that TargetLink is an efficient approach for identifying the target set of a specific microRNA in intact cells. C20orf111, one of the novel targets identified by TargetLink, was found to reside in the nuclear speckle and to be reliably repressed by miR-21 through the interaction at its coding sequence.
Shahmanesh, Mohsen; Phillips, Kenneth; Boothby, Meg; Tomlinson, Jeremy W.
2015-01-01
Objective To compare changes in gene expression by microarray from subcutaneous adipose tissue from HIV treatment naïve patients treated with efavirenz based regimens containing abacavir (ABC), tenofovir (TDF) or zidovudine (AZT). Design Subcutaneous fat biopsies were obtained before, at 6- and 18–24-months after treatment, and from HIV negative controls. Groups were age, ethnicity, weight, biochemical profile, and pre-treatment CD4 count matched. Microarray data were generated using the Agilent Whole Human Genome Microarray. Identification of differentially expressed genes and genomic response pathways was performed using limma and gene set enrichment analysis. Results There were significant divergences between ABC and the other two groups 6 months after treatment in genes controlling cell adhesion and environmental information processing, with some convergence at 18–24 months. Compared to controls, the ABC group, but not AZT or TDF, showed enrichment of genes controlling adherens junctions at 6 months and 18–24 months (adjusted p<0.05) and focal adhesions and tight junctions at 6 months (p<0.5). Genes controlling leukocyte transendothelial migration (p<0.05) and ECM-receptor interactions (p = 0.04) were over-expressed in ABC compared to TDF and AZT at 6 months but not at 18–24 months. Pathways and individual genes controlling cell adhesion and environmental information processing were specifically dysregulated in the ABC group in comparison with other treatments. There was little difference between AZT and TDF. Conclusion After initiating treatment, there is divergence in the expression of genes controlling cell adhesion and environmental information processing between ABC and both TDF and AZT in subcutaneous adipose tissue. If similar changes are also taking place in other tissues including the coronary vasculature they may contribute to the increased risk of cardiovascular events reported in patients recently started on abacavir-containing regimens.
PMID:25617630
The Transcriptome Signature of the Receptive Bovine Uterus Determined at Early Gestation
Binelli, Mario; Scolari, Saara C.; Pugliesi, Guilherme; Van Hoeck, Veerle; Gonella-Diaza, Angela M.; Andrade, Sónia C. S.; Gasparin, Gustavo R.; Coutinho, Luiz L.
2015-01-01
Pregnancy success is critical to the profitability of cattle operations. However, the molecular events driving the uterine tissue towards embryo receptivity are poorly understood. This study aimed to characterize the uterine transcriptome profiles of pregnant (P) versus non-pregnant (NP) cows during early pregnancy and attempted to define a potential set of marker genes that can be valuable for predicting pregnancy outcome. To this end, beef cows were synchronized (n=51) and artificially inseminated (n=36) at detected estrus. Six days after AI (D6), jugular blood samples and a biopsy from the uterine horn contralateral to the ovary containing the corpus luteum were collected. Based on pregnancy outcome on D30, samples were retrospectively allocated to the following groups: P (n=6) and NP (n=5). Both groups had similar plasma progesterone concentrations on D6. Uterine biopsies were submitted to RNA-Seq analysis on an Illumina platform. The 272,685,768 filtered reads were mapped to the Bos taurus reference genome and 14,654 genes were analyzed for differential expression between groups. Transcriptome data showed that 216 genes were differentially expressed when comparing NP versus P uterine tissue (Padj≤0.1). More specifically, 36 genes were up-regulated in P cows and 180 were up-regulated in NP cows. Functional enrichment and pathway analyses revealed enriched expression of genes associated with extracellular matrix remodeling in the NP cows and nucleotide binding, microsome and vesicular fraction in the P cows. From the 40 top-ranked genes, the transcript levels of nine genes were re-evaluated using qRT-PCR. In conclusion, this study characterized a unique set of genes, expressed in the uterus 6 days after insemination, that indicate a receptive state leading to pregnancy success. Furthermore, expression of such genes can be used as potential markers to efficiently predict pregnancy success. PMID:25849079
Yang, Xinan Holly; Li, Meiyi; Wang, Bin; Zhu, Wanqi; Desgardin, Aurelie; Onel, Kenan; de Jong, Jill; Chen, Jianjun; Chen, Luonan; Cunningham, John M
2015-03-24
Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-sets signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses complex disease acute myeloid leukemia (AML) as a research platform. We introduced an adjustable "soft threshold" to a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. We identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells in parallel; both mark favorable-prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03 respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We successfully validated the prognosis of both signatures in two independent cohorts of 91 and 242 patients respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08). 
The proposed algorithms and computational framework will harness systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about AML and other complex diseases.
Miraldi Utz, Virginia
2017-01-01
Myopia is the most common eye disorder and major cause of visual impairment worldwide. As the incidence of myopia continues to rise, the need to further understand the complex roles of molecular and environmental factors controlling variation in refractive error is of increasing importance. Tkatchenko and colleagues applied a systematic approach using a combination of gene set enrichment analysis, genome-wide association studies, and functional analysis of a murine model to identify a myopia susceptibility gene, APLP2. Differential expression of refractive error was associated with time spent reading for those with low frequency variants in this gene. This provides support for the longstanding hypothesis of gene-environment interactions in refractive error development.
Genetic identification of brain cell types underlying schizophrenia.
Skene, Nathan G; Bryois, Julien; Bakken, Trygve E; Breen, Gerome; Crowley, James J; Gaspar, Héléna A; Giusti-Rodriguez, Paola; Hodge, Rebecca D; Miller, Jeremy A; Muñoz-Manchado, Ana B; O'Donovan, Michael C; Owen, Michael J; Pardiñas, Antonio F; Ryge, Jesper; Walters, James T R; Linnarsson, Sten; Lein, Ed S; Sullivan, Patrick F; Hjerling-Leffler, Jens
2018-06-01
With few exceptions, the marked advances in knowledge about the genetic basis of schizophrenia have not converged on findings that can be confidently used for precise experimental modeling. By applying knowledge of the cellular taxonomy of the brain from single-cell RNA sequencing, we evaluated whether the genomic loci implicated in schizophrenia map onto specific brain cell types. We found that the common-variant genomic results consistently mapped to pyramidal cells, medium spiny neurons (MSNs) and certain interneurons, but far less consistently to embryonic, progenitor or glial cells. These enrichments were due to sets of genes that were specifically expressed in each of these cell types. We also found that many of the diverse gene sets previously associated with schizophrenia (genes involved in synaptic function, those encoding mRNAs that interact with FMRP, antipsychotic targets, etc.) generally implicated the same brain cell types. Our results suggest a parsimonious explanation: the common-variant genetic results for schizophrenia point at a limited set of neurons, and the gene sets point to the same cells. The genetic risk associated with MSNs did not overlap with that of glutamatergic pyramidal cells and interneurons, suggesting that different cell types have biologically distinct roles in schizophrenia.
Ji, S C; Pan, Y T; Lu, Q Y; Sun, Z Y; Liu, Y Z
2014-03-17
The purpose of this study was to identify critical genes associated with septic multiple trauma by comparing peripheral whole blood samples from multiple trauma patients with and without sepsis. A microarray data set was downloaded from the Gene Expression Omnibus (GEO) database. This data set included 70 samples, 36 from multiple trauma patients with sepsis and 34 from multiple trauma patients without sepsis (as a control set). The data were preprocessed, and differentially expressed genes (DEGs) were then screened using R packages. Functional analysis of DEGs was performed with DAVID. Interaction networks were then established for the most up- and down-regulated genes using HitPredict. Pathway-enrichment analysis was conducted for genes in the networks using WebGestalt. Fifty-eight DEGs were identified. The expression levels of PLAU (down-regulated) and MMP8 (up-regulated) presented the largest fold-changes, and interaction networks were established for these genes. Further analysis revealed that PLAT (plasminogen activator, tissue) and SERPINF2 (serpin peptidase inhibitor, clade F, member 2), which interact with PLAU, play important roles in the complement and coagulation cascade pathway. We hypothesize that PLAU is a major regulator of the complement and coagulation cascade, and that down-regulation of PLAU results in dysfunction of the pathway, causing sepsis.
Yoo, Seungyeul; Takikawa, Sachiko; Geraghty, Patrick; Argmann, Carmen; Campbell, Joshua; Lin, Luan; Huang, Tao; Tu, Zhidong; Foronjy, Robert F; Feronjy, Robert; Spira, Avrum; Schadt, Eric E; Powell, Charles A; Zhu, Jun
2015-01-01
Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple gene sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communication, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue, including hypoxia responsive genes and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a 'causal' role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology.
Campos, Bruno; Fletcher, Danielle; Piña, Benjamín; Tauler, Romà; Barata, Carlos
2018-05-18
Unravelling the link between genes and environment across the life cycle is a challenging goal that requires model organisms with well-characterized life-cycles, ecological interactions in nature, tractability in the laboratory, and available genomic tools. Very few well-studied invertebrate model species meet these requirements; the waterflea Daphnia magna is one of them. Here we report a full genome transcription profiling of D. magna during its life-cycle. The study was performed using a new microarray platform designed from the complete set of gene models representing the whole transcribed genome of D. magna. Up to 93% of the existing 41,317 D. magna gene models showed differential transcription patterns across the developmental stages of D. magna, 59% of which were functionally annotated. Embryos showed the highest number of unique transcribed genes, mainly related to DNA, RNA, and ribosome biogenesis, likely reflecting cellular proliferation and morphogenesis of the several body organs. Adult females showed an enrichment of transcripts for genes involved in reproductive processes. These female-specific transcripts were essentially absent in males, whose transcriptome was enriched in genes specific to male sexual differentiation, such as doublesex. Our results define major characteristics of transcriptional programs involved in the life-cycle, differentiate males and females, and show that large scale gene-transcription data collected in whole animals can be used to identify genes involved in specific biological and biochemical processes.
de Jong, Simone; Boks, Marco P. M.; Fuller, Tova F.; Strengman, Eric; Janson, Esther; de Kovel, Carolien G. F.; Ori, Anil P. S.; Vi, Nancy; Mulder, Flip; Blom, Jan Dirk; Glenthøj, Birte; Schubart, Chris D.; Cahn, Wiepke; Kahn, René S.; Horvath, Steve; Ophoff, Roel A.
2012-01-01
Despite large-scale genome-wide association studies (GWAS), the underlying genes for schizophrenia are largely unknown. Additional approaches are therefore required to identify the genetic background of this disorder. Here we report findings from a large gene expression study in peripheral blood of schizophrenia patients and controls. We applied a systems biology approach to genome-wide expression data from whole blood of 92 medicated and 29 antipsychotic-free schizophrenia patients and 118 healthy controls. We show that gene expression profiling in whole blood can identify twelve large gene co-expression modules associated with schizophrenia. Several of these disease related modules are likely to reflect expression changes due to antipsychotic medication. However, two of the disease modules could be replicated in an independent second data set involving antipsychotic-free patients and controls. One of these robustly defined disease modules is significantly enriched with brain-expressed genes and with genetic variants that were implicated in a GWAS study, which could imply a causal role in schizophrenia etiology. The most highly connected intramodular hub gene in this module (ABCF1), is located in, and regulated by the major histocompatibility (MHC) complex, which is intriguing in light of the fact that common allelic variants from the MHC region have been implicated in schizophrenia. This suggests that the MHC increases schizophrenia susceptibility via altered gene expression of regulatory genes in this network. PMID:22761806
2011-11-16
nickel, cadmium, and chromium are toxic industrial chemicals with an exposure. While these substances are known to produce adverse health effects leading...in both occupational and environmental settings that may cause harmful outcomes. While these substances are known to produce adverse health effects...that particular bin. A chi-squared test was used to test bin enrichment (p ≤ 0.05). Probe sets that did not contain any biological process annotation were
RAMONA: a Web application for gene set analysis on multilevel omics data.
Sass, Steffen; Buettner, Florian; Mueller, Nikola S; Theis, Fabian J
2015-01-01
Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data, which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds. RAMONA is implemented as an ASP.NET Web application and is publicly available at http://icb.helmholtz-muenchen.de/ramona. © The Author 2014. Published by Oxford University Press. All rights reserved.
Biswas, Nidhan K; Chandra, Vikas; Sarkar-Roy, Neeta; Das, Tapojyoti; Bhattacharya, Rabindra N; Tripathy, Laxmi N; Basu, Sunandan K; Kumar, Shantanu; Das, Subrata; Chatterjee, Ankita; Mukherjee, Ankur; Basu, Pryiadarshi; Maitra, Arindam; Chattopadhyay, Ansuman; Basu, Analabha; Dhara, Surajit
2015-01-21
Neoplastic cells of Glioblastoma multiforme (GBM) may or may not show sustained response to temozolomide (TMZ) chemotherapy. We hypothesize that TMZ chemotherapy response in GBM is predetermined in its neoplastic clones via a specific set of mutations that alter relevant pathways. We describe exome-wide enrichment of variant allele frequencies (VAFs) in neurospheres displaying contrasting phenotypes of sustained versus reversible TMZ-responses in vitro. Enrichment of VAFs was found on genes ST5, RP6KA1 and PRKDC in cells showing sustained TMZ-effect whereas on genes FREM2, AASDH and STK36, in cells showing reversible TMZ-effect. Ingenuity pathway analysis (IPA) revealed that these genes alter cell-cycle, G2/M-checkpoint-regulation and NHEJ pathways in sustained TMZ-effect cells whereas the lysine-II&V/phenylalanine degradation and sonic hedgehog (Hh) pathways in reversible TMZ-effect cells. Next, we validated the likely involvement of the Hh-pathway in TMZ-response on additional GBM neurospheres as well as on GBM patients, by extracting RNA-sequencing-based gene expression data from the TCGA-GBM database. Finally, we demonstrated TMZ-sensitization of a TMZ non-responder neurosphere in vitro by treating them with the FDA-approved pharmacological Hh-pathway inhibitor vismodegib. Altogether, our results indicate that the Hh-pathway impedes sustained TMZ-response in GBM and could be a potential therapeutic target to enhance TMZ-response in this malignancy.
Evans, Joseph R; Zhao, Shuang G; Chang, S Laura; Tomlins, Scott A; Erho, Nicholas; Sboner, Andrea; Schiewer, Matthew J; Spratt, Daniel E; Kothari, Vishal; Klein, Eric A; Den, Robert B; Dicker, Adam P; Karnes, R Jeffrey; Yu, Xiaochun; Nguyen, Paul L; Rubin, Mark A; de Bono, Johann; Knudsen, Karen E; Davicioni, Elai; Feng, Felix Y
2016-04-01
A substantial number of patients diagnosed with high-risk prostate cancer are at risk for metastatic progression after primary treatment. Better biomarkers are needed to identify patients at the highest risk to guide therapy intensification. The objective was to create a DNA damage and repair (DDR) pathway profiling method for use as a prognostic signature biomarker in high-risk prostate cancer. A cohort of 1090 patients with high-risk prostate cancer who underwent prostatectomy and were treated at 3 different academic institutions was divided into a training cohort (n = 545) and 3 pooled validation cohorts (n = 232, 130, and 183) assembled for case-control or case-cohort studies. Profiling of 9 DDR pathways using 17 gene sets for GSEA (Gene Set Enrichment Analysis) of high-density microarray gene expression data from formalin-fixed paraffin-embedded prostatectomy samples with median 10.3 years follow-up was performed. Prognostic signature development from DDR pathway profiles was studied, and DDR pathway gene mutation in published cohorts was analyzed. The main outcomes were biochemical recurrence-free, metastasis-free, and overall survival. Across the training cohort and pooled validation cohorts, 1090 men were studied; mean (SD) age at diagnosis was 65.3 (6.4) years. We found that there are distinct clusters of DDR pathways within the cohort, and DDR pathway enrichment is only weakly correlated with clinical variables such as age (Spearman ρ, range, -0.07 to 0.24), Gleason score (ρ, range, 0.03 to 0.20), and prostate-specific antigen level (ρ, range, -0.07 to 0.10), while 13 of 17 DDR gene sets are strongly correlated with androgen receptor pathway enrichment (ρ, range, 0.33 to 0.82). In published cohorts, DDR pathway genes are rarely mutated. A DDR pathway profile prognostic signature built in the training cohort was significantly associated with biochemical recurrence-free, metastasis-free, and overall survival in the pooled validation cohorts independent of standard clinicopathological variables.
The prognostic performance of the signature for metastasis-free survival appears to be stronger in the younger patients (HR, 1.67; 95% CI, 1.12-2.50) than in the older patients (HR, 0.77; 95% CI, 0.29-2.07) on multivariate Cox analysis. DNA damage and repair pathway profiling revealed patient-level variations and the DDR pathways are rarely affected by mutation. A DDR pathway signature showed strong prognostic performance with the long-term outcomes of metastasis-free and overall survival that may be useful for risk stratification of high-risk prostate cancer patients.
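The abstract above reports Spearman ρ between pathway enrichment and clinical variables. As a minimal sketch of how that statistic is defined (the Pearson correlation of the ranks, with tied values given their average rank), the following uses hypothetical helper names and toy inputs; it is not the authors' analysis code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical): a monotone but non-linear relationship.
scores = [1, 2, 3, 4, 5]
squared = [1, 4, 9, 16, 25]
rho_monotone = spearman_rho(scores, squared)
rho_reversed = spearman_rho(scores, squared[::-1])
```

Because the statistic depends only on ranks, any strictly increasing relationship yields ρ = 1 and any strictly decreasing one yields ρ = -1.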
Tejera, Eduardo; Cruz-Monteagudo, Maykel; Burgos, Germán; Sánchez, María-Eugenia; Sánchez-Rodríguez, Aminael; Pérez-Castillo, Yunierkis; Borges, Fernanda; Cordeiro, Maria Natália Dias Soeiro; Paz-Y-Miño, César; Rebelo, Irene
2017-08-08
Preeclampsia is a multifactorial disease with unknown pathogenesis. Although recent studies have explored this disease using several bioinformatics tools, their main objective was not directed at pathogenesis. Additionally, consensus prioritization has proved to be highly efficient in the recognition of gene-disease associations. However, no information is available about the ability of consensus approaches to recognize, at an early stage, genes directly involved in pathogenesis. Therefore, our aim in this study is to apply several theoretical approaches to explore preeclampsia, specifically those genes directly involved in its pathogenesis. We first evaluated the consensus between 12 prioritization strategies for the early recognition of pathogenic genes related to preeclampsia. A communality analysis in the protein-protein interaction network of previously selected genes was performed, followed by enrichment analysis. The enrichment analysis included metabolic pathways as well as Gene Ontology terms. Microarray data were also collected and used to confirm our results or as a strategy to weight the previously enriched pathways. The consensus prioritized gene list was rationally filtered to 476 genes using several criteria. The communality analysis showed an enrichment of communities connected with the VEGF-signaling pathway. This pathway is also enriched when the microarray data are considered. Our results point to VEGF, FLT1 and KDR as relevant pathogenic genes, as well as those connected with NO metabolism. Our results revealed that the consensus strategy improves the detection and initial enrichment of pathogenic genes, at least in preeclampsia. Moreover, the combination of the first percent of the prioritized genes with the protein-protein interaction network, followed by communality analysis, reduces the gene space. This approach identifies well-known genes related to pathogenesis.
However, genes like HSP90, PAK2, CD247 and others included in the first 1% of the prioritized list need to be further explored in preeclampsia pathogenesis through experimental approaches.
Prediction of gene expression in embryonic structures of Drosophila melanogaster.
Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis
2007-07-01
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
PMID:17658945
McMullin, Ryan P; Wittner, Ben S; Yang, Chuanwei; Denton-Schneider, Benjamin R; Hicks, Daniel; Singavarapu, Raj; Moulis, Sharon; Lee, Jeongeun; Akbari, Mohammad R; Narod, Steven A; Aldape, Kenneth D; Steeg, Patricia S; Ramaswamy, Sridhar; Sgroi, Dennis C
2014-03-14
There is an unmet clinical need for biomarkers to identify breast cancer patients at an increased risk of developing brain metastases. The objective is to identify gene signatures and biological pathways associated with human epidermal growth factor receptor 2-positive (HER2+) brain metastasis. We combined laser capture microdissection and gene expression microarrays to analyze malignant epithelium from HER2+ breast cancer brain metastases with that from HER2+ nonmetastatic primary tumors. Differential gene expression was performed including gene set enrichment analysis (GSEA) using publicly available breast cancer gene expression data sets. In a cohort of HER2+ breast cancer brain metastases, we identified a gene expression signature that anti-correlates with overexpression of BRCA1. Sequence analysis of the HER2+ brain metastases revealed no pathogenic mutations of BRCA1, and therefore the aforementioned signature was designated BRCA1 Deficient-Like (BD-L). Evaluation of an independent cohort of breast cancer metastases demonstrated that BD-L values are significantly higher in brain metastases as compared to other metastatic sites. Although the BD-L signature is present in all subtypes of breast cancer, it is significantly higher in BRCA1 mutant primary tumors as compared with sporadic breast tumors. Additionally, BD-L signature values are significantly higher in HER2-/ER- primary tumors as compared with HER2+/ER + and HER2-/ER + tumors. The BD-L signature correlates with breast cancer cell line pharmacologic response to a combination of poly (ADP-ribose) polymerase (PARP) inhibitor and temozolomide, and the signature outperformed four published gene signatures of BRCA1/2 deficiency. A BD-L signature is enriched in HER2+ breast cancer brain metastases without pathogenic BRCA1 mutations. Unexpectedly, elevated BD-L values are found in a subset of primary tumors across all breast cancer subtypes. 
Evaluation of pharmacological sensitivity in breast cancer cell lines representing all breast cancer subtypes suggests the BD-L signature may serve as a biomarker to identify sporadic breast cancer patients who might benefit from a therapeutic combination of PARP inhibitor and temozolomide and may be indicative of a dysfunctional BRCA1-associated pathway.
PMID:24625110
Pollak, Julia; Rai, Karan G; Funk, Cory C; Arora, Sonali; Lee, Eunjee; Zhu, Jun; Price, Nathan D; Paddison, Patrick J; Ramirez, Jan-Marino; Rostomily, Robert C
2017-01-01
Ion channels and transporters have increasingly recognized roles in cancer progression through the regulation of cell proliferation, migration, and death. Glioblastoma stem-like cells (GSCs) are a source of tumor formation and recurrence in glioblastoma multiforme, a highly aggressive brain cancer, suggesting that ion channel expression may be perturbed in this population. However, little is known about the expression and functional relevance of ion channels that may contribute to GSC malignancy. Using RNA sequencing, we assessed the enrichment of ion channels in GSC isolates and non-tumor neural cell types. We identified a unique set of GSC-enriched ion channels using differential expression analysis that is also associated with distinct gene mutation signatures. In support of potential clinical relevance, expression of selected GSC-enriched ion channels evaluated in human glioblastoma databases of The Cancer Genome Atlas and Ivy Glioblastoma Atlas Project correlated with patient survival times. Finally, genetic knockdown as well as pharmacological inhibition of individual or classes of GSC-enriched ion channels constrained growth of GSCs compared to normal neural stem cells. This first-in-kind global examination characterizes ion channels enriched in GSCs and explores their potential clinical relevance to glioblastoma molecular subtypes, gene mutations, survival outcomes, regional tumor expression, and experimental responses to loss-of-function. Together, the data support the potential biological and therapeutic impact of ion channels on GSC malignancy and provide strong rationale for further examination of their mechanistic and therapeutic importance.
PMID:28264064
Chang, Yao-Ming; Liu, Wen-Yu; Shih, Arthur Chun-Chieh; Shen, Meng-Ni; Lu, Chen-Hua; Lu, Mei-Yeh Jade; Yang, Hui-Wen; Wang, Tzi-Yuan; Chen, Sean C-C; Chen, Stella Maris; Li, Wen-Hsiung; Ku, Maurice S B
2012-09-01
To study the regulatory and functional differentiation between the mesophyll (M) and bundle sheath (BS) cells of maize (Zea mays), we isolated large quantities of highly homogeneous M and BS cells from newly matured second leaves for transcriptome profiling by RNA sequencing. A total of 52,421 annotated genes with at least one read were found in the two transcriptomes. Defining a gene with more than one read per kilobase per million mapped reads as expressed, we identified 18,482 expressed genes; 14,972 were expressed in M cells, including 53 M-enriched transcription factor (TF) genes, whereas 17,269 were expressed in BS cells, including 214 BS-enriched TF genes. Interestingly, many TF gene families show a conspicuous BS preference in expression. Pathway analyses reveal differentiation between the two cell types in various functional categories, with the M cells playing more important roles in light reaction, protein synthesis and folding, tetrapyrrole synthesis, and RNA binding, while the BS cells specialize in transport, signaling, protein degradation and posttranslational modification, major carbon, hydrogen, and oxygen metabolism, cell division and organization, and development. Genes coding for several transporters involved in the shuttle of C(4) metabolites and BS cell wall development have been identified, to our knowledge, for the first time. This comprehensive data set will be useful for studying M/BS differentiation in regulation and function.
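The expression cutoff in the abstract above (more than one read per kilobase per million mapped reads, RPKM) follows from a simple normalization. The sketch below assumes the standard RPKM definition; the gene names, read counts, gene lengths, and library size are hypothetical and are not values from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (gene length in kb * mapped reads in millions),
    written here with a single 1e9 factor.
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical library size and per-gene (read count, length in bp) pairs.
TOTAL_MAPPED_READS = 20_000_000
genes = {
    "geneA": (500, 2000),
    "geneB": (10, 5000),
}

# Apply the "more than 1 RPKM" definition of an expressed gene.
expressed = [
    name
    for name, (count, length) in genes.items()
    if rpkm(count, length, TOTAL_MAPPED_READS) > 1.0
]
```

In this toy case geneA has an RPKM of 12.5 and counts as expressed, while geneB has an RPKM of 0.1 and does not.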
Role of DISC1 interacting proteins in schizophrenia risk from genome-wide analysis of missense SNPs.
Costas, Javier; Suárez-Rama, Jose Javier; Carrera, Noa; Paz, Eduardo; Páramo, Mario; Agra, Santiago; Brenlla, Julio; Ramos-Ríos, Ramón; Arrojo, Manuel
2013-11-01
A balanced translocation affecting DISC1 cosegregates with several psychiatric disorders, including schizophrenia, in a Scottish family. DISC1 is a hub protein of a network of protein-protein interactions involved in multiple developmental pathways within the brain. Gene set-based analysis has been proposed as an alternative to individual analysis of single nucleotide polymorphisms (SNPs) to get information from genome-wide association studies. In this work, we tested for an overrepresentation of the DISC1 interacting proteins within the top results of our ranked list of genes based on our previous genome-wide association study of missense SNPs in schizophrenia. Our data set consisted of 5100 common missense SNPs genotyped in 476 schizophrenic patients and 447 control subjects from Galicia, NW Spain. We used a modification of the Gene Set Enrichment Analysis adapted for SNPs, as implemented in the GenGen software. The analysis detected an overrepresentation of the DISC1 interacting proteins (permuted P-value=0.0158), indicative of the role of this gene set in schizophrenia risk. We identified seven leading-edge genes, MACF1, UTRN, DST, DISC1, KIF3A, SYNE1, and AKAP9, responsible for the overrepresentation. These genes are involved in neuronal cytoskeleton organization and intracellular transport through the microtubule cytoskeleton, suggesting that these processes may be impaired in schizophrenia. © 2013 John Wiley & Sons Ltd/University College London.
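A permuted P-value of the kind reported above is generally obtained by comparing the observed enrichment score with scores computed on permuted data. The sketch below shows only that final comparison, using the common add-one correction. It is an assumption about the general approach, not the GenGen implementation, and the scores are invented.

```python
def empirical_p_value(observed, null_scores):
    """One-sided permutation P-value with an add-one correction.

    Counts the permuted scores that are at least as large as the
    observed score, then adds one to numerator and denominator so the
    estimate can never be exactly zero.
    """
    at_least_as_extreme = sum(1 for s in null_scores if s >= observed)
    return (1 + at_least_as_extreme) / (1 + len(null_scores))


# Hypothetical enrichment scores from four permutations.
toy_null = [1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(3.0, toy_null))  # (1 + 2) / (1 + 4)
```

In practice the number of permutations is far larger than four; the smallest attainable P-value is 1 / (1 + number of permutations).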
Screening key candidate genes and pathways involved in insulinoma by microarray analysis.
Zhou, Wuhua; Gong, Li; Li, Xuefeng; Wan, Yunyan; Wang, Xiangfei; Li, Huili; Jiang, Bin
2018-06-01
Insulinoma is a rare type of tumor, and its genetic features remain largely unknown. This study aimed to search for potential key genes and relevant enriched pathways of insulinoma. The gene expression data from GSE73338 were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were identified between insulinoma tissues and normal pancreas tissues, followed by pathway enrichment analysis, protein-protein interaction (PPI) network construction, and module analysis. The expression of candidate key genes was validated by quantitative real-time polymerase chain reaction (RT-PCR) in insulinoma tissues. A total of 1632 DEGs were obtained, including 1117 upregulated genes and 514 downregulated genes. Pathway enrichment results showed that upregulated DEGs were significantly implicated in insulin secretion, and downregulated DEGs were mainly enriched in pancreatic secretion. PPI network analysis revealed 7 hub genes with degrees more than 10, including GCG (glucagon), GCGR (glucagon receptor), PLCB1 (phospholipase C, beta 1), CASR (calcium sensing receptor), F2R (coagulation factor II thrombin receptor), GRM1 (glutamate metabotropic receptor 1), and GRM5 (glutamate metabotropic receptor 5). DEGs involved in the significant modules were enriched in calcium signaling pathway, protein ubiquitination, and platelet degranulation. Quantitative RT-PCR data confirmed that the expression trends of these hub genes were similar to the results of bioinformatic analysis. The present study demonstrated that candidate DEGs and enriched pathways were the potential critical molecular events involved in the development of insulinoma, and these findings were useful for better understanding of insulinoma genesis.
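Selecting hub genes by node degree, as in the PPI analysis above, amounts to counting the distinct interaction partners of each node and applying a threshold. The following is a minimal sketch under that assumption; the node labels and edges are hypothetical placeholders and do not represent real protein interactions or the study's network.

```python
from collections import Counter


def hub_nodes(edges, min_degree):
    """Return nodes whose degree is strictly greater than min_degree.

    `edges` is an iterable of undirected (node, node) pairs; duplicate
    pairs and self-loops are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter(node for edge in unique for node in edge)
    return sorted(n for n, d in degree.items() if d > min_degree)


# Hypothetical toy network; ("P3", "P1") duplicates ("P1", "P3").
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P3", "P1")]
print(hub_nodes(toy_edges, min_degree=2))
```

The abstract's criterion corresponds to `min_degree=10` on the full network; the toy call uses a smaller threshold only because the example graph is tiny.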
Analyzing the Role of MicroRNAs in Schizophrenia in the Context of Common Genetic Risk Variants.
Hauberg, Mads Engel; Roussos, Panos; Grove, Jakob; Børglum, Anders Dupont; Mattheisen, Manuel
2016-04-01
The recent implication of 108 genomic loci in schizophrenia marked a great advancement in our understanding of the disease. Against the background of its polygenic nature there is a necessity to identify how schizophrenia risk genes interplay. As regulators of gene expression, microRNAs (miRNAs) have repeatedly been implicated in schizophrenia etiology. It is therefore of interest to establish their role in the regulation of schizophrenia risk genes in disease-relevant biological processes. The objective was to examine the role of miRNAs in schizophrenia in the context of disease-associated genetic variation. The basis of this study was summary statistics from the largest schizophrenia genome-wide association study meta-analysis to date (83 550 individuals in a meta-analysis of 52 genome-wide association studies) completed in 2014 along with publicly available data for predicted miRNA targets. We examined whether schizophrenia risk genes were more likely to be regulated by miRNA. Further, we used gene set analyses to identify miRNAs that are regulators of schizophrenia risk genes. The main outcome measures were results from association tests for miRNA targetomes and related analyses. In line with previous studies, we found that, similar to other complex traits, schizophrenia risk genes were more likely to be regulated by miRNAs (P < 2 × 10^-16). Further, the gene set analyses revealed several miRNAs regulating schizophrenia risk genes, with the strongest enrichment for targets of miR-9-5p (P = .0056 for enrichment among the top 1% most-associated single-nucleotide polymorphisms, corrected for multiple testing). It is further of note that MIR9-2 is located in a genomic region showing strong evidence for association with schizophrenia (P = 7.1 × 10^-8). The second and third strongest gene set signals were seen for the targets of miR-485-5p and miR-137, respectively. This study provides evidence for a role of miR-9-5p in the etiology of schizophrenia.
Its implication is of particular interest as the functions of this neurodevelopmental miRNA tie in with established disease biology: it has a regulatory loop with the fragile X mental retardation homologue FXR1 and regulates dopamine D2 receptor density.
Trevisi, P; Latorre, R; Priori, D; Luise, D; Archetti, I; Mazzoni, M; D'Inca, R; Bosi, P
2017-01-01
The ability of live yeasts to modulate pig intestinal cell signals in response to infection with Escherichia coli F4ac (ETEC) has not been studied in-depth. The aim of this trial was to evaluate the effect of Saccharomyces cerevisiae CNCM I-4407 (Sc), supplied at different times, on the transcriptome profile of the jejunal mucosa of pigs 24 h after infection with ETEC. In total, 20 piglets selected to be ETEC-susceptible were weaned at 24 days of age (day 0) and allotted by litter to one of the following groups: control (CO), CO+colistin (AB), CO+5×10^10 colony-forming units (CFU) Sc/kg feed from day 0 (PR) and CO+5×10^10 CFU Sc/kg feed from day 7 (CM). On day 7, the pigs were orally challenged with ETEC and were slaughtered 24 h later after blood sampling for haptoglobin (Hp) and C-reactive protein (CRP) determination. The jejunal mucosa was sampled (1) for morphometry; (2) for quantification of proliferation, apoptosis and zonula occludens (ZO-1); (3) to carry out the microarray analysis. A functional analysis was carried out using Gene Set Enrichment Analysis. The normalized enrichment score (NES) was calculated for each gene set, and statistical significance was defined when the False Discovery Rate % was <25 and P-values of NES were <0.05. The blood concentration of CRP and Hp, and the score for ZO-1 integrity on the jejunal villi did not differ between groups. The intestinal crypts were deeper in the AB (P=0.05) and the yeast groups (P<0.05) than in the CO group. Antibiotic treatment increased the number of mitotic cells in intestinal villi as compared with the control group (P<0.05). The PR group tended to increase the mitotic cells in villi and crypts and tended to reduce the cells in apoptosis as compared with the CM group. The transcriptome profiles of the AB and PR groups were similar.
In both groups, the gene sets involved in mitosis and in mitochondria development ranked the highest, whereas in the CO group, the gene sets related to cell junction and anion channels were affected. In the CM group, the gene sets linked to the metabolic process and transcription ranked the highest; a gene set linked with a negative effect on growth was also affected. In conclusion, constant supplementation of the feed with the tested yeast strain was effective in counteracting the detrimental effect of ETEC infection in susceptible pigs and limited the early activation of the gene sets related to the impairment of the jejunal mucosa.
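The significance rule stated in this abstract (False Discovery Rate % below 25 and NES P-value below 0.05) is a simple two-condition filter over GSEA output. A minimal sketch follows; the record layout, field names, and all values are hypothetical and do not reproduce the study's results or any particular GSEA file format.

```python
def significant_gene_sets(results, max_fdr_percent=25.0, max_p=0.05):
    """Keep gene sets with FDR % below 25 and NES P-value below 0.05."""
    return [
        r["name"]
        for r in results
        if r["fdr_percent"] < max_fdr_percent and r["p_value"] < max_p
    ]


# Hypothetical GSEA output rows; names and numbers are invented.
toy_results = [
    {"name": "SET_A", "nes": 2.1, "fdr_percent": 3.0, "p_value": 0.001},
    {"name": "SET_B", "nes": 1.4, "fdr_percent": 30.0, "p_value": 0.02},
    {"name": "SET_C", "nes": -1.8, "fdr_percent": 12.0, "p_value": 0.08},
]
print(significant_gene_sets(toy_results))
```

Both conditions must hold, so SET_B (FDR too high) and SET_C (P-value too high) are dropped even though each passes one of the two thresholds.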
Ficklin, Stephen P.; Feltus, Frank Alex
2013-01-01
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666
Ravens, Sarina; Fournier, Marjorie; Ye, Tao; Stierle, Matthieu; Dembele, Doulaye; Chavant, Virginie; Tora, Làszlò
2014-01-01
The histone acetyltransferase (HAT) Mof is essential for mouse embryonic stem cell (mESC) pluripotency and early development. Mof is the enzymatic subunit of two different HAT complexes, MSL and NSL. The individual contribution of MSL and NSL to transcription regulation in mESCs is not well understood. Our genome-wide analysis shows that i) MSL and NSL bind to specific and common sets of expressed genes, ii) NSL binds exclusively at promoters, and iii) MSL binds in gene bodies. Nsl1 regulates proliferation and cellular homeostasis of mESCs. MSL is the main HAT acetylating H4K16 in mESCs and is enriched at many mESC-specific and bivalent genes. MSL is important to keep a subset of bivalent genes silent in mESCs, while developmental genes require MSL for expression during differentiation. Thus, NSL and MSL HAT complexes differentially regulate specific sets of expressed genes in mESCs and during differentiation. DOI: http://dx.doi.org/10.7554/eLife.02104.001 PMID:24898753
Identification of a neuronal transcription factor network involved in medulloblastoma development
2013-01-01
Background Medulloblastomas, the most frequent malignant brain tumours affecting children, comprise at least 4 distinct clinicogenetic subgroups. Aberrant sonic hedgehog (SHH) signalling is observed in approximately 25% of tumours and defines one subgroup. Although alterations in SHH pathway genes (e.g. PTCH1, SUFU) are observed in many of these tumours, high throughput genomic analyses have identified few other recurring mutations. Here, we have mutagenised the Ptch+/- murine tumour model using the Sleeping Beauty transposon system to identify additional genes and pathways involved in SHH subgroup medulloblastoma development. Results Mutagenesis significantly increased medulloblastoma frequency and identified 17 candidate cancer genes, including orthologs of genes somatically mutated (PTEN, CREBBP) or associated with poor outcome (PTEN, MYT1L) in the human disease. Strikingly, these candidate genes were enriched for transcription factors (p = 2 × 10^-5), the majority of which (6/7; Crebbp, Myt1L, Nfia, Nfib, Tead1 and Tgif2) were linked within a single regulatory network enriched for genes associated with a differentiated neuronal phenotype. Furthermore, activity of this network varied significantly between the human subgroups, was associated with metastatic disease, and predicted poor survival specifically within the SHH subgroup of tumours. Igf2, previously implicated in medulloblastoma, was the most differentially expressed gene in murine tumours with network perturbation, and network activity in both mouse and human tumours was characterised by enrichment for multiple gene-sets indicating increased cell proliferation, IGF signalling, MYC target upregulation, and decreased neuronal differentiation. Conclusions Collectively, our data support a model of medulloblastoma development in SB-mutagenised Ptch+/- mice which involves disruption of a novel transcription factor network leading to Igf2 upregulation, proliferation of GNPs, and tumour formation.
Moreover, our results identify rational therapeutic targets for SHH subgroup tumours, alongside prognostic biomarkers for the identification of poor-risk SHH patients. PMID:24252690
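Enrichment of a category within a candidate gene list, such as the transcription-factor enrichment reported above, is often assessed with a one-sided hypergeometric (over-representation) test. The abstract does not state which test was used, so the sketch below is an assumption about one common choice, and the toy numbers are hypothetical rather than the study's gene counts.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` belong to the category.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 10 genes, 5 in the category, 2 drawn, both hits.
print(hypergeom_upper_tail(population=10, successes=5, draws=2, observed=2))
```

For a real analysis, `population` would be the number of genes eligible to be hit, `successes` the number of those annotated to the category, `draws` the candidate list size, and `observed` the number of candidates in the category.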
Vasileva, Hristina; Butcher, Robert; Pickering, Harry; Sokana, Oliver; Jack, Kelvin; Solomon, Anthony W; Holland, Martin J; Roberts, Chrissy H
2018-02-21
Clinical signs of active (inflammatory) trachoma are found in many children in the Solomon Islands, but the majority of these individuals have no serological evidence of previous infection with Chlamydia trachomatis. In Temotu and Rennell and Bellona provinces, ocular infections with C. trachomatis were seldom detected among children with active trachoma; a similar lack of association was seen between active trachoma and other common bacterial and viral causes of follicular conjunctivitis. Here, we set out to characterise patterns of gene expression at the conjunctivae of children in these provinces with and without clinical signs of trachomatous inflammation-follicular (TF) and C. trachomatis infection. Purified RNA from children with and without active trachoma was run on Affymetrix GeneChip Human Transcriptome Array 2.0 microarrays. Profiles were compared between individuals with ocular C. trachomatis infection and TF (group DI; n = 6), individuals with TF but no C. trachomatis infection (group D; n = 7), and individuals without TF or C. trachomatis infection (group N; n = 7). Differential gene expression and gene set enrichment for pathway membership were assessed. Conjunctival gene expression profiles were more similar within-group than between-group. Principal components analysis indicated that the first and second principal components combined explained almost 50% of the variance in the dataset. When comparing the DI group to the N group, genes involved in T-cell proliferation, B-cell signalling and CD8+ T cell signalling pathways were differentially regulated. When comparing the DI group to the D group, CD8+ T-cell regulation, interferon-gamma and IL17 production pathways were enriched. Genes involved in RNA transcription and translation pathways were upregulated when comparing the D group to the N group.
Gene expression profiles in children in the Solomon Islands indicate immune responses consistent with bacterial infection when TF and C. trachomatis infection are concurrent. The transcriptomes of children with TF but without identified infection were not consistent with allergic or viral conjunctivitis.
Cornetta, K; Croop, J; Dropcho, E; Abonour, R; Kieran, M W; Kreissman, S; Reeves, L; Erickson, L C; Williams, D A
2006-09-01
Administration of chemotherapy is often limited by myelosuppression. Expression of drug-resistance genes in hematopoietic cells has been proposed as a means to decrease the toxicity of cytotoxic agents. In this pilot study, we utilized a retroviral vector expressing methylguanine DNA methyltransferase (MGMT) to transduce hematopoietic progenitors, which were subsequently used in the setting of alkylator therapy (procarbazine, CCNU, vincristine (PCV)) for poor prognosis brain tumors. Granulocyte colony-stimulating factor (G-CSF)-mobilized peripheral blood progenitor cells were collected by apheresis and enriched for CD34+ expression. Nine subjects were infused with CD34+-enriched cells treated in a transduction procedure involving a 4-day exposure to cytokines with vector exposure on days 3 and 4. No major adverse event was related to the gene therapy procedure. Importantly, the engraftment kinetics of the treated product were similar to those of unmanipulated peripheral blood stem cells, suggesting that the ex vivo manipulation did not significantly reduce engrafting progenitor cell function. Gene-transduced cells were detected in all subjects. Although the level and duration were limited, patients receiving cells transduced using fibronectin 'preloaded' with virus supernatant appeared to show improved in vivo marking frequency. These findings demonstrate the feasibility and safety of utilizing MGMT-transduced CD34+ peripheral blood progenitor cells in the setting of chemotherapy.
Zhou, Min; Ding, Yong; Cai, Liang; Wang, Yonggang; Lin, Changpo; Shi, Zhenyu
2018-05-01
Low molecular weight fucoidan (LMWF) is a sulfated polysaccharide extracted from Saccharina japonica that presents high affinity for P-selectin and abolishes selectin-dependent recruitment of leukocytes. We hypothesized that dietary intake of LMWF, as a competitive binding agent of P-selectin, could limit the inflammatory infiltration and aneurysmal growth in an Angiotensin II-induced abdominal aortic aneurysm (AAA) mouse model. The Gene Expression Omnibus database was used for gene expression and gene set enrichment analysis. Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that focal adhesion was involved in the development of AAA. However, dietary intake of LMWF could limit the enlargement of AAA, decreasing maximal aortic diameter and preserving elastin lamellae. Although LMWF did not decrease the circulating monocyte count or lower the expression of P-selectin in endothelium, it reduced macrophage infiltration in the media and adventitia. Furthermore, matrix metalloproteinase expression was markedly downregulated, accompanied by reduced expression of inflammatory mediators, including interleukin 1β, tumor necrosis factor-α and monocyte chemotactic protein-1. The present study revealed a novel target for the treatment of AAA and the anti-inflammatory effects of LMWF.
Hoffman, Jessica M; Soltow, Quinlyn A; Li, Shuzhao; Sidik, Alfire; Jones, Dean P; Promislow, Daniel E L
2014-01-01
Researchers have used whole-genome sequencing and gene expression profiling to identify genes associated with age, in the hope of understanding the underlying mechanisms of senescence. But there is a substantial gap from variation in gene sequences and expression levels to variation in age or life expectancy. In an attempt to bridge this gap, here we describe the effects of age, sex, genotype, and their interactions on high-sensitivity metabolomic profiles in the fruit fly, Drosophila melanogaster. Among the 6800 features analyzed, we found that over one-quarter of all metabolites were significantly associated with age, sex, genotype, or their interactions, and multivariate analysis shows that individual metabolomic profiles are highly predictive of these traits. Using a metabolomic equivalent of gene set enrichment analysis, we identified numerous metabolic pathways that were enriched among metabolites associated with age, sex, and genotype, including pathways involving sugar and glycerophospholipid metabolism, neurotransmitters, amino acids, and the carnitine shuttle. Our results suggest that high-sensitivity metabolomic studies have excellent potential not only to reveal mechanisms that lead to senescence, but also to help us understand differences in patterns of aging among genotypes and between males and females. PMID:24636523
Protective pathways against colitis mediated by appendicitis and appendectomy
Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C
2011-01-01
Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample microarray analysed followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription–polymerase chain reaction (RT–PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0·001). Time–course RT–PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. PMID:21707591
2011-01-01
Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. 
PMID:21834981
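The generalized cross-validation score used above to assess MARS models is usually written as the mean squared residual inflated by a penalty on model complexity: GCV = (RSS / N) / (1 - C / N)^2, where C is the effective number of parameters. The sketch below implements that standard form; how C is computed for a given MARS fit (and hence the exact values in the study) is not given in the abstract, so `effective_params` is simply passed in as an assumption.

```python
def gcv_score(y_true, y_pred, effective_params):
    """Generalized cross-validation score: (RSS / N) / (1 - C / N) ** 2."""
    n = len(y_true)
    if n == 0 or len(y_pred) != n:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if not 0 <= effective_params < n:
        raise ValueError("effective_params must satisfy 0 <= C < N")
    rss = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return (rss / n) / (1.0 - effective_params / n) ** 2


# Hypothetical responses and predictions; one prediction is off by 2.
toy_y = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.0, 2.0, 3.0, 6.0]
print(gcv_score(toy_y, toy_pred, effective_params=2))
```

Lower scores are better: for the same residual error, a model with more effective parameters receives a larger (worse) GCV.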
Gundersen, Gregory W; Jones, Matthew R; Rouillard, Andrew D; Kou, Yan; Monteiro, Caroline D; Feldmann, Axel S; Hu, Kevin S; Ma'ayan, Avi
2015-09-15
Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, extracting and analyzing differentially expressed genes from GEO requires significant computational skills. Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such as GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and FireFox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. Contact: avi.maayan@mssm.edu. © The Author 2015. Published by Oxford University Press.
Identification of key microRNAs and genes in preeclampsia by bioinformatics analysis
Luo, Shouling; Cao, Nannan; Tang, Yao; Gu, Weirong
2017-01-01
Preeclampsia is a leading cause of perinatal maternal–foetal mortality and morbidity. The aim of this study is to identify the key microRNAs and genes in preeclampsia and uncover their potential functions. We downloaded the miRNA expression profile of GSE84260 and the gene expression profile of GSE73374 from the Gene Expression Omnibus database. Differentially expressed miRNAs and genes were identified and compared to miRNA-target information from MiRWalk 2.0, and a total of 65 differentially expressed miRNAs (DEMIs), including 32 up-regulated miRNAs and 33 down-regulated miRNAs, and 91 differentially expressed genes (DEGs), including 83 up-regulated genes and 8 down-regulated genes, were identified. The pathway enrichment analyses of the DEMIs showed that the up-regulated DEMIs were enriched in the Hippo signalling pathway and MAPK signalling pathway, and the down-regulated DEMIs were enriched in HTLV-I infection and miRNAs in cancers. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses of the DEGs were performed using Multifaceted Analysis Tool for Human Transcriptome. The up-regulated DEGs were enriched in biological processes (BPs), including the response to cAMP, response to hydrogen peroxide and cell-cell adhesion mediated by integrin; no enrichment of down-regulated DEGs was identified. KEGG analysis showed that the up-regulated DEGs were enriched in the Hippo signalling pathway and pathways in cancer. A PPI network of the DEGs was constructed by using Cytoscape software, and FOS, STAT1, MMP14, ITGB1, VCAN, DUSP1, LDHA, MCL1, MET, and ZFP36 were identified as the hub genes. The current study illustrates a characteristic microRNA profile and gene profile in preeclampsia, which may contribute to the interpretation of the progression of preeclampsia and provide novel biomarkers and therapeutic targets for preeclampsia. PMID:28594854
Genome-wide analysis of YY2 versus YY1 target genes
Chen, Li; Shioda, Toshi; Coser, Kathryn R.; Lynch, Mary C.; Yang, Chuanwei; Schmidt, Emmett V.
2010-01-01
Yin Yang 1 (YY1) is a critical transcription factor controlling cell proliferation, development and DNA damage responses. Retrotranspositions have independently generated additional YY family members in multiple species. Although Drosophila YY1 [pleiohomeotic (Pho)] and its homolog [pleiohomeotic-like (Phol)] redundantly control homeotic gene expression, the regulatory contributions of YY1-homologs have not yet been examined in other species. Indeed, targets for the mammalian YY1 homolog YY2 are completely unknown. Using gene set enrichment analysis, we found that lentiviral constructs containing short hairpin loop inhibitory RNAs for human YY1 (shYY1) and its homolog YY2 (shYY2) caused significant changes in both shared and distinguishable gene sets in human cells. Ribosomal protein genes were the most significant gene set upregulated by both shYY1 and shYY2, although combined shYY1/2 knock downs were not additive. In contrast, shYY2 reversed the anti-proliferative effects of shYY1, and shYY2 particularly altered UV damage response, platelet-specific and mitochondrial function genes. We found that decreases in YY1 or YY2 caused inverse changes in UV sensitivity, and that their combined loss reversed their respective individual effects. Our studies show that human YY2 is not redundant to YY1, and YY2 is a significant regulator of genes previously identified as uniquely responding to YY1. PMID:20215434
Competing endogenous RNA regulatory network in papillary thyroid carcinoma.
Chen, Shouhua; Fan, Xiaobin; Gu, He; Zhang, Lili; Zhao, Wenhua
2018-05-11
The present study aimed to screen all types of RNAs involved in the development of papillary thyroid carcinoma (PTC). RNA‑sequencing data of PTC and normal samples were used for screening differentially expressed (DE) microRNAs (DE‑miRNAs), long non‑coding RNAs (DE‑lncRNAs) and genes (DEGs). Subsequently, lncRNA‑miRNA, miRNA‑gene (that is, miRNA‑mRNA) and gene‑gene interaction pairs were extracted and used to construct regulatory networks. Feature genes in the miRNA‑mRNA network were identified by topological analysis and recursive feature elimination analysis. A support vector machine (SVM) classifier was built using 15 feature genes, and its classification effect was validated using two microarray data sets that were downloaded from the Gene Expression Omnibus (GEO) database. In addition, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were conducted for genes identified in the ceRNA network. A total of 506 samples, including 447 tumor samples and 59 normal samples, were obtained from The Cancer Genome Atlas (TCGA); 16 DE‑lncRNAs, 917 DEGs and 30 DE‑miRNAs were screened. The miRNA‑mRNA regulatory network comprised 353 nodes and 577 interactions. From these data, 15 feature genes with high predictive precision (>95%) were extracted from the network and were used to form an SVM classifier with an accuracy of 96.05% (486/506) for PTC samples downloaded from TCGA, and accuracies of 96.81 and 98.46% for GEO downloaded data sets. The ceRNA regulatory network comprised 596 lines (or interactions) and 365 nodes. Genes in the ceRNA network were significantly enriched in 'neuron development', 'differentiation', 'neuroactive ligand‑receptor interaction', 'metabolism of xenobiotics by cytochrome P450', 'drug metabolism' and 'cytokine‑cytokine receptor interaction' pathways. 
Hox transcript antisense RNA, miRNA‑206 and kallikrein‑related peptidase 10 were nodes in the ceRNA regulatory network of the selected feature gene, and they may serve important roles in the development of PTC.
The essential gene set of a photosynthetic organism
Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; ...
2015-10-27
Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ~250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism's 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA-Leu, which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism's physiology and defines the essential gene set required for the growth of a photosynthetic organism.
SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.
Merelli, Ivan; Calabria, Andrea; Cozzi, Paolo; Viti, Federica; Mosca, Ettore; Milanesi, Luciano
2013-01-01
Correlating specific genotypes with human diseases remains a complex issue despite the advantages arising from high-throughput technologies such as Genome Wide Association Studies (GWAS). New tools for interpreting genetic variants and for prioritizing Single Nucleotide Polymorphisms (SNPs) are currently needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as the result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated with pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows users to input a list of genes, SNPs or a biological process, and to customize the feature set with relative weights. As a result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores. 
Different databases and resources are already available for SNP annotation, but they do not prioritize or re-score SNPs relying on a-priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation able to support SNP analysis. Possible scenarios are GWAS data re-scoring, SNP selection for custom genotyping arrays and SNP/disease association studies.
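The abstract describes a scoring function that combines per-SNP features with user-set weights but does not give its formula. The sketch below assumes the simplest reading, a weighted sum followed by a descending sort. The function names, the feature names and the SNP identifiers used in any call are hypothetical and do not reflect the actual SNPranker 2.0 implementation.

```python
from typing import Dict, List, Tuple


def score_snp(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def rank_snps(snps: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (snp_id, score) pairs, highest score first.

    Ties are broken alphabetically by identifier so the order is stable.
    """
    scored = [(snp_id, score_snp(feats, weights))
              for snp_id, feats in snps.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Keeping the weights in a separate mapping mirrors the user-adjustable weighting the abstract mentions: changing one weight re-ranks every SNP without touching the feature annotations.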
ISAAC - InterSpecies Analysing Application using Containers.
Baier, Herbert; Schultz, Jörg
2014-01-15
Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large scale analysis. Mostly, they search for enrichments of specific features. However, these tools do not allow an exploratory walk through different views, nor do they let users change the gene lists as new stories emerge. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web-based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology-based translation of existing gene sets. As today's research is usually performed in larger teams and consortia, ISAAC provides group-based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE-based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor-made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.
Putsathit, Papanin; Morgan, Justin; Bradford, Damien; Engelhardt, Nelly; Riley, Thomas V
2015-02-01
The Becton Dickinson (BD) PCR-based GeneOhm Cdiff assay has demonstrated a high sensitivity and specificity for detecting Clostridium difficile. Recently, the BD Max platform, using the same principles as BD GeneOhm, has become available in Australia. This study aimed to investigate the sensitivity and specificity of the BD Max Cdiff assay for the detection of toxigenic C. difficile in an Australian setting. Between December 2013 and January 2014, 406 stool specimens from 349 patients were analysed with the BD Max Cdiff assay. Direct and enrichment toxigenic culture were performed on bioMérieux ChromID C. difficile agar as a reference method. Isolates from specimens with discrepant results were further analysed with an in-house PCR to detect the presence of toxin genes. The overall prevalence of toxigenic C. difficile was 7.2%. Concordance between the BD Max assay and enrichment culture was 98.5%. The sensitivity, specificity, positive predictive value and negative predictive value for the BD Max Cdiff assay were 95.5%, 99.0%, 87.5% and 99.7%, respectively, when compared to direct culture, and 91.7%, 99.0%, 88.0% and 99.4%, respectively, when compared to enrichment culture. The new BD Max Cdiff assay appeared to be an excellent platform for rapid and accurate detection of toxigenic C. difficile.
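The four performance figures reported above follow from the standard 2x2 comparison of an index test against a reference method. The sketch below shows how they are derived from the four cell counts. The abstract does not report those counts, so the function name and any counts passed to it are illustrative assumptions and not the study's data.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard diagnostic accuracy measures from a 2x2 table.

    tp: assay positive, reference positive
    fp: assay positive, reference negative
    fn: assay negative, reference positive
    tn: assay negative, reference negative
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),  # detected among true positives
        "specificity": tn / (tn + fp),  # cleared among true negatives
        "ppv": tp / (tp + fp),          # positives that are correct
        "npv": tn / (tn + fn),          # negatives that are correct
    }
```

Sensitivity and specificity are properties of the assay, whereas the two predictive values also depend on prevalence, which is why a low-prevalence setting can pair a high specificity with a noticeably lower positive predictive value.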
GEAR: genomic enrichment analysis of regional DNA copy number changes.
Kim, Tae-Min; Jung, Yu-Chae; Rhyu, Mun-Gan; Jung, Myeong Ho; Chung, Yeun-Jun
2008-02-01
We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. Then it performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues on the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help identify key molecular functions that are activated or repressed in tumor genomes, leading to an improved understanding of tumor biology. GEAR software is available with an online manual at http://www.systemsbiology.co.kr/GEAR/.
Vancamelbeke, Maaike; Vanuytsel, Tim; Farré, Ricard; Verstockt, Sare; Ferrante, Marc; Van Assche, Gert; Rutgeerts, Paul; Schuit, Frans; Vermeire, Séverine; Arijs, Ingrid; Cleynen, Isabelle
2017-10-01
Intestinal barrier defects are common in patients with inflammatory bowel disease (IBD). To identify which components could underlie these changes, we performed an in-depth analysis of epithelial barrier genes in IBD. A set of 128 intestinal barrier genes was selected. Polygenic risk scores were generated based on selected barrier gene variants that were associated with Crohn's disease (CD) or ulcerative colitis (UC) in our study. Gene expression was analyzed using microarray and quantitative reverse transcription polymerase chain reaction. Influence of barrier gene variants on expression was studied by cis-expression quantitative trait loci mapping and comparing patients with low- and high-risk scores. Barrier risk scores were significantly higher in patients with IBD than controls. At single-gene level, the associated barrier single-nucleotide polymorphisms were most significantly enriched in PTGER4 for CD and HNF4A for UC. As a group, the regulating proteins were most enriched for CD and UC. Expression analysis showed that many epithelial barrier genes were significantly dysregulated in active CD and UC, with overrepresentation of mucus layer genes. In uninflamed CD ileum and IBD colon, most barrier gene levels restored to normal, except for MUC1 and MUC4 that remained persistently increased compared with controls. Expression levels did not depend on cis-regulatory variants nor combined genetic risk. We found genetic and transcriptomic dysregulations of key epithelial barrier genes and components in IBD. Of these, we believe that mucus genes, in particular MUC1 and MUC4, play an essential role in the pathogenesis of IBD and could represent interesting targets for treatment.
Modrell, Melinda S; Lyne, Mike; Carr, Adrian R; Zakon, Harold H; Buckley, David; Campbell, Alexander S; Davis, Marcus C; Micklem, Gos; Baker, Clare VH
2017-01-01
The anamniote lateral line system, comprising mechanosensory neuromasts and electrosensory ampullary organs, is a useful model for investigating the developmental and evolutionary diversification of different organs and cell types. Zebrafish neuromast development is increasingly well understood, but neither zebrafish nor Xenopus is electroreceptive and our molecular understanding of ampullary organ development is rudimentary. We have used RNA-seq to generate a lateral line-enriched gene-set from late-larval paddlefish (Polyodon spathula). Validation of a subset reveals expression in developing ampullary organs of transcription factor genes critical for hair cell development, and genes essential for glutamate release at hair cell ribbon synapses, suggesting close developmental, physiological and evolutionary links between non-teleost electroreceptors and hair cells. We identify an ampullary organ-specific proneural transcription factor, and candidates for the voltage-sensing L-type Cav channel and rectifying Kv channel predicted from skate (cartilaginous fish) ampullary organ electrophysiology. Overall, our results illuminate ampullary organ development, physiology and evolution. DOI: http://dx.doi.org/10.7554/eLife.24197.001 PMID:28346141
Zhang, Xiao-Ning; Shi, Yifei; Powers, Jordan J; Gowda, Nikhil B; Zhang, Chong; Ibrahim, Heba M M; Ball, Hannah B; Chen, Samuel L; Lu, Hua; Mount, Stephen M
2017-10-11
Regulation of pre-mRNA splicing diversifies protein products and affects many biological processes. Arabidopsis thaliana Serine/Arginine-rich 45 (SR45) regulates pre-mRNA splicing by interacting with other regulatory proteins and spliceosomal subunits. Although SR45 has orthologs in diverse eukaryotes, including human RNPS1, the sr45-1 null mutant is viable. Narrow flower petals and reduced seed formation suggest that SR45 regulates genes involved in diverse processes, including reproduction. To understand how SR45 is involved in the regulation of reproductive processes, we studied mRNA from the wild-type and sr45-1 inflorescences using RNA-seq, and identified SR45-bound RNAs by immunoprecipitation. Using a variety of bioinformatics tools, we identified a total of 358 SR45 differentially regulated (SDR) genes, 542 SR45-dependent alternative splicing (SAS) events, and 1812 SR45-associated RNAs (SARs). There is little overlap between SDR genes and SAS genes, and neither set of genes is enriched for flower or seed development. However, transcripts from reproductive process genes are significantly overrepresented in SARs. In exploring the fate of SARs, we found that a total of 81 SARs are subject to alternative splicing, while 14 of them are known Nonsense-Mediated Decay (NMD) targets. Motifs related to GGNGG are enriched both in SARs and near different types of SAS events, suggesting that SR45 recognizes this motif directly. Genes involved in plant defense are significantly over-represented among genes whose expression is suppressed by SR45, and sr45-1 plants do indeed show enhanced immunity. We find that SR45 is a suppressor of innate immunity. We find that a single motif (GGNGG) is highly enriched in both RNAs bound by SR45 and in sequences near SR45-dependent alternative splicing events in inflorescence tissue. 
We find that the alternative splicing events regulated by SR45 are enriched for this motif whether the effect of SR45 is activation or repression of the particular event. Thus, our data suggest that SR45 acts to control splice site choice in a way that defies simple categorization as an activator or repressor of splicing.
Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A
2016-09-01
Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (<1% of the total abundance or biomass), moderately abundant (1-5%) and highly abundant (>5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated primer biases occurred during amplicon metabarcoding with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.
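The comparison above rests on Spearman rank correlations between sequence read counts and morphological abundance or biomass. As a minimal sketch of that statistic, the code below ranks both variables (averaging tied ranks) and then takes the Pearson correlation of the ranks. The function names are hypothetical, and nothing here reproduces the authors' actual analysis pipeline.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, read counts that grow by orders of magnitude while abundance grows linearly still yield a perfect correlation, which suits sequencing data where the read-to-biomass relationship is monotonic but rarely linear.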
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.
Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang
2015-01-01
RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
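The ranking stage described above orders genes by the difference between their two metagene weights. The sketch below covers only that final step and assumes the factorization has already produced one pair of weights per gene. The function name, the choice of which weight is subtracted from which, and the gene labels in any call are assumptions made for illustration; the DNMF factorization itself is not shown.

```python
from typing import Dict, List, Tuple


def rank_genes_by_weight_difference(
        weights: Dict[str, Tuple[float, float]]) -> List[str]:
    """Rank genes in descending order of (first weight - second weight).

    weights maps a gene name to its pair of metagene weights. Ties are
    broken alphabetically so the output is deterministic.
    """
    def sort_key(gene: str) -> Tuple[float, str]:
        first, second = weights[gene]
        return (-(first - second), gene)

    return sorted(weights, key=sort_key)
```

Genes at the top of the list load most strongly on the first metagene relative to the second, and genes at the bottom show the opposite pattern, so both ends of the ranking are informative for a downstream enrichment analysis.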
von Netzer, Frederick; Pilloni, Giovanni; Kleindienst, Sara; Krüger, Martin; Knittel, Katrin; Gründger, Friederike
2013-01-01
The detection of anaerobic hydrocarbon degrader populations via catabolic gene markers is important for the understanding of processes at contaminated sites. Fumarate-adding enzymes (FAEs; i.e., benzylsuccinate and alkylsuccinate synthases) have already been established as specific functional marker genes for anaerobic hydrocarbon degraders. Several recent studies based on pure cultures and laboratory enrichments have shown the existence of new and deeply branching FAE gene lineages, such as clostridial benzylsuccinate synthases and homologues, as well as naphthylmethylsuccinate synthases. However, established FAE gene detection assays were not designed to target these novel lineages, and consequently, their detectability in different environments remains obscure. Here, we present a new suite of parallel primer sets for detecting the comprehensive range of FAE markers known to date, including clostridial benzylsuccinate, naphthylmethylsuccinate, and alkylsuccinate synthases. It was not possible to develop one single assay spanning the complete diversity of FAE genes alone. The enhanced assays were tested with a range of hydrocarbon-degrading pure cultures, enrichments, and environmental samples of marine and terrestrial origin. They revealed the presence of several, partially unexpected FAE gene lineages not detected in these environments before: distinct deltaproteobacterial and also clostridial bssA homologues as well as environmental nmsA homologues. These findings were backed up by dual-digest terminal restriction fragment length polymorphism diagnostics to identify FAE gene populations independently of sequencing. This allows rapid insights into intrinsic degrader populations and degradation potentials established in aromatic and aliphatic hydrocarbon-impacted environmental systems. PMID:23124238
Gene expression profiling of selenophosphate synthetase 2 knockdown in Drosophila melanogaster.
Li, Gaopeng; Liu, Liying; Li, Ping; Chen, Luonan; Song, Haiyun; Zhang, Yan
2016-03-01
Selenium (Se) is an important trace element for many organisms and is incorporated into selenoproteins as selenocysteine (Sec). In eukaryotes, selenophosphate synthetase SPS2 is essential for Sec biosynthesis. In recent years, genetic disruptions of both Sec biosynthesis genes and selenoprotein genes have been investigated in different animal models, which provide important clues for understanding Se metabolism and function in these organisms. However, a systematic study on the knockdown of SPS2 has not been performed in vivo. Herein, we conducted microarray experiments to study the transcriptome of fruit flies with knockdown of SPS2 in larval and adult stages. Several hundred differentially expressed genes were identified in each stage. Although the expression levels of other Sec biosynthesis genes and selenoprotein genes were not significantly changed, it is possible that selenoprotein translation might be reduced without impacting the mRNA level. Functional enrichment and network-based analyses revealed that although different sets of differentially expressed genes were obtained in each stage, they were both significantly enriched in the carbohydrate metabolism and redox processes. Furthermore, protein-protein interaction (PPI)-based network clustering analysis implied that several hub genes detected in the top modules, such as Nimrod C1 and regucalcin, could be considered as key regulators that are responsible for the complex responses caused by SPS2 knockdown. Overall, our data provide new insights into the relationship between Se utilization and several fundamental cellular processes as well as diseases.
Korsunsky, Ilya; Parameswaran, Janaki; Shapira, Iuliana; Lovecchio, John; Menzin, Andrew; Whyte, Jill; Dos Santos, Lisa; Liang, Sharon; Bhuiya, Tawfiqul; Keogh, Mary; Khalili, Houman; Pond, Cassandra; Liew, Anthony; Shih, Andrew; Gregersen, Peter K; Lee, Annette T
2017-10-01
MicroRNAs have been established as key regulators of tumor gene expression and as prime biomarker candidates for clinical phenotypes in epithelial ovarian cancer (EOC). We analyzed the coexpression and regulatory structure of microRNAs and their co-localized gene targets in primary tumor tissue of 20 patients with advanced EOC in order to construct a regulatory signature for clinical prognosis. We performed an integrative analysis to identify two prognostic microRNA/mRNA coexpression modules, each enriched for consistent biological functions. One module, enriched for malignancy-related functions, was found to be upregulated in malignant versus benign samples. The second module, enriched for immune-related functions, was strongly correlated with imputed intratumoral immune infiltrates of T cells, natural killer cells, cytotoxic lymphocytes, and macrophages. We validated the prognostic relevance of the immunological module microRNAs in the publicly available The Cancer Genome Atlas data set. These findings provide novel functional roles for microRNAs in the progression of advanced EOC and possible prognostic signatures for survival. © American Federation for Medical Research (unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Yoo, Seungyeul; Takikawa, Sachiko; Geraghty, Patrick; Argmann, Carmen; Campbell, Joshua; Lin, Luan; Huang, Tao; Tu, Zhidong; Feronjy, Robert; Spira, Avrum; Schadt, Eric E.; Powell, Charles A.; Zhu, Jun
2015-01-01
Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore, we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple gene sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communications, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue including hypoxia responsive genes, and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a ‘causal’ role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology. PMID:25569234
Wloch-Salamon, Dominika M; Tomala, Katarzyna; Aggeli, Dimitra; Dunn, Barbara
2017-06-07
Over its evolutionary history, Saccharomyces cerevisiae has evolved to be well-adapted to fluctuating nutrient availability. In the presence of sufficient nutrients, yeast cells continue to proliferate, but upon starvation haploid yeast cells enter stationary phase and differentiate into nonquiescent (NQ) and quiescent (Q) cells. Q cells survive stress better than NQ cells and show greater viability when nutrient-rich conditions are restored. To investigate the genes that may be involved in the differentiation of Q and NQ cells, we serially propagated yeast populations that were enriched for either only Q or only NQ cell types over many repeated growth-starvation cycles. After 30 cycles (equivalent to 300 generations), each enriched population produced a higher proportion of the enriched cell type compared to the starting population, suggestive of adaptive change. We also observed differences in each population's fitness suggesting possible tradeoffs: clones from NQ lines were better adapted to logarithmic growth, while clones from Q lines were better adapted to starvation. Whole-genome sequencing of clones from Q- and NQ-enriched lines revealed mutations in genes involved in the stress response and survival in limiting nutrients (ECM21, RSP5, MSN1, SIR4, and IRA2) in both Q and NQ lines, but also differences between the two lines: NQ line clones had recurrent independent mutations affecting the Ssy1p-Ptr3p-Ssy5p (SPS) amino acid sensing pathway, while Q line clones had recurrent, independent mutations in SIR3 and FAS1. Our results suggest that both sets of enriched-cell type lines responded to common, as well as distinct, selective pressures. Copyright © 2017 Wloch-Salamon et al.
Rager, Julia E.; Yosim, Andrew; Fry, Rebecca C.
2014-01-01
There is increasing evidence that environmental agents mediate susceptibility to infectious disease. Studies support the impact of prenatal/early life exposure to the environmental metals inorganic arsenic (iAs) and cadmium (Cd) on increased risk for susceptibility to infection. The specific biological mechanisms that underlie such exposure-mediated effects remain understudied. This research aimed to identify key genes/signal transduction pathways that associate prenatal exposure to these toxic metals with changes in infectious disease susceptibility using a Comparative Genomic Enrichment Method (CGEM). Using CGEM an infectious disease gene (IDG) database was developed comprising 1085 genes with known roles in viral, bacterial, and parasitic disease pathways. Subsequently, datasets collected from human pregnancy cohorts exposed to iAs or Cd were examined in relationship to the IDGs, specifically focusing on data representing epigenetic modifications (5-methyl cytosine), genomic perturbations (mRNA expression), and proteomic shifts (protein expression). A set of 82 infection and exposure-related genes was identified and found to be enriched for their role in the glucocorticoid receptor signal transduction pathway. Given their common identification across numerous human cohorts and their known toxicological role in disease, the identified genes within the glucocorticoid signal transduction pathway may underlie altered infectious disease susceptibility associated with prenatal exposures to the toxic metals iAs and Cd in humans. PMID:25479081
Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.
2006-01-01
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprised of an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
Ghauri, Muhammad A; Khalid, Ahmad M; Grant, Susan; Grant, William D; Heaphy, Shaun
2006-06-01
Environmental samples were collected from high-pH sites in Pakistan, including a uranium heap set up for carbonate leaching, the lime unit of a tannery, and the Khewra salt mine. Another sample was collected from a hot spring on the shore of the soda lake, Magadi, in Kenya. Microbial cultures were enriched from Pakistani samples. Phylogenetic analysis of isolates was carried out by sequencing 16S rRNA genes. Genomic DNA was amplified by polymerase chain reaction using integron gene-cassette-specific primers. Different gene-cassette-linked genes were recovered from the cultured strains related to Halomonas magadiensis, Virgibacillus halodenitrificans, and Yania flava and from the uncultured environmental DNA sample. The usefulness of this technique as a tool for gene mining is indicated.
Li, Xiaofang; Tian, Run; Gao, Hugh; Yan, Feng; Ying, Le; Yang, Yongkang; Yang, Pei
2018-01-01
Cervical cancer is the leading cause of death among gynecological malignancies. We aimed to explore the molecular mechanism of carcinogenesis and biomarkers for cervical cancer by integrated bioinformatic analysis. We used RNA-sequencing data from 254 cervical squamous cell carcinomas and 3 normal samples from The Cancer Genome Atlas. To explore the distinct pathways, messenger RNA expression was submitted to a Gene Set Enrichment Analysis. Kyoto Encyclopedia of Genes and Genomes and protein–protein interaction network analysis of differentially expressed genes were performed. Then, we conducted pathway enrichment analysis for modules acquired in protein–protein interaction analysis and obtained a list of pathways in every module. After intersecting the results from the 3 approaches, we evaluated the survival rates of both mutual pathways and genes in the pathway, and 5 survival-related genes were obtained. Finally, Cox hazards ratio analysis of these 5 genes was performed. DNA replication pathway (P < .001; 12 genes included) was suggested to have the strongest association with the prognosis of cervical squamous cancer. In total, 5 of the 12 genes, namely, minichromosome maintenance 2, minichromosome maintenance 4, minichromosome maintenance 5, proliferating cell nuclear antigen, and ribonuclease H2 subunit A were significantly correlated with survival. Minichromosome maintenance 5 was shown as an independent prognostic biomarker for patients with cervical cancer. This study identified a distinct pathway (DNA replication). Five genes that may be prognostic biomarkers were identified, and minichromosome maintenance 5 was identified as an independent prognostic biomarker for patients with cervical cancer. PMID:29642758
Pathways Involved in Sasang Constitution from Genome-Wide Analysis in a Korean Population
Yu, Sung-Gon; Kim, Jong-Yeol; Song, Kwang Hoon
2012-01-01
Objective: Sasang constitution (SC) medicine, a branch of Korean traditional medicine, classifies the individual into one of four constitutional types (Taeum, TE; Soeum, SE; Soyang, SY; and Taeyang, TY) based on physiologic characteristics. The authors of the current article recently reported individual genetic elements associated with SC types via genome-wide association (GWA) analysis. However, to understand the biologic mechanisms underlying constitution, a comprehensive approach that combines individual genetic effects was applied. Design: Genotypes of 1222 subjects of defined constitution types were measured for 341,998 genetic loci across the entire genome. The biologic pathways associated with SC types were identified via GWA analysis using three different algorithms, namely the Z-statistic method, a restandardized gene set assay, and a gene set enrichment assay. Results: Distinct pathways were associated (p<0.05) with each constitution type. The TE type was significantly associated with cytoskeleton-related pathways. The SE type was significantly associated with cardio- and amino-acid metabolism–related pathways. The SY type was associated with enriched melanoma-related pathways. TY subjects were excluded because of the small size of that sample. Among these functionally related pathways, core-node genes regulating multiple pathways were identified. TJP1, PTK2, and SRC were selected as core-nodes for TE; RHOA and MAOA/MAOB for SE; and GNAO1 for SY (p<0.05). Conclusions: The current authors systematically identified the biologic pathways and core-node genes associated with SC types from the GWA study; this information should provide insights regarding the molecular mechanisms inherent in constitutional pathophysiology. PMID:22889377
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Borowsky, Alexander T.
2017-01-01
Plants produce diverse specialized metabolites (SMs), but the genes responsible for their production and regulation remain largely unknown, hindering efforts to tap plant pharmacopeia. Given that genes comprising SM pathways exhibit environmentally dependent coregulation, we hypothesized that genes within a SM pathway would form tight associations (modules) with each other in coexpression networks, facilitating their identification. To evaluate this hypothesis, we used 10 global coexpression data sets, each a meta-analysis of hundreds to thousands of experiments, across eight plant species to identify hundreds of coexpressed gene modules per data set. In support of our hypothesis, 15.3 to 52.6% of modules contained two or more known SM biosynthetic genes, and module genes were enriched in SM functions. Moreover, modules recovered many experimentally validated SM pathways, including all six known to form biosynthetic gene clusters (BGCs). In contrast, bioinformatically predicted BGCs (i.e., those lacking an associated metabolite) were no more coexpressed than the null distribution for neighboring genes. These results suggest that most predicted plant BGCs are not genuine SM pathways and argue that BGCs are not a hallmark of plant specialized metabolism. We submit that global gene coexpression is a rich, largely untapped resource for discovering the genetic basis and architecture of plant natural products. PMID:28408660
2014-01-01
Background: Clinically useful biomarkers for patient stratification and monitoring of disease progression and drug response are in high demand in drug development and for addressing potential safety concerns. Many diseases influence the frequency and phenotype of cells found in the peripheral blood and the transcriptome of blood cells. Changes in cell type composition influence whole blood gene expression analysis results and thus the discovery of true transcript level changes remains a challenge. We propose a robust and reproducible procedure, which includes whole transcriptome gene expression profiling of major subsets of immune cells directly sorted from whole blood. Methods: Target cells were enriched using magnetic microbeads and an autoMACS® Pro Separator (Miltenyi Biotec). Flow cytometric analysis for purity was performed before and after magnetic cell sorting. Total RNA was hybridized on HGU133 Plus 2.0 expression microarrays (Affymetrix, USA). CEL file signal intensity values were condensed using RMA and a custom CDF file (EntrezGene-based). Results: Positive selection by use of MACS® Technology coupled to transcriptomics was assessed for eight different peripheral blood cell types: CD14+ monocytes, CD3+, CD4+, or CD8+ T cells, CD15+ granulocytes, CD19+ B cells, CD56+ NK cells, and CD45+ pan leukocytes. RNA quality from enriched cells was above a RIN of eight. GeneChip analysis confirmed cell type specific transcriptome profiles. Storing whole blood collected in an EDTA Vacutainer® tube at 4°C followed by MACS does not activate sorted cells. Gene expression analysis supports cell enrichment measurements by MACS. Conclusions: The proposed workflow generates reproducible cell-type specific transcriptome data which can be translated to clinical settings and used to identify clinically relevant gene expression biomarkers from whole blood samples. 
This procedure enables the integration of transcriptomics of relevant immune cell subsets sorted directly from whole blood in clinical trial protocols. PMID:25984272
Hadley, Dexter; Wu, Zhi-liang; Kao, Charlly; Kini, Akshata; Mohamed-Hadley, Alisha; Thomas, Kelly; Vazquez, Lyam; Qiu, Haijun; Mentch, Frank; Pellegrino, Renata; Kim, Cecilia; Connolly, John; Pinto, Dalila; Merikangas, Alison; Klei, Lambertus; Vorstman, Jacob A.S.; Thompson, Ann; Regan, Regina; Pagnamenta, Alistair T.; Oliveira, Bárbara; Magalhaes, Tiago R.; Gilbert, John; Duketis, Eftichia; De Jonge, Maretha V.; Cuccaro, Michael; Correia, Catarina T.; Conroy, Judith; Conceição, Inês C.; Chiocchetti, Andreas G.; Casey, Jillian P.; Bolshakova, Nadia; Bacchelli, Elena; Anney, Richard; Zwaigenbaum, Lonnie; Wittemeyer, Kerstin; Wallace, Simon; Engeland, Herman van; Soorya, Latha; Rogé, Bernadette; Roberts, Wendy; Poustka, Fritz; Mouga, Susana; Minshew, Nancy; McGrew, Susan G.; Lord, Catherine; Leboyer, Marion; Le Couteur, Ann S.; Kolevzon, Alexander; Jacob, Suma; Guter, Stephen; Green, Jonathan; Green, Andrew; Gillberg, Christopher; Fernandez, Bridget A.; Duque, Frederico; Delorme, Richard; Dawson, Geraldine; Café, Cátia; Brennan, Sean; Bourgeron, Thomas; Bolton, Patrick F.; Bölte, Sven; Bernier, Raphael; Baird, Gillian; Bailey, Anthony J.; Anagnostou, Evdokia; Almeida, Joana; Wijsman, Ellen M.; Vieland, Veronica J.; Vicente, Astrid M.; Schellenberg, Gerard D.; Pericak-Vance, Margaret; Paterson, Andrew D.; Parr, Jeremy R.; Oliveira, Guiomar; Almeida, Joana; Café, Cátia; Mouga, Susana; Correia, Catarina; Nurnberger, John I.; Monaco, Anthony P.; Maestrini, Elena; Klauck, Sabine M.; Hakonarson, Hakon; Haines, Jonathan L.; Geschwind, Daniel H.; Freitag, Christine M.; Folstein, Susan E.; Ennis, Sean; Coon, Hilary; Battaglia, Agatino; Szatmari, Peter; Sutcliffe, James S.; Hallmayer, Joachim; Gill, Michael; Cook, Edwin H.; Buxbaum, Joseph D.; Devlin, Bernie; Gallagher, Louise; Betancur, Catalina; Scherer, Stephen W.; Glessner, Joseph; Hakonarson, Hakon
2014-01-01
Although multiple reports show that defective genetic networks underlie the aetiology of autism, few have translated into pharmacotherapeutic opportunities. Since drugs compete with endogenous small molecules for protein binding, many successful drugs target large gene families with multiple drug binding sites. Here we search for defective gene family interaction networks (GFINs) in 6,742 patients with the ASDs relative to 12,544 neurologically normal controls, to find potentially druggable genetic targets. We find significant enrichment of structural defects (P≤2.40E−09, 1.8-fold enrichment) in the metabotropic glutamate receptor (GRM) GFIN, previously observed to impact attention deficit hyperactivity disorder (ADHD) and schizophrenia. Also, the MXD-MYC-MAX network of genes, previously implicated in cancer, is significantly enriched (P≤3.83E−23, 2.5-fold enrichment), as is the calmodulin 1 (CALM1) gene interaction network (P≤4.16E−04, 14.4-fold enrichment), which regulates voltage-independent calcium-activated action potentials at the neuronal synapse. We find that multiple defective gene family interactions underlie autism, presenting new translational opportunities to explore for therapeutic interventions. PMID:24927284
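The fold-enrichment and P values quoted in the abstract above compare how often a defect is seen in cases versus controls. The following is a minimal illustrative sketch of that kind of calculation, not the authors' method: it assumes a simple 2x2 carrier table and a one-sided Fisher exact (hypergeometric) test, and the function names and toy counts are hypothetical.

```python
from math import comb


def fold_enrichment(case_hits, n_cases, control_hits, n_controls):
    """Ratio of the carrier frequency in cases to the carrier frequency in controls."""
    return (case_hits / n_cases) / (control_hits / n_controls)


def enrichment_p_value(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher exact P: chance of seeing at least `case_hits`
    carriers among cases if carriers were spread at random over all subjects."""
    total = n_cases + n_controls
    hits = case_hits + control_hits
    upper = min(hits, n_cases)
    # Sum the hypergeometric probabilities of the observed table and all
    # more extreme tables (more carriers among cases).
    favourable = sum(
        comb(n_cases, k) * comb(n_controls, hits - k)
        for k in range(case_hits, upper + 1)
    )
    return favourable / comb(total, hits)


if __name__ == "__main__":
    # Toy counts for illustration only (not taken from the study).
    print(fold_enrichment(3, 4, 1, 4))
    print(enrichment_p_value(3, 4, 1, 4))
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for cohort-sized counts a statistics library would normally be used instead.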
Arnardottir, Erna S.; Nikonova, Elena V.; Shockley, Keith R.; Podtelezhnikov, Alexei A.; Anafi, Ron C.; Tanis, Keith Q.; Maislin, Greg; Stone, David J.; Renger, John J.; Winrow, Christopher J.; Pack, Allan I.
2014-01-01
Study Objectives: To address whether changes in gene expression in blood cells with sleep loss are different in individuals resistant and sensitive to sleep deprivation. Design: Blood draws every 4 h during a 3-day study: 24-h normal baseline, 38 h of continuous wakefulness and subsequent recovery sleep, for a total of 19 time-points per subject, with every 2-h psychomotor vigilance task (PVT) assessment when awake. Setting: Sleep laboratory. Participants: Fourteen subjects who were previously identified as behaviorally resistant (n = 7) or sensitive (n = 7) to sleep deprivation by PVT. Intervention: Thirty-eight hours of continuous wakefulness. Measurements and Results: We found 4,481 unique genes with a significant 24-h diurnal rhythm during a normal sleep-wake cycle in blood (false discovery rate [FDR] < 5%). Biological pathways were enriched for biosynthetic processes during sleep. After accounting for circadian effects, two genes (SREBF1 and CPT1A, both involved in lipid metabolism) exhibited small, but significant, linear changes in expression with the duration of sleep deprivation (FDR < 5%). The main change with sleep deprivation was a reduction in the amplitude of the diurnal rhythm of expression of normally cycling probe sets. This reduction was noticeably higher in behaviorally resistant subjects than sensitive subjects, at any given P value. Furthermore, blood cell type enrichment analysis showed that the expression pattern difference between sensitive and resistant subjects is mainly found in cells of myeloid origin, such as monocytes. Conclusion: Individual differences in behavioral effects of sleep deprivation are associated with differences in diurnal amplitude of gene expression for genes that show circadian rhythmicity. Citation: Arnardottir ES, Nikonova EV, Shockley KR, Podtelezhnikov AA, Anafi RC, Tanis KQ, Maislin G, Stone DJ, Renger JJ, Winrow CJ, Pack AI. 
Blood-gene expression reveals reduced circadian rhythmicity in individuals resistant to sleep deprivation. SLEEP 2014;37(10):1589-1600. PMID:25197809
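The abstract above reports results at a false discovery rate (FDR) below 5%. As a hedged illustration of what such a threshold means in practice, the sketch below implements the Benjamini-Hochberg adjustment; the abstract does not state which FDR procedure the study used, so this choice is an assumption, and the function names and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each q-value
    # is never larger than the q-value of any higher-ranked test.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at(p_values, fdr=0.05):
    """Flag tests whose adjusted P value falls below the chosen FDR."""
    return [q < fdr for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Made-up P values for illustration only.
    example = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(example))
    print(significant_at(example))
```

With thousands of genes tested, as in the study, the adjustment matters far more than in this four-value toy case, because the raw 0.05 cutoff alone would admit many false positives.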
2013-01-01
Background: Autism spectrum disorder (ASD) is a heterogeneous behavioral disorder or condition characterized by severe impairment of social engagement and the presence of repetitive activities. The molecular etiology of ASD is still largely unknown despite a strong genetic component. Part of the difficulty in turning genetics into disease mechanisms and potentially new therapeutics is the sheer number and diversity of the genes that have been associated with ASD and ASD symptoms. The goal of this work is to use shRNA-generated models of genetic defects proposed as causative for ASD to identify the common pathways that might explain how they produce a core clinical disability. Methods: Transcript levels of Mecp2, Mef2a, Mef2d, Fmr1, Nlgn1, Nlgn3, Pten, and Shank3 were knocked down in mouse primary neuron cultures using shRNA constructs. Whole genome expression analysis was conducted for each of the knockdown cultures as well as a mock-transduced culture and a culture exposed to a lentivirus expressing an anti-luciferase shRNA. Gene set enrichment and a causal reasoning engine were employed to identify pathway level perturbations generated by the transcript knockdown. Results: Quantification of the shRNA targets confirmed the successful knockdown at the transcript and protein levels of at least 75% for each of the genes. After subtracting out potential artifacts caused by viral infection, gene set enrichment and causal reasoning engine analysis showed that a significant number of gene expression changes mapped to pathways associated with neurogenesis, long-term potentiation, and synaptic activity. Conclusions: This work demonstrates that despite the complex genetic nature of ASD, there are common molecular mechanisms that connect many of the best established autism candidate genes. 
By identifying the key regulatory checkpoints in the interlinking transcriptional networks underlying autism, we are better able to discover the ideal points of intervention that provide the broadest efficacy across the diverse population of autism patients. PMID:24238429
Lanz, Thomas A; Guilmette, Edward; Gosink, Mark M; Fischer, James E; Fitzgerald, Lawrence W; Stephenson, Diane T; Pletcher, Mathew T
2013-11-15
Autism spectrum disorder (ASD) is a heterogeneous behavioral disorder or condition characterized by severe impairment of social engagement and the presence of repetitive activities. The molecular etiology of ASD is still largely unknown despite a strong genetic component. Part of the difficulty in turning genetics into disease mechanisms and potentially new therapeutics is the sheer number and diversity of the genes that have been associated with ASD and ASD symptoms. The goal of this work is to use shRNA-generated models of genetic defects proposed as causative for ASD to identify the common pathways that might explain how they produce a core clinical disability. Transcript levels of Mecp2, Mef2a, Mef2d, Fmr1, Nlgn1, Nlgn3, Pten, and Shank3 were knocked down in mouse primary neuron cultures using shRNA constructs. Whole genome expression analysis was conducted for each of the knockdown cultures as well as a mock-transduced culture and a culture exposed to a lentivirus expressing an anti-luciferase shRNA. Gene set enrichment and a causal reasoning engine were employed to identify pathway level perturbations generated by the transcript knockdown. Quantification of the shRNA targets confirmed the successful knockdown at the transcript and protein levels of at least 75% for each of the genes. After subtracting out potential artifacts caused by viral infection, gene set enrichment and causal reasoning engine analysis showed that a significant number of gene expression changes mapped to pathways associated with neurogenesis, long-term potentiation, and synaptic activity. This work demonstrates that despite the complex genetic nature of ASD, there are common molecular mechanisms that connect many of the best established autism candidate genes. 
By identifying the key regulatory checkpoints in the interlinking transcriptional networks underlying autism, we are better able to discover the ideal points of intervention that provide the broadest efficacy across the diverse population of autism patients.
Tissue-Specific Transcriptomic Profiling of Sorghum propinquum using a Rice Genome Array
Zhang, Ting; Zhao, Xiuqin; Huang, Liyu; Liu, Xiaoyue; Zong, Ying; Zhu, Linghua; Yang, Daichang; Fu, Binying
2013-01-01
Sorghum (Sorghum bicolor) is one of the world's most important cereal crops. S. propinquum is a perennial wild relative of S. bicolor with well-developed rhizomes. Functional genomics analysis of S. propinquum, especially with respect to molecular mechanisms related to rhizome growth and development, can contribute to the development of more sustainable grain, forage, and bioenergy cropping systems. In this study, we used a whole rice genome oligonucleotide microarray to obtain tissue-specific gene expression profiles of S. propinquum with special emphasis on rhizome development. A total of 548 tissue-enriched genes were detected, including 31 and 114 unique genes that were expressed predominantly in the rhizome tips (RT) and internodes (RI), respectively. Further GO analysis indicated that the functions of these tissue-enriched genes corresponded to their characteristic biological processes. A few distinct cis-elements, including ABA-responsive RY repeat CATGCA, sugar-repressive TTATCC, and GA-responsive TAACAA, were found to be prevalent in RT-enriched genes, implying an important role in rhizome growth and development. Comprehensive comparative analysis of these rhizome-enriched genes and rhizome-specific genes previously identified in Oryza longistaminata and S. propinquum indicated that phytohormones, including ABA, GA, and SA, are key regulators of gene expression during rhizome development. Co-localization of rhizome-enriched genes with rhizome-related QTLs in rice and sorghum generated functional candidates for future cloning of genes associated with rhizome growth and development. PMID:23536906
Roth, Justin C.; Ismail, Mourad; Reese, Jane S.; Lingas, Karen T.; Ferrari, Giuliana; Gerson, Stanton L.
2012-01-01
The P140K point mutant of MGMT allows robust hematopoietic stem cell (HSC) enrichment in vivo. Thus, dual-gene vectors that couple MGMT and therapeutic gene expression have allowed enrichment of gene-corrected HSCs in animal models. However, expression levels from dual-gene vectors are often reduced for one or both genes. Further, it may be desirable to express selection and therapeutic genes at distinct stages of cell differentiation. In this regard, we evaluated whether hematopoietic cells could be efficiently cotransduced using low MOIs of two separate single-gene lentiviruses, including MGMT for dual-positive cell enrichment. Cotransduction efficiencies were evaluated using a range of MGMT : GFP virus ratios, MOIs, and selection stringencies in vitro. Cotransduction was optimal when equal proportions of each virus were used, but low MGMT : GFP virus ratios resulted in the highest proportion of dual-positive cells after selection. This strategy was then evaluated in murine models for in vivo selection of HSCs cotransduced with a ubiquitous MGMT expression vector and an erythroid-specific GFP vector. Although the MGMT and GFP expression percentages were variable among engrafted recipients, drug selection enriched MGMT-positive leukocyte and GFP-positive erythroid cell populations. These data demonstrate cotransduction as a means to rapidly enrich and evaluate therapeutic lentivectors in vivo. PMID:22888445
The ISW1 and CHD1 ATP-dependent chromatin remodelers compete to set nucleosome spacing in vivo.
Ocampo, Josefina; Chereji, Răzvan V; Eriksson, Peter R; Clark, David J
2016-06-02
Adenosine triphosphate-dependent chromatin remodeling machines play a central role in gene regulation by manipulating chromatin structure. Most genes have a nucleosome-depleted region at the promoter and an array of regularly spaced nucleosomes phased relative to the transcription start site. In vitro, the three known yeast nucleosome spacing enzymes (CHD1, ISW1 and ISW2) form arrays with different spacing. We used genome-wide nucleosome sequencing to determine whether these enzymes space nucleosomes differently in vivo. We find that CHD1 and ISW1 compete to set the spacing on most genes, such that CHD1 dominates genes with shorter spacing and ISW1 dominates genes with longer spacing. In contrast, ISW2 plays a minor role, limited to transcriptionally inactive genes. Heavily transcribed genes show weak phasing and extreme spacing, either very short or very long, and are depleted of linker histone (H1). Genes with longer spacing are enriched in H1, which directs chromatin folding. We propose that CHD1 directs short spacing, resulting in eviction of H1 and chromatin unfolding, whereas ISW1 directs longer spacing, allowing H1 to bind and condense the chromatin. Thus, competition between the two remodelers to set the spacing on each gene may result in a highly dynamic chromatin structure. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Computational dissection of human episodic memory reveals mental process-specific genetic profiles
Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G.; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J.-F.
2015-01-01
Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory. PMID:26261317
Computational dissection of human episodic memory reveals mental process-specific genetic profiles.
Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J-F
2015-09-01
Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory.
CoPub: a literature-based keyword enrichment tool for microarray data analysis.
Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand
2008-07-01
Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl.
Identification of Bacteria Responsible for Ammonia Oxidation in Freshwater Aquaria
Burrell, Paul C.; Phalen, Carol M.; Hovanec, Timothy A.
2001-01-01
Culture enrichments and culture-independent molecular methods were employed to identify and confirm the presence of novel ammonia-oxidizing bacteria (AOB) in nitrifying freshwater aquaria. Reactors were seeded with biomass from freshwater nitrifying systems and enriched for AOB under various conditions of ammonia concentration. Surveys of cloned rRNA genes from the enrichments revealed four major strains of AOB which were phylogenetically related to the Nitrosomonas marina cluster, the Nitrosospira cluster, or the Nitrosomonas europaea-Nitrosococcus mobilis cluster of the β subdivision of the class Proteobacteria. Ammonia concentration in the reactors determined which AOB strain dominated in an enrichment. Oligonucleotide probes and PCR primer sets specific for the four AOB strains were developed and used to confirm the presence of the AOB strains in the enrichments. Enrichments of the AOB strains were added to newly established aquaria to determine their ability to accelerate the establishment of ammonia oxidation. Enrichments containing the Nitrosomonas marina-like AOB strain were most efficient at accelerating ammonia oxidation in newly established aquaria. Furthermore, if the Nitrosomonas marina-like AOB strain was present in the original enrichment, even one with other AOB, only the Nitrosomonas marina-like AOB strain was present in aquaria after nitrification was established. Nitrosomonas marina-like AOB were 2% or less of the cells detected by fluorescence in situ hybridization analysis in aquaria in which nitrification was well established. PMID:11722936
Fauteux, François; Strömvik, Martina V
2009-01-01
Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. 
The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
2016-03-01
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs LifeMap Sciences' GeneCards suite, including GeneCards® (the human gene database), MalaCards (the human diseases database), and PathCards (the biological pathways database). Expression-based analysis in GeneAnalytics relies on LifeMap Discovery® (the embryonic development and stem cells database), which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. 
Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
Rai, Amit; Kamochi, Hidetaka; Suzuki, Hideyuki; Nakamura, Michimi; Takahashi, Hiroki; Hatada, Tomoki; Saito, Kazuki; Yamazaki, Mami
2017-01-01
Lonicera japonica is one of the most important medicinal plants with applications in traditional Chinese and Japanese medicine for thousands of years. Extensive studies on the constituents of L. japonica extracts have revealed an accumulation of pharmaceutically active metabolite classes, such as chlorogenic acid, luteolin and other flavonoids, and secoiridoids, which impart characteristic medicinal properties. Despite being a rich source of pharmaceutically active metabolites, little is known about the biosynthetic enzymes involved, and their expression profile across different tissues of L. japonica. In this study, we performed de novo transcriptome assembly for L. japonica, representing transcripts from nine different tissues. A total of 22 Gbps clean RNA-seq reads from nine tissues of L. japonica were used, resulting in 243,185 unigenes, with 99,938 unigenes annotated based on a homology search using blastx against the NCBI-nr protein database. Unsupervised principal component analysis and correlation studies using transcript expression data from all nine tissues of L. japonica showed relationships between tissues, explaining their association at different developmental stages. Homologs for all genes associated with chlorogenic acid, luteolin, and secoiridoid biosynthesis pathways were identified in the L. japonica transcriptome assembly. Expression of unigenes associated with chlorogenic acid was enriched in stems and leaf-2, unigenes from luteolin were enriched in stems and flowers, while unigenes from secoiridoid metabolic pathways were enriched in leaf-1 and shoot apex. Our results showed that different tissues of L. japonica are enriched with sets of unigenes associated with specific pharmaceutically important metabolic pathways and, therefore, possess unique medicinal properties. The present study will serve as a resource for future attempts for functional characterization of enzyme coding genes within key metabolic processes.
Xi, Jianing; Wang, Minghui; Li, Ao
2017-09-26
The accumulating availability of next-generation sequencing data offers an opportunity to pinpoint driver genes that are causally implicated in oncogenesis through computational models. Despite previous efforts made regarding this challenging problem, there is still room for improvement in the driver gene identification accuracy. In this paper, we propose a novel integrated approach called IntDriver for prioritizing driver genes. Based on a matrix factorization framework, IntDriver can effectively incorporate functional information from both the interaction network and Gene Ontology similarity, and detect driver genes mutated in different sets of patients at the same time. When evaluated through known benchmarking driver genes, the top ranked genes of our result show highly significant enrichment for the known genes. Meanwhile, IntDriver also detects some known driver genes that are not found by the other competing approaches. When measured by precision, recall and F1 score, the performances of our approach are comparable or increased in comparison to the competing approaches.
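The precision, recall and F1 score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual set-based definitions applied to a top-ranked gene list and a benchmark list of known driver genes; the function name and the toy gene lists are hypothetical and are not taken from the IntDriver study.

```python
def precision_recall_f1(predicted, benchmark):
    """Score a predicted gene list against a set of known (benchmark) genes."""
    predicted, benchmark = set(predicted), set(benchmark)
    true_positives = len(predicted & benchmark)
    # Precision: fraction of predicted genes that are known drivers.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of known drivers that were recovered.
    recall = true_positives / len(benchmark) if benchmark else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data for illustration only.
top_ranked = ["TP53", "KRAS", "GENE_X", "PIK3CA"]
known_drivers = ["TP53", "KRAS", "PIK3CA", "BRAF", "EGFR"]

p, r, f1 = precision_recall_f1(top_ranked, known_drivers)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case three of the four predicted genes are in the benchmark, so precision is 0.75, recall is 0.60, and F1 is about 0.67.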
OVCAR-3 Spheroid-Derived Cells Display Distinct Metabolic Profiles
Vermeersch, Kathleen A.; Wang, Lijuan; Mezencev, Roman; McDonald, John F.; Styczynski, Mark P.
2015-01-01
Introduction Recently, multicellular spheroids were isolated from a well-established epithelial ovarian cancer cell line, OVCAR-3, and were propagated in vitro. These spheroid-derived cells displayed numerous hallmarks of cancer stem cells, which are chemo- and radioresistant cells thought to be a significant cause of cancer recurrence and resultant mortality. Gene set enrichment analysis of expression data from the OVCAR-3 cells and the spheroid-derived putative cancer stem cells identified several metabolic pathways enriched in differentially expressed genes. Before this, there had been little previous knowledge or investigation of systems-scale metabolic differences between cancer cells and cancer stem cells, and no knowledge of such differences in ovarian cancer stem cells. Methods To determine if there were substantial metabolic changes corresponding with these transcriptional differences, we used two-dimensional gas chromatography coupled to mass spectrometry to measure the metabolite profiles of the two cell lines. Results These two cell lines exhibited significant metabolic differences in both intracellular and extracellular metabolite measurements. Principal components analysis, an unsupervised dimensional reduction technique, showed complete separation between the two cell types based on their metabolite profiles. Pathway analysis of intracellular metabolomics data revealed close overlap with metabolic pathways identified from gene expression data, with four out of six pathways found enriched in gene-level analysis also enriched in metabolite-level analysis. Some of those pathways contained multiple metabolites that were individually statistically significantly different between the two cell lines, with one of the most broadly and consistently different pathways, arginine and proline metabolism, suggesting an interesting hypothesis about cancerous and stem-like metabolic phenotypes in this pair of cell lines. 
Conclusions Overall, we demonstrate for the first time that metabolism in an ovarian cancer stem cell line is distinct from that of more differentiated isogenic cancer cells, supporting the potential importance of metabolism in the differences between cancer cells and cancer stem cells. PMID:25688563
Identification of targets of miRNA-221 and miRNA-222 in fulvestrant-resistant breast cancer
Liu, Pengfei; Sun, Manna; Jiang, Wenhua; Zhao, Jinkun; Liang, Chunyong; Zhang, Huilai
2016-01-01
The present study aimed to identify the differentially expressed genes (DEGs) regulated by microRNA (miRNA)-221 and miRNA-222 that are associated with the resistance of breast cancer to fulvestrant. The GSE19777 transcription profile was downloaded from the Gene Expression Omnibus database, and includes data from three samples of antisense miRNA-221-transfected fulvestrant-resistant MCF7-FR breast cancer cells, three samples of antisense miRNA-222-transfected fulvestrant-resistant MCF7-FR cells and three samples of control inhibitor (green fluorescent protein)-treated fulvestrant-resistant MCF7-FR cells. The linear models for microarray data package in R/Bioconductor was employed to screen for DEGs in the miRNA-transfected cells, and the pheatmap package in R was used to perform two-way clustering. Pathway enrichment was conducted using the Gene Set Enrichment Analysis tool. Furthermore, a miRNA-messenger (m) RNA regulatory network depicting interactions between miRNA-targeted upregulated DEGs was constructed and visualized using Cytoscape. In total, 492 and 404 DEGs were identified for the antisense miRNA-221-transfected MCF7-FR cells and the antisense miRNA-222-transfected MCF7-FR cells, respectively. Genes of the pentose phosphate pathway (PPP) were significantly enriched in the antisense miRNA-221-transfected MCF7-FR cells. In addition, components of the Wnt signaling pathway and cell adhesion molecules (CAMs) were significantly enriched in the antisense miRNA-222-transfected MCF7-FR cells. In the miRNA-mRNA regulatory network, miRNA-222 was demonstrated to target protocadherin 10 (PCDH10). The results of the present study suggested that the PPP and Wnt signaling pathways, as well as CAMs and PCDH10, may be associated with the resistance of breast cancer to fulvestrant. PMID:27895744
Expression of mouse Tla region class I genes in tissues enriched for gamma delta cells.
Eghtesady, P; Brorson, K A; Cheroutre, H; Tigelaar, R E; Hood, L; Kronenberg, M
1992-01-01
The Tla region of the BALB/c mouse major histocompatibility complex contains at least 20 class I genes. The function of the products of these genes is unknown, but recent evidence demonstrates that some Tla region gene products could be involved in presentation of antigens to gamma delta T cells. We have generated a set of polymerase chain reaction (PCR) oligonucleotide primers and hybridization probes that permit us to specifically amplify and detect expression of 11 of the 20 BALB/c Tla region genes. cDNA prepared from 12 adult and fetal tissues and from seven cell lines was analyzed. In some cases, northern blot analysis or staining with monoclonal antibodies specific for the Tla-encoded thymus leukemia (TL) antigen were used to confirm the expression pattern of several of the genes as determined by PCR. Some Tla region genes, such as T24d and the members of the T10d/T22d gene pair, are expressed in a wide variety of tissues in a manner similar to the class I transplantation antigens. The members of the TL antigen encoding gene pair, T3d/T18d, are expressed in only a limited number of organs, including several sites enriched for gamma delta T cells. Other Tla region genes, including T1d, T2d, T16d, and T17d, are transcriptionally silent and transcripts from the T8d/T20d gene pair do not undergo proper splicing. In general, sites that contain gamma delta T lymphocytes have Tla region transcripts. The newly identified pattern of expression of the genes analyzed in sites containing gamma delta T cells further extends the list of potential candidates for antigen presentation to gamma delta T cells.
Ihara, Motomasa; Meyer-Ficca, Mirella L; Leu, N Adrian; Rao, Shilpa; Li, Fan; Gregory, Brian D; Zalenskaya, Irina A; Schultz, Richard M; Meyer, Ralph G
2014-05-01
To achieve the extreme nuclear condensation necessary for sperm function, most histones are replaced with protamines during spermiogenesis in mammals. Mature sperm retain only a small fraction of nucleosomes, which are, in part, enriched on gene regulatory sequences, and recent findings suggest that these retained histones provide epigenetic information that regulates expression of a subset of genes involved in embryo development after fertilization. We addressed this tantalizing hypothesis by analyzing two mouse models exhibiting abnormal histone positioning in mature sperm due to impaired poly(ADP-ribose) (PAR) metabolism during spermiogenesis and identified altered sperm histone retention in specific gene loci genome-wide using MNase digestion-based enrichment of mononucleosomal DNA. We then set out to determine the extent to which expression of these genes was altered in embryos generated with these sperm. For control sperm, most genes showed some degree of histone association, unexpectedly suggesting that histone retention in sperm genes is not an all-or-none phenomenon and that a small number of histones may remain associated with genes throughout the genome. The amount of retained histones, however, was altered in many loci when PAR metabolism was impaired. To ascertain whether sperm histone association and embryonic gene expression are linked, the transcriptome of individual 2-cell embryos derived from such sperm was determined using microarrays and RNA sequencing. Strikingly, a moderate but statistically significant portion of the genes that were differentially expressed in these embryos also showed different histone retention in the corresponding gene loci in sperm of their fathers. These findings provide new evidence for the existence of a linkage between sperm histone retention and gene expression in the embryo.
Leu, N. Adrian; Rao, Shilpa; Li, Fan; Gregory, Brian D.; Zalenskaya, Irina A.; Schultz, Richard M.; Meyer, Ralph G.
2014-01-01
To achieve the extreme nuclear condensation necessary for sperm function, most histones are replaced with protamines during spermiogenesis in mammals. Mature sperm retain only a small fraction of nucleosomes, which are, in part, enriched on gene regulatory sequences, and recent findings suggest that these retained histones provide epigenetic information that regulates expression of a subset of genes involved in embryo development after fertilization. We addressed this tantalizing hypothesis by analyzing two mouse models exhibiting abnormal histone positioning in mature sperm due to impaired poly(ADP-ribose) (PAR) metabolism during spermiogenesis and identified altered sperm histone retention in specific gene loci genome-wide using MNase digestion-based enrichment of mononucleosomal DNA. We then set out to determine the extent to which expression of these genes was altered in embryos generated with these sperm. For control sperm, most genes showed some degree of histone association, unexpectedly suggesting that histone retention in sperm genes is not an all-or-none phenomenon and that a small number of histones may remain associated with genes throughout the genome. The amount of retained histones, however, was altered in many loci when PAR metabolism was impaired. To ascertain whether sperm histone association and embryonic gene expression are linked, the transcriptome of individual 2-cell embryos derived from such sperm was determined using microarrays and RNA sequencing. Strikingly, a moderate but statistically significant portion of the genes that were differentially expressed in these embryos also showed different histone retention in the corresponding gene loci in sperm of their fathers. These findings provide new evidence for the existence of a linkage between sperm histone retention and gene expression in the embryo. PMID:24810616
Fakhro, Khalid A.; Choi, Murim; Ware, Stephanie M.; Belmont, John W.; Towbin, Jeffrey A.; Lifton, Richard P.; Khokha, Mustafa K.; Brueckner, Martina
2011-01-01
Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10^-4). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10^-6). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning. PMID:21282601
Fakhro, Khalid A; Choi, Murim; Ware, Stephanie M; Belmont, John W; Towbin, Jeffrey A; Lifton, Richard P; Khokha, Mustafa K; Brueckner, Martina
2011-02-15
Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10^-4). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10^-6). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning.
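The "twofold" and "sevenfold" figures quoted in the abstract above can be checked with simple arithmetic. The sketch below assumes fold enrichment is the ratio of the two proportions; the function name is hypothetical, and the code does not attempt to reproduce the reported P values, since the abstract does not state which test was used.

```python
def fold_enrichment(hits, total, background_hits, background_total):
    """Ratio of the hit rate in a candidate set to the hit rate in a background set."""
    return (hits / total) / (background_hits / background_total)


# 7 of 22 Htx candidate genes vs. 40 of 845 previously studied genes
# with expression in the LR organizer (counts from the abstract).
lr_fold = fold_enrichment(7, 22, 40, 845)

# 14.5% of Htx subjects vs. 7.4% of controls carried rare genic CNVs.
cnv_fold = 14.5 / 7.4

print(f"LR organizer expression: {lr_fold:.1f}-fold")
print(f"Rare genic CNV carriers: {cnv_fold:.2f}-fold")
```

The ratios come out at roughly 6.7 and 1.96, consistent with the rounded "sevenfold" and "twofold" values in the text.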
Pfalzer, Anna C; Kamanu, Frederick K; Parnell, Laurence D; Tai, Albert K; Liu, Zhenhua; Mason, Joel B; Crott, Jimmy W
2016-08-01
Obesity is a significant risk factor for colorectal cancer (CRC); however, the relative contribution of high-fat (HF) consumption and excess adiposity remains unclear. It is becoming apparent that obesity perturbs both the intestinal microbiome and metabolome, and each has the potential to induce protumorigenic changes in the epithelial transcriptome. The physiological consequences and the degree to which these different biologic systems interact remain poorly defined. To understand the mechanisms by which obesity drives colonic tumorigenesis, we profiled the colonic epithelial transcriptome of HF-fed and genetically obese (DbDb) mice with a genetic predisposition to intestinal tumorigenesis (Apc(1638N)); 266 and 584 genes were differentially expressed in the colonic mucosa of HF and DbDb mice, respectively. These genes mapped to pathways involved in immune function, and cellular proliferation and cancer. Furthermore, Akt was central within the networks of interacting genes identified in both gene sets. Regression analyses of coexpressed genes with the abundance of bacterial taxa identified three taxa, previously correlated with tumor burden, to be significantly correlated with a gene module enriched for Akt-related genes. Similarly, regression of coexpressed genes with metabolites found that adenosine, which was negatively associated with inflammatory markers and tumor burden, was also correlated with a gene module enriched with Akt regulators. Our findings provide evidence that HF consumption and excess adiposity result in changes in the colonic transcriptome that, although distinct, both appear to converge on Akt signaling. Such changes could be mediated by alterations in the colonic microbiome and metabolome.
Stam, Remco; Scheikl, Daniela; Tellier, Aurélien
2016-01-01
Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. PMID:27189991
Hill, Matthew J; Killick, Richard; Navarrete, Katherinne; Maruszak, Aleksandra; McLaughlin, Gemma M; Williams, Brenda P; Bray, Nicholas J
2017-05-01
Common variants in the TCF4 gene are among the most robustly supported genetic risk factors for schizophrenia. Rare TCF4 deletions and loss-of-function point mutations cause Pitt-Hopkins syndrome, a developmental disorder associated with severe intellectual disability. To explore molecular and cellular mechanisms by which TCF4 perturbation could interfere with human cortical development, we experimentally reduced the endogenous expression of TCF4 in a neural progenitor cell line derived from the developing human cerebral cortex using RNA interference. Effects on genome-wide gene expression were assessed by microarray, followed by Gene Ontology and pathway analysis of differentially expressed genes. We tested for genetic association between the set of differentially expressed genes and schizophrenia using genome-wide association study data from the Psychiatric Genomics Consortium and competitive gene set analysis (MAGMA). Effects on cell proliferation were assessed using high content imaging. Genes that were differentially expressed following TCF4 knockdown were highly enriched for involvement in the cell cycle. There was a nonsignificant trend for genetic association between the differentially expressed gene set and schizophrenia. Consistent with the gene expression data, TCF4 knockdown was associated with reduced proliferation of cortical progenitor cells in vitro. A detailed mechanistic explanation of how TCF4 knockdown alters human neural progenitor cell proliferation is not provided by this study. Our data indicate effects of TCF4 perturbation on human cortical progenitor cell proliferation, a process that could contribute to cognitive deficits in individuals with Pitt-Hopkins syndrome and risk for schizophrenia.
Protective pathways against colitis mediated by appendicitis and appendectomy.
Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C
2011-09-01
Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample microarray analysed followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription-polymerase chain reaction (RT-PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0·001). Time-course RT-PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. © 2011 The Authors. Clinical and Experimental Immunology © 2011 British Society for Immunology.
Zhang, Chaoyang; Peng, Li; Zhang, Yaqin; Liu, Zhaoyang; Li, Wenling; Chen, Shilian; Li, Guancheng
2017-06-01
Liver cancer is a serious threat to public health and has a fairly complicated pathogenesis. Therefore, the identification of key genes and pathways is of great importance for clarifying the molecular mechanisms of hepatocellular carcinoma (HCC) initiation and progression. An HCC-associated gene expression dataset was downloaded from the Gene Expression Omnibus database. The statistical software R was used for significance analysis of differentially expressed genes (DEGs) between liver cancer samples and normal samples. Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, based on R software, were applied to identify pathways in which DEGs were significantly enriched. Cytoscape software was used for the construction of a protein-protein interaction (PPI) network and for module analysis to find the hub genes and key pathways. Finally, weighted correlation network analysis (WGCNA) was conducted to further screen critical gene modules with similar expression patterns and explore their biological significance. Significance analysis identified 1230 DEGs with fold change >2, including 632 significantly down-regulated DEGs and 598 significantly up-regulated DEGs. GO term enrichment analysis suggested that up-regulated DEGs were significantly enriched in immune response, cell adhesion, cell migration, type I interferon signaling pathway, and cell proliferation, and the down-regulated DEGs were mainly enriched in response to endoplasmic reticulum stress and endoplasmic reticulum unfolded protein response. KEGG pathway analysis found DEGs significantly enriched in five pathways including complement and coagulation cascades, focal adhesion, ECM-receptor interaction, antigen processing and presentation, and protein processing in endoplasmic reticulum. The top 10 hub genes in HCC, derived from the PPI network, were GMPS, ACACA, ALB, TGFB1, KRAS, ERBB2, BCL2, EGFR, STAT3, and CD8A. 
The top 3 gene interaction modules in the PPI network were enriched in immune response, organ development, and response to other organism, respectively. WGCNA revealed that the eight confirmed gene modules were significantly enriched in monooxygenase and oxidoreductase activity, response to endoplasmic reticulum stress, type I interferon signaling pathway, processing, presentation and binding of peptide antigen, cellular response to cadmium and zinc ion, cell locomotion and differentiation, ribonucleoprotein complex and RNA processing, and immune system process, respectively. In conclusion, we identified some key genes and pathways closely related to HCC initiation and progression by a series of bioinformatics analyses on DEGs. These screened genes and pathways provide a more detailed molecular mechanism underlying HCC occurrence and progression, holding promise for acting as biomarkers and potential therapeutic targets.
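The abstract above reports GO and KEGG enrichment of DEGs but does not say which statistical test was used. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, and the sketch below illustrates it under that assumption. Only the DEG count (1230) comes from the abstract; the background size, pathway size, and overlap are hypothetical numbers chosen for illustration, and the function name is not taken from any package.

```python
from math import comb


def hypergeom_upper_tail(overlap, background, pathway_size, deg_count):
    """P(X >= overlap) when deg_count genes are drawn without replacement
    from a background in which pathway_size genes belong to the pathway."""
    largest = min(pathway_size, deg_count)
    total = comb(background, deg_count)
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, deg_count - i)
        for i in range(overlap, largest + 1)
    )
    return favourable / total


# Hypothetical inputs, except deg_count, which is the number reported above.
background = 20000    # assumed number of annotated genes
pathway_size = 100    # assumed number of genes in one pathway
deg_count = 1230      # DEGs reported in the abstract
overlap = 15          # assumed number of DEGs that fall in the pathway

p_value = hypergeom_upper_tail(overlap, background, pathway_size, deg_count)
print(f"over-representation p-value: {p_value:.3g}")
```

Under these assumed numbers about six pathway genes would be expected among the DEGs by chance, so an overlap of 15 yields a small p-value. In practice such p-values are computed for every pathway and then corrected for multiple testing.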
Impact of ontology evolution on functional analyses.
Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard
2012-10-15
Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
2011-01-01
Background Stem cells and their niches are studied in many systems, but mammalian germ stem cells (GSC) and their niches are still poorly understood. In rat testis, spermatogonia and undifferentiated Sertoli cells proliferate before puberty, but at puberty most spermatogonia enter spermatogenesis, and Sertoli cells differentiate to support this program. Thus, pre-pubertal spermatogonia might possess GSC potential and pre-pubertal Sertoli cells niche functions. We hypothesized that the different stem cell pools at pre-puberty and maturity provide a model for the identification of stem cell and niche-specific genes. We compared the transcript profiles of spermatogonia and Sertoli cells from pre-pubertal and pubertal rats and examined how these related to genes expressed in testicular cancers, which might originate from inappropriate communication between GSCs and Sertoli cells. Results The pre-pubertal spermatogonia-specific gene set comprised known stem cell and spermatogonial stem cell (SSC) markers. Similarly, the pre-pubertal Sertoli cell-specific gene set comprised known niche gene transcripts. A large fraction of these specifically enriched transcripts encoded trans-membrane, extra-cellular, and secreted proteins, highlighting stem cell to niche communication. Comparing selective gene sets established in this study with published gene expression data of testicular cancers and their stroma, we identified sets of expressed genes shared between testicular tumors and pre-pubertal spermatogonia, and between tumor stroma and pre-pubertal Sertoli cells, with statistical significance. Conclusions Our data suggest that SSC and their niche specifically express complementary factors for cell communication and that the same factors might be implicated in the communication between tumor cells and their micro-environment in testicular cancer. PMID:21232125
2013-01-01
Background High–throughput (HT) technologies provide huge amount of gene expression data that can be used to identify biomarkers useful in the clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of gene signature by biological concepts, that is either score–based or requires tunable parameters as well, limiting its power. Results We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. 
Conclusions We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, KDVS approach can be considered as a viable alternative to enrichment–based approaches. PMID:23302187
Androgen receptor (AR) cistrome in prostate differentiation and cancer progression.
Wang, Fengtian; Koul, Hari K
2017-01-01
Despite the progress in development of better AR-targeted therapies for prostate cancer (PCa), there is no curative therapy for castration-resistant prostate cancer (CRPC). Therapeutic resistance in PCa can be characterized in two broad categories of AR therapy resistance: the first and most prevalent one involves restoration of AR activity despite AR targeted therapy, and the second one involves tumor progression despite blockade of AR activity. As such AR remains the most attractive drug target for CRPC. Despite its oncogenic role, AR signaling also contributes to the maturation and differentiation of prostate luminal cells during development. Recent evidence suggests that AR cistrome is altered in advanced PCa. Alteration in AR may result from AR amplification, alternative splicing, mutations, post-translational modification of AR, and altered expression of AR co-factors. We reasoned that such alterations would result in the transcription of disparate AR target genes and as such may contribute to the emergence of castration-resistance. In the present study, we evaluated the expression of genes associated with canonical or non-canonical AR cistrome in relationship with PCa progression and prostate development by analyzing publicly available datasets. We discovered a transcription switch from canonical AR cistrome target genes to the non-canonical AR cistrome target genes during PCa progression. Using Gene Set Enrichment Analysis (GSEA), we discovered that canonical AR cistrome target genes are enriched in indolent PCa patients and the loss of canonical AR cistrome is associated with tumor metastasis and poor clinical outcome. Analysis of the datasets involving prostate development, revealed that canonical AR cistrome target genes are significantly enriched in prostate luminal cells and can distinguish luminal cells from basal cells, suggesting a pivotal role for canonical AR cistrome driven genes in prostate development. 
These data suggest that the expression of canonical AR cistrome related genes plays an important role in maintaining the prostate luminal cell identity and might restrict the lineage plasticity observed in lethal PCa. Understanding the molecular mechanisms that dictate AR cistrome may lead to development of new therapeutic strategies aimed at restoring canonical AR cistrome, rewiring the oncogenic AR signaling and overcoming resistance to AR targeted therapies.
Whole-transcriptome response to water stress in a California endemic oak, Quercus lobata.
Gugger, Paul F; Peñaloza-Ramírez, Juan Manuel; Wright, Jessica W; Sork, Victoria L
2017-05-01
Reduced water availability during drought can create major stress for many plant species. Within a species, populations with a history of seasonal drought may have evolved the ability to tolerate drought more than those in areas of high precipitation and low seasonality. In this study, we assessed response to water stress in a California oak species, Quercus lobata Née, by measuring changes in gene expression profiles before and after a simulated drought stress treatment through water deprivation of seedlings in a greenhouse setting. Using whole-transcriptome sequencing from nine samples from three collection localities, we identified which genes are involved in response to drought stress and tested the hypothesis that seedlings sampled from climatically different regions of the species range respond to water stress differently. We observed a surprisingly massive transcriptional response to drought: 35,347 of 68,434 contigs (52%) were differentially expressed before versus after drought treatment, of which 18,111 were down-regulated and 17,236 were up-regulated. Genes functionally associated with abiotic stresses and death were enriched among the up-regulated genes, whereas metabolic and cell part-related genes were enriched among the down-regulated. We found 56 contigs that exhibited significantly different expression responses to the drought treatment among the three populations (treatment × population interaction), suggesting that those genes may be involved in local adaptation to drought stress. These genes have stress response (e.g., WRKY DNA-binding protein 51 and HSP20-like chaperones superfamily protein), metabolic (e.g., phosphoglycerate kinase and protein kinase superfamily protein), transport/transfer (e.g., cationic amino acid transporter 7 and K+ transporter) and regulatory functions (e.g., WRKY51 and Homeodomain-like transcriptional regulator). 
Baseline expression levels of 1310 unique contigs also differed among pairs of populations, and they were enriched for metabolic and cell part-related genes. Out of the large fraction of the transcriptome that was differentially expressed in response to our drought treatment, we identified several novel genes that are candidates for involvement in local adaptation to drought. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
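The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the published numbers, not part of the authors' analysis; the variable names are illustrative.

```python
# Figures as reported in the abstract.
total_contigs = 68434
down_regulated = 18111
up_regulated = 17236
reported_differential = 35347

# The two directions should add up to the reported total of
# differentially expressed contigs.
differential = down_regulated + up_regulated

# Share of the transcriptome that responded to the drought treatment.
percent_differential = 100 * differential / total_contigs

print(differential == reported_differential)
print(f"{percent_differential:.2f}% of contigs differentially expressed")
```

The sum matches the reported total, and the share rounds to the 52% given in the abstract.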
Analytical workflow profiling gene expression in murine macrophages
Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.
2015-01-01
Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
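The abstract above filters transcripts at a False Discovery Rate-adjusted p-value below 0.05 without naming the adjustment procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method; that choice, the function name, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four transcripts.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

A transcript is then called differentially expressed when its adjusted value falls below the chosen threshold, 0.05 in the abstract.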
Reliable pre-eclampsia pathways based on multiple independent microarray data sets.
Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo
2015-02-01
Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta has provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using the data sets GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically-validated protocol was executed using five data sets, including two other independent validation data sets, GSE30186 and GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways was found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. 
A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
2013-01-01
Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent while top ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. 
Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432
2011-01-01
Background The aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor (TF) that mediates responses to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Integration of TCDD-induced genome-wide AhR enrichment, differential gene expression and computational dioxin response element (DRE) analyses further elucidates the hepatic AhR regulatory network. Results Global ChIP-chip and gene expression analyses were performed on hepatic tissue from immature ovariectomized mice orally gavaged with 30 μg/kg TCDD. ChIP-chip analysis identified 14,446 and 974 AhR enriched regions (1% false discovery rate) at 2 and 24 hrs, respectively. Enrichment density was greatest in the proximal promoter, and more specifically, within ± 1.5 kb of a transcriptional start site (TSS). AhR enrichment also occurred distal to a TSS (e.g. intergenic DNA and 3' UTR), extending the potential gene expression regulatory roles of the AhR. Although TF binding site analyses identified over-represented DRE sequences within enriched regions, approximately 50% of all AhR enriched regions lacked a DRE core (5'-GCGTG-3'). Microarray analysis identified 1,896 TCDD-responsive genes (|fold change| ≥ 1.5, P1(t) > 0.999). Integrating this gene expression data with our ChIP-chip and DRE analyses only identified 625 differentially expressed genes that involved an AhR interaction at a DRE. Functional annotation analysis of differentially regulated genes associated with AhR enrichment identified overrepresented processes related to fatty acid and lipid metabolism and transport, and xenobiotic metabolism, which are consistent with TCDD-elicited steatosis in the mouse liver. Conclusions Details of the AhR regulatory network have been expanded to include AhR-DNA interactions within intragenic and intergenic genomic regions. Moreover, the AhR can interact with DNA independent of a DRE core, suggesting there are alternative mechanisms of AhR-mediated gene regulation. PMID:21762485
Herman, Dorota; Slabbinck, Bram; Pè, Mario Enrico
2016-01-01
Leaves are vital organs for biomass and seed production because of their role in the generation of metabolic energy and organic compounds. A better understanding of the molecular networks underlying leaf development is crucial to sustain global requirements for food and renewable energy. Here, we combined transcriptome profiling of proliferative leaf tissue with in-depth phenotyping of the fourth leaf at later stages of development in 197 recombinant inbred lines of two different maize (Zea mays) populations. Previously, correlation analysis in a classical biparental mapping population identified 1,740 genes correlated with at least one of 14 traits. Here, we extended these results with data from a multiparent advanced generation intercross population. As expected, the phenotypic variability was found to be larger in the latter population than in the biparental population, although general conclusions on the correlations among the traits are comparable. Data integration from the two diverse populations allowed us to identify a set of 226 genes that are robustly associated with diverse leaf traits. This set of genes is enriched for transcriptional regulators and genes involved in protein synthesis and cell wall metabolism. In order to investigate the molecular network context of the candidate gene set, we integrated our data with publicly available functional genomics data and identified a growth regulatory network of 185 genes. Our results illustrate the power of combining in-depth phenotyping with transcriptomics in mapping populations to dissect the genetic control of complex traits and present a set of candidate genes for use in biomass improvement. PMID:26754667
Baute, Joke; Herman, Dorota; Coppens, Frederik; De Block, Jolien; Slabbinck, Bram; Dell'Acqua, Matteo; Pè, Mario Enrico; Maere, Steven; Nelissen, Hilde; Inzé, Dirk
2016-03-01
Leaves are vital organs for biomass and seed production because of their role in the generation of metabolic energy and organic compounds. A better understanding of the molecular networks underlying leaf development is crucial to sustain global requirements for food and renewable energy. Here, we combined transcriptome profiling of proliferative leaf tissue with in-depth phenotyping of the fourth leaf at later stages of development in 197 recombinant inbred lines of two different maize (Zea mays) populations. Previously, correlation analysis in a classical biparental mapping population identified 1,740 genes correlated with at least one of 14 traits. Here, we extended these results with data from a multiparent advanced generation intercross population. As expected, the phenotypic variability was found to be larger in the latter population than in the biparental population, although general conclusions on the correlations among the traits are comparable. Data integration from the two diverse populations allowed us to identify a set of 226 genes that are robustly associated with diverse leaf traits. This set of genes is enriched for transcriptional regulators and genes involved in protein synthesis and cell wall metabolism. In order to investigate the molecular network context of the candidate gene set, we integrated our data with publicly available functional genomics data and identified a growth regulatory network of 185 genes. Our results illustrate the power of combining in-depth phenotyping with transcriptomics in mapping populations to dissect the genetic control of complex traits and present a set of candidate genes for use in biomass improvement. © 2016 American Society of Plant Biologists. All Rights Reserved.
Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow
2018-06-07
Recurrent oncogenic fusion genes play a critical role in the development of various cancers and diseases and provide, in some cases, excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this deficiency, we developed Co-occurrence Fusion (Co-fuse), a new and easy to use software tool that enables biologists to merge RNA-seq information, allowing them to identify recurrent fusion genes, without the need for exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis which enables the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can be used to identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a multiple myeloma (MM) samples-enriched cluster and an acute myeloid leukemia (AML) samples-enriched cluster. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM, when compared to AML samples, indicating the potential of Co-fuse to aid the discovery of yet unknown driver fusion genes through cohort comparisons. Additionally, using a 272 primary glioma sample RNA-seq dataset, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets, and may lead to the discovery of new disease subgroups and potentially new driver genes, for which, targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .
Sugathan, Aarathi; Biagioli, Marta; Golzio, Christelle; Erdin, Serkan; Blumenthal, Ian; Manavalan, Poornima; Ragavendran, Ashok; Brand, Harrison; Lucente, Diane; Miles, Judith; Sheridan, Steven D.; Stortchevoi, Alexei; Kellis, Manolis; Haggarty, Stephen J.; Katsanis, Nicholas; Gusella, James F.; Talkowski, Michael E.
2014-01-01
Truncating mutations of chromodomain helicase DNA-binding protein 8 (CHD8), and of many other genes with diverse functions, are strong-effect risk factors for autism spectrum disorder (ASD), suggesting multiple mechanisms of pathogenesis. We explored the transcriptional networks that CHD8 regulates in neural progenitor cells (NPCs) by reducing its expression and then integrating transcriptome sequencing (RNA sequencing) with genome-wide CHD8 binding (ChIP sequencing). Suppressing CHD8 to levels comparable with the loss of a single allele caused altered expression of 1,756 genes, 64.9% of which were up-regulated. CHD8 showed widespread binding to chromatin, with 7,324 replicated sites that marked 5,658 genes. Integration of these data suggests that a limited array of direct regulatory effects of CHD8 produced a much larger network of secondary expression changes. Genes indirectly down-regulated (i.e., without CHD8-binding sites) reflect pathways involved in brain development, including synapse formation, neuron differentiation, cell adhesion, and axon guidance, whereas CHD8-bound genes are strongly associated with chromatin modification and transcriptional regulation. Genes associated with ASD were strongly enriched among indirectly down-regulated loci (P < 10⁻⁸) and CHD8-bound genes (P = 0.0043), which align with previously identified coexpression modules during fetal development. We also find an intriguing enrichment of cancer-related gene sets among CHD8-bound genes (P < 10⁻¹⁰). In vivo suppression of chd8 in zebrafish produced macrocephaly comparable to that of humans with inactivating mutations. These data indicate that heterozygous disruption of CHD8 precipitates a network of gene-expression changes involved in neurodevelopmental pathways in which many ASD-associated genes may converge on shared mechanisms of pathogenesis. PMID:25294932
Ma, Min; Chen, Xiaofei; Lu, Liangyu; Yuan, Feng; Zeng, Wen; Luo, Shulin; Yin, Feng; Cai, Junfeng
2016-12-01
Postmenopausal osteoporosis is a common bone disease characterized by low bone mineral density. This study aimed to reveal key genes associated with postmenopausal osteoporosis (PMO), and provide a theoretical basis for subsequent experiments. The dataset GSE7429 was obtained from Gene Expression Omnibus. A total of 20 B cell samples (ten each from postmenopausal women with low or high bone mineral density (BMD)) were included in this dataset. Following screening of differentially expressed genes (DEGs), coexpression analysis of all genes was performed, and key genes in the coexpression network were screened using the random walk algorithm. Afterwards, functional and pathway analyses were conducted. Additionally, protein-protein interactions (PPIs) between DEGs and key genes were analyzed. A set of 308 DEGs (170 up-regulated and 138 down-regulated) between low BMD and high BMD samples was identified, and 101 key genes in the coexpression network were screened out. In the coexpression network, some genes had a higher score and degree, such as CSTA. The key genes in the coexpression network were mainly enriched in the GO terms defense response (e.g., SERPINA1 and CST3) and immune response (e.g., IL32 and CLEC7A), whereas the DEGs were mainly enriched in structural constituent of cytoskeleton (e.g., CYLC2 and TUBA1B) and membrane-enclosed lumen (e.g., CCNE1 and INTS5). In the PPI network, CCNE1 interacted with REL, and TUBA1B interacted with ESR1. A series of interactions, such as CSTA/TYROBP, CCNE1/REL and TUBA1B/ESR1, might play pivotal roles in the occurrence and development of PMO.
Chen, Yaowen; Li, Zongcheng; Hu, Shuofeng; Zhang, Jian; Wu, Jiaqi; Shao, Ningsheng; Bo, Xiaochen; Ni, Ming; Ying, Xiaomin
2017-02-01
Gut microbes play a critical role in human health and disease, and researchers have begun to characterize their genomes, the so-called gut metagenome. Thus far, metagenomics studies have focused on genus- or species-level composition and microbial gene sets, while strain-level composition and single-nucleotide polymorphisms (SNPs) have been overlooked. The gut metagenomes of type 2 diabetes (T2D) patients have been found to be enriched with butyrate-producing bacteria and sulfate reduction functions. However, it is not known whether the gut metagenomes of T2D patients have characteristic strain patterns or SNP distributions. We downloaded public gut metagenome datasets from 170 T2D patients and 174 healthy controls and performed a systematic comparative analysis of their metagenome SNPs. We found that Bacteroides coprocola, whose relative abundance did not differ between the groups, had a characteristic distribution of SNPs in the T2D patient group. We identified 65 genes, all in B. coprocola, that had remarkably different enrichment of SNPs. The first and sixth ranked genes encode glycosyl hydrolases (GenBank accession EDU99824.1 and EDV02301.1). Interestingly, alpha-glucosidase, which is also a glycosyl hydrolase located in the intestine, is an important drug target of T2D. These results suggest that different strains of B. coprocola may have different roles in the human gut and that a specific set of B. coprocola strains is correlated with T2D.
Yu, Liang; Wang, Bingbo; Ma, Xiaoke; Gao, Lin
2016-12-23
Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as in discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, and their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrate that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.
Separate enrichment analysis of pathways for up- and downregulated genes.
Hong, Guini; Zhang, Wenjing; Li, Hongdong; Shen, Xiaopei; Guo, Zheng
2014-03-06
Two strategies are often adopted for enrichment analysis of pathways: the analysis of all differentially expressed (DE) genes together or the analysis of up- and downregulated genes separately. However, few studies have examined the rationales of these enrichment analysis strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tended to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. We then show that the imbalance could greatly reduce the statistical power for finding disease-associated pathways through the analysis of all-DE genes. Further, using gene expression profiles from five types of tumours, we illustrate that the separate analysis of up- and downregulated genes could identify more pathways that are really pertinent to phenotypic difference. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
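The power argument in the abstract above can be illustrated with a small sketch. The abstract does not name the enrichment test, so a one-sided hypergeometric over-representation test is assumed here, and every count below (universe size, pathway size, numbers of DE genes) is hypothetical and chosen only to mimic a pathway whose DE genes are mostly up-regulated.

```python
from math import comb


def enrichment_p_value(hits, pathway_size, list_size, universe_size):
    """One-sided hypergeometric P(X >= hits): chance of seeing at least
    `hits` pathway genes in a gene list of `list_size` drawn at random
    from `universe_size` genes, `pathway_size` of which are in the pathway."""
    upper = min(pathway_size, list_size)
    denom = comb(universe_size, list_size)
    numer = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return numer / denom


# Hypothetical counts (not taken from the paper).
UNIVERSE = 10_000            # genes measured
PATHWAY_SIZE = 100           # genes annotated to the pathway
N_UP, N_DOWN = 500, 500      # up- and down-regulated DE genes
UP_IN_PATHWAY, DOWN_IN_PATHWAY = 15, 1   # imbalanced pathway

# Strategy 1: all DE genes pooled together.
p_all = enrichment_p_value(
    UP_IN_PATHWAY + DOWN_IN_PATHWAY, PATHWAY_SIZE, N_UP + N_DOWN, UNIVERSE
)
# Strategy 2: up- and down-regulated genes tested separately.
p_up = enrichment_p_value(UP_IN_PATHWAY, PATHWAY_SIZE, N_UP, UNIVERSE)
p_down = enrichment_p_value(DOWN_IN_PATHWAY, PATHWAY_SIZE, N_DOWN, UNIVERSE)

print(f"all DE: {p_all:.3g}  up only: {p_up:.3g}  down only: {p_down:.3g}")
```

With these assumed counts the pooled list dilutes the signal, because the down-regulated half contributes many genes but almost no pathway members, so the up-only test yields the smaller p-value. In practice the two separate tests would also need a multiple-testing correction, which this sketch omits.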
Synaptic, transcriptional and chromatin genes disrupted in autism.
De Rubeis, Silvia; He, Xin; Goldberg, Arthur P; Poultney, Christopher S; Samocha, Kaitlin; Cicek, A Erucment; Kou, Yan; Liu, Li; Fromer, Menachem; Walker, Susan; Singh, Tarinder; Klei, Lambertus; Kosmicki, Jack; Shih-Chen, Fu; Aleksic, Branko; Biscaldi, Monica; Bolton, Patrick F; Brownfeld, Jessica M; Cai, Jinlu; Campbell, Nicholas G; Carracedo, Angel; Chahrour, Maria H; Chiocchetti, Andreas G; Coon, Hilary; Crawford, Emily L; Curran, Sarah R; Dawson, Geraldine; Duketis, Eftichia; Fernandez, Bridget A; Gallagher, Louise; Geller, Evan; Guter, Stephen J; Hill, R Sean; Ionita-Laza, Juliana; Jimenz Gonzalez, Patricia; Kilpinen, Helena; Klauck, Sabine M; Kolevzon, Alexander; Lee, Irene; Lei, Irene; Lei, Jing; Lehtimäki, Terho; Lin, Chiao-Feng; Ma'ayan, Avi; Marshall, Christian R; McInnes, Alison L; Neale, Benjamin; Owen, Michael J; Ozaki, Noriio; Parellada, Mara; Parr, Jeremy R; Purcell, Shaun; Puura, Kaija; Rajagopalan, Deepthi; Rehnström, Karola; Reichenberg, Abraham; Sabo, Aniko; Sachse, Michael; Sanders, Stephan J; Schafer, Chad; Schulte-Rüther, Martin; Skuse, David; Stevens, Christine; Szatmari, Peter; Tammimies, Kristiina; Valladares, Otto; Voran, Annette; Li-San, Wang; Weiss, Lauren A; Willsey, A Jeremy; Yu, Timothy W; Yuen, Ryan K C; Cook, Edwin H; Freitag, Christine M; Gill, Michael; Hultman, Christina M; Lehner, Thomas; Palotie, Aaarno; Schellenberg, Gerard D; Sklar, Pamela; State, Matthew W; Sutcliffe, James S; Walsh, Christiopher A; Scherer, Stephen W; Zwick, Michael E; Barett, Jeffrey C; Cutler, David J; Roeder, Kathryn; Devlin, Bernie; Daly, Mark J; Buxbaum, Joseph D
2014-11-13
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers, most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.
Wang, Hao; Sun, Xuming; Chou, Jeff; Lin, Marina; Ferrario, Carlos M; Zapata-Sudo, Gisele; Groban, Leanne
2017-02-01
We previously showed that cardiomyocyte-specific G protein-coupled estrogen receptor (GPER) gene deletion leads to sex-specific adverse effects on cardiac structure and function; alterations which may be due to distinct differences in mitochondrial and inflammatory processes between sexes. Here, we provide the results of Gene Set Enrichment Analysis (GSEA) based on the DNA microarray data from GPER-knockout versus GPER-intact (intact) cardiomyocytes. This article contains complete data on the mitochondrial and inflammatory response-related gene expression changes that were significant in GPER knockout versus intact cardiomyocytes from adult male and female mice. The data are supplemental to our original research article "Cardiomyocyte-specific deletion of the G protein-coupled estrogen receptor (GPER) leads to left ventricular dysfunction and adverse remodeling: a sex-specific gene profiling" (Wang et al., 2016) [1]. Data have been deposited to the Gene Expression Omnibus (GEO) database repository with the dataset identifier GSE86843.
Liu, Rong; Guo, Cheng-Xian; Zhou, Hong-Hao
2015-01-01
This study aims to identify effective gene networks and prognostic biomarkers associated with estrogen receptor positive (ER+) breast cancer using human mRNA studies. Weighted gene coexpression network analysis was performed with a complex ER+ breast cancer transcriptome to investigate the function of networks and key genes in the prognosis of breast cancer. We found a significant correlation of an expression module with distant metastasis-free survival (HR = 2.25; 95% CI = 1.03-4.88 in discovery set; HR = 1.78; 95% CI = 1.07-2.93 in validation set). This module contained genes enriched in the biological process of the M phase. From this module, we further identified and validated 5 hub genes (CDK1, DLGAP5, MELK, NUSAP1, and RRM2), the expression levels of which were strongly associated with poor survival. Highly expressed MELK indicated poor survival in luminal A and luminal B breast cancer molecular subtypes. This gene was also found to be associated with tamoxifen resistance. Results indicated that a network-based approach may facilitate the discovery of biomarkers for the prognosis of ER+ breast cancer and may also be used as a basis for establishing personalized therapies. Nevertheless, before the application of this approach in clinical settings, in vivo and in vitro experiments and multi-center randomized controlled clinical trials are still needed.
Blood Gene Expression Profiling of Breast Cancer Survivors Experiencing Fibrosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Landmark-Hoyvik, Hege, E-mail: hblandma@rr-research.n; Institute for Clinical Medicine, University of Oslo, Oslo; Dumeaux, Vanessa
2011-03-01
Purpose: To extend knowledge on the mechanisms and pathways involved in maintenance of radiation-induced fibrosis (RIF) by performing gene expression profiling of whole blood from breast cancer (BC) survivors with and without fibrosis 3-7 years after end of radiotherapy treatment. Methods and Materials: Gene expression profiles from blood were obtained for 254 BC survivors derived from a cohort of survivors, treated with adjuvant radiotherapy for breast cancer 3-7 years earlier. Analyses of transcriptional differences in blood gene expression between BC survivors with fibrosis (n = 31) and BC survivors without fibrosis (n = 223) were performed using R version 2.8.0 and tools from the Bioconductor project. Gene sets extracted through a literature search on fibrosis and breast cancer were subsequently used in gene set enrichment analysis. Results: Substantial differences in blood gene expression between BC survivors with and without fibrosis were observed, and 87 differentially expressed genes were identified through linear analysis. Transforming growth factor-β1 signaling was identified as the most significant gene set, showing a down-regulation of most of the core genes, together with up-regulation of a transcriptional activator of the inhibitor of fibrinolysis, Plasminogen activator inhibitor 1 in the BC survivors with fibrosis. Conclusion: Transforming growth factor-β1 signaling was found down-regulated during the maintenance phase of fibrosis as opposed to the up-regulation reported during the early, initiating phase of fibrosis. Hence, once the fibrotic tissue has developed, the maintenance phase might rather involve a deregulation of fibrinolysis and altered degradation of extracellular matrix components.
Gene expression changes in the course of normal brain aging are sexually dimorphic
Berchtold, Nicole C.; Cribbs, David H.; Coleman, Paul D.; Rogers, Joseph; Head, Elizabeth; Kim, Ronald; Beach, Tom; Miller, Carol; Troncoso, Juan; Trojanowski, John Q.; Zielke, H. Ronald; Cotman, Carl W.
2008-01-01
Gene expression profiles were assessed in the hippocampus, entorhinal cortex, superior-frontal gyrus, and postcentral gyrus across the lifespan of 55 cognitively intact individuals aged 20–99 years. Perspectives on global gene changes that are associated with brain aging emerged, revealing two overarching concepts. First, different regions of the forebrain exhibited substantially different gene profile changes with age. For example, comparing equally powered groups, 5,029 probe sets were significantly altered with age in the superior-frontal gyrus, compared with 1,110 in the entorhinal cortex. Prominent change occurred in the sixth to seventh decades across cortical regions, suggesting that this period is a critical transition point in brain aging, particularly in males. Second, clear gender differences in brain aging were evident, suggesting that the brain undergoes sexually dimorphic changes in gene expression not only in development but also in later life. Globally across all brain regions, males showed more gene change than females. Further, Gene Ontology analysis revealed that different categories of genes were predominantly affected in males vs. females. Notably, the male brain was characterized by global decreased catabolic and anabolic capacity with aging, with down-regulated genes heavily enriched in energy production and protein synthesis/transport categories. Increased immune activation was a prominent feature of aging in both sexes, with proportionally greater activation in the female brain. These data open opportunities to explore age-dependent changes in gene expression that set the balance between neurodegeneration and compensatory mechanisms in the brain and suggest that this balance is set differently in males and females, an intriguing idea. PMID:18832152
The genome sequence of taurine cattle: a window to ruminant biology and evolution.
Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; 
Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; 
Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi
2009-04-24
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Exome Sequencing Identifies Potentially Druggable Mutations in Nasopharyngeal Carcinoma.
Chow, Yock Ping; Tan, Lu Ping; Chai, San Jiun; Abdul Aziz, Norazlin; Choo, Siew Woh; Lim, Paul Vey Hong; Pathmanathan, Rajadurai; Mohd Kornain, Noor Kaslina; Lum, Chee Lun; Pua, Kin Choo; Yap, Yoke Yeow; Tan, Tee Yong; Teo, Soo Hwang; Khoo, Alan Soo-Beng; Patel, Vyomesh
2017-03-03
In this study, we first performed whole exome sequencing of DNA from 10 untreated and clinically annotated fresh frozen nasopharyngeal carcinoma (NPC) biopsies and matched bloods to identify somatically mutated genes that may be amenable to targeted therapeutic strategies. We identified a total of 323 mutations which were either non-synonymous (n = 238) or synonymous (n = 85). Furthermore, our analysis revealed genes in key cancer pathways (DNA repair, cell cycle regulation, apoptosis, immune response, lipid signaling) were mutated, of which those in the lipid-signaling pathway were the most enriched. We next extended our analysis on a prioritized sub-set of 37 mutated genes plus top 5 mutated cancer genes listed in COSMIC using a custom designed HaloPlex target enrichment panel with an additional 88 NPC samples. Our analysis identified 160 additional non-synonymous mutations in 37/42 genes in 66/88 samples. Of these, 99/160 mutations within potentially druggable pathways were further selected for validation. Sanger sequencing revealed that 77/99 variants were true positives, giving an accuracy of 78%. Taken together, our study indicated that ~72% (n = 71/98) of NPC samples harbored mutations in one of the four cancer pathways (EGFR-PI3K-Akt-mTOR, NOTCH, NF-κB, DNA repair) which may be potentially useful as predictive biomarkers of response to matched targeted therapies. PMID:28256603
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaushik, Gaurav, E-mail: kausgaur@isu.edu; Department of Medical Pathology and Laboratory Medicine, University of California at Davis, Davis, CA 95817; Institute for Pediatric Regenerative Medicine, Shriners Hospitals for Children, Northern California, 2425 Stockton Boulevard, Sacramento, CA 95817
Psychoactive pharmaceuticals have been found to be teratogens at clinical dosage during pregnancy. These pharmaceuticals have also been detected in minute (ppb) concentrations in drinking water in the US, and are environmental contaminants that may be complicit in triggering neurological disorders in genetically susceptible individuals. Previous studies have determined that psychoactive pharmaceuticals (fluoxetine, venlafaxine and carbamazepine) at environmentally relevant concentrations enriched sets of genes regulating development and function of the nervous system in fathead minnows. Altered gene sets were also associated with potential neurological disorders, including autism spectrum disorders (ASD). Subsequent in vitro studies indicated that psychoactive pharmaceuticals altered ASD-associated synaptic protein expression and gene expression in human neuronal cells. However, it is unknown if environmentally relevant concentrations of these pharmaceuticals are able to cross biological barriers from mother to fetus, thus potentially posing risks to nervous system development. The main objective of this study was to test whether psychoactive pharmaceuticals (fluoxetine, venlafaxine, and carbamazepine) administered through the drinking water at environmental concentrations to pregnant mice could reach the brain of the developing embryo by crossing intestinal and placental barriers. We addressed this question by adding ²H-isotope labeled pharmaceuticals to the drinking water of female mice for 20 days (10 pre- and 10 post-conception days), and quantifying ²H-isotope enrichment signals in the dam liver and brain of developing embryos using isotope ratio mass spectrometry. Significant levels of ²H enrichment were detected in the brain of embryos and livers of carbamazepine-treated mice but not in those of control dams, or for fluoxetine or venlafaxine application. 
These results provide the first evidence that carbamazepine in drinking water and at typical environmental concentrations is transmitted from mother to embryo. Our results, combined with previous evidence that carbamazepine may be associated with ASD in infants, warrant the closer examination of psychoactive pharmaceuticals in drinking water and their potential association with neurodevelopmental disorders.
Tumor-propagating cells and Yap/Taz activity contribute to lung tumor progression and metastasis
Lau, Allison N; Curtis, Stephen J; Fillmore, Christine M; Rowbotham, Samuel P; Mohseni, Morvarid; Wagner, Darcy E; Beede, Alexander M; Montoro, Daniel T; Sinkevicius, Kerstin W; Walton, Zandra E; Barrios, Juliana; Weiss, Daniel J; Camargo, Fernando D; Wong, Kwok-Kin; Kim, Carla F
2014-01-01
Metastasis is the leading cause of morbidity for lung cancer patients. Here we demonstrate that murine tumor propagating cells (TPCs) with the markers Sca1 and CD24 are enriched for metastatic potential in orthotopic transplantation assays. CD24 knockdown decreased the metastatic potential of lung cancer cell lines resembling TPCs. In lung cancer patient data sets, metastatic spread and patient survival could be stratified with a murine lung TPC gene signature. The TPC signature was enriched for genes in the Hippo signaling pathway. Knockdown of the Hippo mediators Yap1 or Taz decreased in vitro cellular migration and transplantation of metastatic disease. Furthermore, constitutively active Yap was sufficient to drive lung tumor progression in vivo. These results demonstrate functional roles for two different pathways, CD24-dependent and Yap/Taz-dependent pathways, in lung tumor propagation and metastasis. This study demonstrates the utility of TPCs for identifying molecules contributing to metastatic lung cancer, potentially enabling the therapeutic targeting of this devastating disease. PMID:24497554
Yue, Zongliang; Zheng, Qi; Neylon, Michael T; Yoo, Minjae; Shin, Jimin; Zhao, Zhiying; Tan, Aik Choon
2018-01-01
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug–gene, miRNA–gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/. PMID:29126216
Bian, Zhong-Rui; Yin, Juan; Sun, Wen; Lin, Dian-Jie
2017-04-01
Diagnosis of active tuberculosis (TB) is challenging, and treatment response is also difficult to monitor efficiently. The aim of this study was to apply an integrated microarray and network-based analysis to samples from publicly available datasets to obtain a diagnostic module set and pathways in active TB. Towards this goal, a background protein-protein interaction (PPI) network was generated based on global PPI information and gene expression data, followed by identification of a differential expression network (DEN) from the background PPI network. Then, ego genes were extracted according to the degree features in the DEN. Next, module collection was conducted by ego gene expansion based on the EgoNet algorithm. After that, differential expression of modules between active TB and controls was evaluated using a random permutation test. Finally, the biological significance of differential modules was assessed by pathway enrichment analysis based on the Reactome database, and Fisher's exact test was implemented to extract differential pathways for active TB. In total, 47 ego genes and 47 candidate modules were identified from the DEN. By setting the cutoff criteria of gene size >5 and classification accuracy ≥0.9, 7 ego modules (Module 4, Module 7, Module 9, Module 19, Module 25, Module 38 and Module 43) were extracted, and all of them differed with statistical significance between active TB and controls. Then, Fisher's exact test was conducted to capture differential pathways for active TB. Interestingly, genes in Module 4, Module 25, Module 38, and Module 43 were enriched in the same pathway, formation of a pool of free 40S subunits. The significant pathway for Module 7 and Module 9 was eukaryotic translation termination, and for Module 19 it was nonsense mediated decay enhanced by the exon junction complex (EJC). 
Accordingly, differential modules and pathways might be potential biomarkers for treating active TB, and provide valuable clues for better understanding of molecular mechanism of active TB. Copyright © 2017 Elsevier Ltd. All rights reserved.
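The Fisher's exact test step described in this abstract can be illustrated with a minimal sketch. It computes a one-sided (over-representation) p-value as a hypergeometric tail probability. The function name, argument names, and gene counts below are hypothetical and are not taken from the study, whose software and settings are not stated here.

```python
from math import comb


def enrichment_p_value(universe: int, pathway: int, module: int, overlap: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    universe: number of background genes (assumed to be all annotated genes)
    pathway:  number of background genes annotated to the pathway
    module:   number of genes in the module being tested
    overlap:  number of module genes that fall in the pathway
    Returns P(X >= overlap) for X ~ Hypergeometric(universe, pathway, module).
    """
    upper = min(pathway, module)
    total = comb(universe, module)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, module - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 in the pathway,
# a 5-gene module with 4 of its genes in the pathway.
p = enrichment_p_value(universe=10, pathway=5, module=5, overlap=4)
print(f"{p:.4f}")
```

With many modules and pathways tested, the resulting p-values would normally be corrected for multiple testing before pathways are called significant.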
Liang, Junjun; Chen, Xin; Deng, Guangbing; Pan, Zhifen; Zhang, Haili; Li, Qiao; Yang, Kaijun; Long, Hai; Yu, Maoqun
2017-10-11
The harsh environment on the Qinghai-Tibetan Plateau gives Tibetan hulless barley (Hordeum vulgare var. nudum) great ability to resist adversities such as drought, salinity, and low temperature, and makes it a good subject for the analysis of drought tolerance mechanisms. To elucidate the specific gene networks and pathways that contribute to its drought tolerance, and to identify new candidate genes for breeding purposes, we performed a transcriptomic analysis using two accessions of Tibetan hulless barley, namely Z772 (drought-tolerant) and Z013 (drought-sensitive). There were more up-regulated genes in Z772 than in Z013 under both mild (5439-VS-2604) and severe (7203-VS-3359) dehydration treatments. Under mild dehydration stress, the pathways exclusively enriched in the drought-tolerant genotype Z772 included Protein processing in endoplasmic reticulum, tricarboxylic acid (TCA) cycle, Wax biosynthesis, and Spliceosome. Under severe dehydration stress, the pathways that were mainly enriched in Z772 included Carbon fixation in photosynthetic organisms, Pyruvate metabolism, and Porphyrin and chlorophyll metabolism. The main differentially expressed genes (DEGs) in response to dehydration stress, and genes whose expression differed between tolerant and sensitive genotypes, are presented in this study. The candidate genes for drought tolerance were selected based on their expression patterns. The RNA-Seq data obtained in this study provide an initial overview of global gene expression patterns and networks related to dehydration shock in Tibetan hulless barley. Furthermore, these data provide pathways and a targeted set of candidate genes that might be essential for in-depth analysis of the molecular mechanisms of plant tolerance to drought stress.
Hill, W D; Davies, G; van de Lagemaat, L N; Christoforou, A; Marioni, R E; Fernandes, C P D; Liewald, D C; Croning, M D R; Payton, A; Craig, L C A; Whalley, L J; Horan, M; Ollier, W; Hansell, N K; Wright, M J; Martin, N G; Montgomery, G W; Steen, V M; Le Hellard, S; Espeseth, T; Lundervold, A J; Reinvang, I; Starr, J M; Pendleton, N; Grant, S G N; Bates, T C; Deary, I J
2014-01-01
Differences in general cognitive ability (intelligence) account for approximately half of the variation in any large battery of cognitive tests and are predictive of important life events including health. Genome-wide analyses of common single-nucleotide polymorphisms indicate that they jointly tag between a quarter and a half of the variance in intelligence. However, no single polymorphism has been reliably associated with variation in intelligence. It remains possible that these many small effects might be aggregated in networks of functionally linked genes. Here, we tested a network of 1461 genes in the postsynaptic density and associated complexes for an enriched association with intelligence. These were ascertained in 3511 individuals (the Cognitive Ageing Genetics in England and Scotland (CAGES) consortium) phenotyped for general cognitive ability, fluid cognitive ability, crystallised cognitive ability, memory and speed of processing. By analysing the results of a genome wide association study (GWAS) using Gene Set Enrichment Analysis, a significant enrichment was found for fluid cognitive ability for the proteins found in the complexes of N-methyl-D-aspartate receptor complex; P=0.002. Replication was sought in two additional cohorts (N=670 and 2062). A meta-analytic P-value of 0.003 was found when these were combined with the CAGES consortium. The results suggest that genetic variation in the macromolecular machines formed by membrane-associated guanylate kinase (MAGUK) scaffold proteins and their interaction partners contributes to variation in intelligence. PMID:24399044
Peng, Ruoqi; Sridhar, Sriram; Tyagi, Gaurav; Phillips, Jonathan E; Garrido, Rosario; Harris, Paul; Burns, Lisa; Renteria, Lorena; Woods, John; Chen, Leena; Allard, John; Ravindran, Palanikumar; Bitter, Hans; Liang, Zhenmin; Hogaboam, Cory M; Kitson, Chris; Budd, David C; Fine, Jay S; Bauer, Carla M T; Stevenson, Christopher S
2013-01-01
The preclinical model of bleomycin-induced lung fibrosis, used to investigate mechanisms related to idiopathic pulmonary fibrosis (IPF), has incorrectly predicted efficacy for several candidate compounds suggesting that it may be of limited value. As an attempt to improve the predictive nature of this model, integrative bioinformatic approaches were used to compare molecular alterations in the lungs of bleomycin-treated mice and patients with IPF. Using gene set enrichment analysis we show for the first time that genes differentially expressed during the fibrotic phase of the single challenge bleomycin model were significantly enriched in the expression profiles of IPF patients. The genes that contributed most to the enrichment were largely involved in mitosis, growth factor, and matrix signaling. Interestingly, these same mitotic processes were increased in the expression profiles of fibroblasts isolated from rapidly progressing, but not slowly progressing, IPF patients relative to control subjects. The data also indicated that TGFβ was not the sole mediator responsible for the changes observed in this model since the ALK-5 inhibitor SB525334 effectively attenuated some but not all of the fibrosis associated with this model. Although some would suggest that repetitive bleomycin injuries may more effectively model IPF-like changes, our data do not support this conclusion. Together, these data highlight that a single bleomycin instillation effectively replicates several of the specific pathogenic molecular changes associated with IPF, and may be best used as a model for patients with active disease.
Identification of transcriptional factors and key genes in primary osteoporosis by DNA microarray.
Xie, Wengui; Ji, Lixin; Zhao, Teng; Gao, Pengfei
2015-05-09
A number of genes have been identified as related to primary osteoporosis, while less is known about the comprehensive interactions between regulating genes and proteins. We aimed to identify the differentially expressed genes (DEGs) and regulatory effects of transcription factors (TFs) involved in primary osteoporosis. The gene expression profile GSE35958 was obtained from the Gene Expression Omnibus database, including 5 primary osteoporosis and 4 normal bone tissues. The differentially expressed genes between primary osteoporosis and normal bone tissues were identified by the same package in R language. The TFs of these DEGs were predicted with the Essaghir A method. DAVID (The Database for Annotation, Visualization and Integrated Discovery) was applied to perform the GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis of DEGs. After analyzing regulatory effects, a regulatory network was built between TFs and the related DEGs. A total of 579 DEGs were screened, including 310 up-regulated genes and 269 down-regulated genes in primary osteoporosis samples. In GO terms, up-regulated genes were most enriched in transcription regulator activity, followed by transcription factor activity. A total of 10 significant pathways were enriched in KEGG analysis, including colorectal cancer, Wnt signaling pathway, Focal adhesion, and MAPK signaling pathway. Moreover, a total of 7 TFs were enriched, of which CTNNB1, SP1, and TP53 regulated the most up-regulated DEGs. The discovery of the enriched TFs might contribute to the understanding of the mechanism of primary osteoporosis. Further research on genes and TFs related to the WNT signaling pathway and MAPK pathway is urgently needed for clinical diagnosis and for directing treatment of primary osteoporosis.
Martinovic-Weigelt, Dalma; Mehinto, Alvine C.; Ankley, Gerald T.; Denslow, Nancy D.; Barber, Larry B.; Lee, Kathy E.; King, Ryan J.; Schoenfuss, Heiko L.; Schroeder, Anthony L.; Villeneuve, Daniel L.
2014-01-01
The present study investigated whether a combination of targeted analytical chemistry information with unsupervised, data-rich biological methodology (i.e., transcriptomics) could be utilized to evaluate relative contributions of wastewater treatment plant (WWTP) effluents to biological effects. The effects of WWTP effluents on fish exposed to ambient, receiving waters were studied at three locations with distinct WWTP and watershed characteristics. At each location, 4 d exposures of male fathead minnows to the WWTP effluent and upstream and downstream ambient waters were conducted. Transcriptomic analyses were performed on livers using 15 000 feature microarrays, followed by a canonical pathway and gene set enrichment analyses. Enrichment of gene sets indicative of teleost brain–pituitary–gonadal–hepatic (BPGH) axis function indicated that WWTPs serve as an important source of endocrine active chemicals (EACs) that affect the BPGH axis (e.g., cholesterol and steroid metabolism were altered). The results indicated that transcriptomics may even pinpoint pertinent adverse outcomes (i.e., liver vacuolization) and groups of chemicals that preselected chemical analytes may miss. Transcriptomic Effects-Based monitoring was capable of distinguishing sites, and it reflected chemical pollution gradients, thus holding promise for assessment of relative contributions of point sources to pollution and the efficacy of pollution remediation.
Microarray data mining using Bioconductor packages.
Nie, Haisheng; Neerincx, Pieter B T; van der Poel, Jan; Ferrari, Francesco; Bicciato, Silvio; Leunissen, Jack A M; Groenen, Martien A M
2009-07-16
This paper describes the results of a Gene Ontology (GO) term enrichment analysis of chicken microarray data using the Bioconductor packages. By checking the enriched GO terms in three contrasts, MM8-PM8, MM8-MA8, and MM8-MM24, of the microarray data provided during this workshop, this analysis aimed to investigate the host reactions in chickens occurring shortly after a secondary challenge with either a homologous or heterologous species of Eimeria. The results of GO enrichment analysis using GO terms annotated to chicken genes and GO terms annotated to chicken-human orthologous genes were also compared. Furthermore, a locally adaptive statistical procedure (LAP) was performed to test differentially expressed chromosomal regions, rather than individual genes, in the chicken genome after Eimeria challenge. GO enrichment analysis identified significant (raw p-value < 0.05) GO terms for all three contrasts included in the analysis. Some of the GO terms were linked, in general, to primary or secondary immune responses, indicating that GO enrichment analysis is a useful approach to analyze microarray data. The comparison of GO enrichment results using chicken gene information and chicken-human orthologous gene information showed more refined GO terms related to immune responses when using chicken-human orthologous gene information. This suggests that using chicken-human orthologous gene information gives higher power to detect significant GO terms with more refined functionality. Furthermore, three chromosome regions were identified as significantly up-regulated in contrast MM8-PM8 (q-value < 0.01). Overall, this paper describes a practical approach to analyze microarray data in farm animals where the genome information is still incomplete. 
For farm animals, such as chicken, with currently limited gene annotation, borrowing gene annotation information from orthologous genes in well-annotated species, such as human, will help improve the pathway analysis results substantially. Furthermore, the LAP analysis approach is a relatively new and very useful method for microarray analysis.
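The abstract above reports both raw p-values and q-values. As a hedged illustration of how raw p-values can be adjusted for multiple testing, the sketch below implements the Benjamini-Hochberg procedure. This is an assumption chosen for illustration: the abstract does not state which adjustment the authors or the LAP method used, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}  adjusted = {q:.3f}")
```

A term that passes a raw threshold of 0.05 can still fail an adjusted threshold, which is why the two kinds of cut-off in the abstract are not directly comparable.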
Martin, Elizabeth M.; Fry, Rebecca C.
2016-01-01
A biological mechanism by which exposure to environmental contaminants results in gene-specific CpG methylation patterning is currently unknown. We hypothesize that gene-specific CpG methylation is related to environmentally perturbed transcription factor occupancy. To test this hypothesis, a database of 396 genes with altered CpG methylation either in cord blood leukocytes or placental tissue was compiled from 14 studies representing assessments of six environmental contaminants. Subsequently, an in silico approach was used to identify transcription factor binding sites enriched among the genes with altered CpG methylation in relationship to the suite of environmental contaminants. For each study, the sequences of the promoter regions (representing −1000 to +500 bp from the transcription start site) of all genes with altered CpG methylation were analyzed for enrichment of transcription factor binding sites. Binding sites for a total of 56 unique transcription factors were identified as enriched within the promoter regions of the genes. Binding sites for the Kidney-Enriched Krüppel-like Factor 15, a known responder to endogenous stress, were enriched (P < 0.001–0.041) among the genes with altered CpG methylation associated with five of the six environmental contaminants. These data support the transcription factor occupancy theory as a potential mechanism underlying environmentally-induced gene-specific CpG methylation. PMID:27066266
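The promoter definition used above (−1000 to +500 bp from the transcription start site) depends on gene orientation. The sketch below shows one way to turn a TSS position and strand into genomic coordinates. The function name, the 1-based inclusive coordinate convention, and the example positions are assumptions for illustration and are not taken from the study.

```python
def promoter_window(tss: int, strand: str, upstream: int = 1000, downstream: int = 500):
    """Return (start, end) of the promoter window around a TSS.

    Coordinates are assumed to be 1-based and inclusive.
    On the '+' strand, upstream means smaller coordinates;
    on the '-' strand, upstream means larger coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the start of the chromosome.
    return max(start, 1), end


# Hypothetical TSS positions.
print(promoter_window(5000, "+"))  # window for a forward-strand gene
print(promoter_window(5000, "-"))  # window for a reverse-strand gene
```

The sequence inside each window would then be scanned for transcription factor binding motifs and the counts compared against a background set of promoters.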
González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen
2016-09-15
Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data aiming at improving the step of hypothesis abduction and assessment. We focus on the adaptation to human cognition of data interpretation and visualization of the output of EDA. First, we give a proper theoretical background to bi-clustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression we obtain different sequences of hierarchical bi-clusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases, for instance Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example confirming results previously published. 
The GED analysis problem gets transformed into the exploration of a sequence of lattices enabling the visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based bi-clustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, and to look for relevant biclusters, by observing their genes and their persistence, in order to infer, for instance, hypotheses on their function.
Schwer, Bjoern; Wei, Pei-Chi; Chang, Amelia N; Kao, Jennifer; Du, Zhou; Meyers, Robin M; Alt, Frederick W
2016-02-23
High-throughput, genome-wide translocation sequencing (HTGTS) studies of activated B cells have revealed that DNA double-strand breaks (DSBs) capable of translocating to defined bait DSBs are enriched around the transcription start sites (TSSs) of active genes. We used the HTGTS approach to investigate whether a similar phenomenon occurs in primary neural stem/progenitor cells (NSPCs). We report that breakpoint junctions indeed are enriched around TSSs that were determined to be active by global run-on sequencing analyses of NSPCs. Comparative analyses of transcription profiles in NSPCs and B cells revealed that the great majority of TSS-proximal junctions occurred in genes commonly expressed in both cell types, possibly because this common set has higher transcription levels on average than genes transcribed in only one or the other cell type. In the latter context, among all actively transcribed genes containing translocation junctions in NSPCs, those with junctions located within 2 kb of the TSS show a significantly higher transcription rate on average than genes with junctions in the gene body located at distances greater than 2 kb from the TSS. Finally, analysis of repair junction signatures of TSS-associated translocations in wild-type versus classical nonhomologous end-joining (C-NHEJ)-deficient NSPCs reveals that both C-NHEJ and alternative end-joining pathways can generate translocations by joining TSS-proximal DSBs to DSBs on other chromosomes. Our studies show that the generation of transcription-associated DSBs is conserved across divergent cell types.
Abby, Sophie S.; Melcher, Michael; Kerou, Melina; Krupovic, Mart; Stieglmeier, Michaela; Rossel, Claudia; Pfeifer, Kevin; Schleper, Christa
2018-01-01
Ammonia oxidizing archaea (AOA) of the phylum Thaumarchaeota are widespread in moderate environments but their occurrence and activity has also been demonstrated in hot springs. Here we present the first enrichment of a thermophilic representative with a sequenced genome, which facilitates the search for adaptive strategies and for traits that shape the evolution of Thaumarchaeota. Candidatus Nitrosocaldus cavascurensis has been enriched from a hot spring in Ischia, Italy. It grows optimally at 68°C under chemolithoautotrophic conditions on ammonia or urea converting ammonia stoichiometrically into nitrite with a generation time of approximately 23 h. Phylogenetic analyses based on ribosomal proteins place the organism as a sister group to all known mesophilic AOA. The 1.58 Mb genome of Ca. N. cavascurensis harbors an amoAXCB gene cluster encoding ammonia monooxygenase and genes for a 3-hydroxypropionate/4-hydroxybutyrate pathway for autotrophic carbon fixation, but also genes that indicate potential alternative energy metabolisms. Although a bona fide gene for nitrite reductase is missing, the organism is sensitive to NO-scavenging, underlining the potential importance of this compound for AOA metabolism. Ca. N. cavascurensis is distinct from all other AOA in its gene repertoire for replication, cell division and repair. Its genome has an impressive array of mobile genetic elements and other recently acquired gene sets, including conjugative systems, a provirus, transposons and cell appendages. Some of these elements indicate recent exchange with the environment, whereas others seem to have been domesticated and might convey crucial metabolic traits. PMID:29434576
Grandy, Rodrigo A; Whitfield, Troy W; Wu, Hai; Fitzgerald, Mark P; VanOudenhove, Jennifer J; Zaidi, Sayyed K; Montecino, Martin A; Lian, Jane B; van Wijnen, André J; Stein, Janet L; Stein, Gary S
2016-02-15
Stem cell phenotypes are reflected by posttranslational histone modifications, and this chromatin-related memory must be mitotically inherited to maintain cell identity through proliferative expansion. In human embryonic stem cells (hESCs), bivalent genes with both activating (H3K4me3) and repressive (H3K27me3) histone modifications are essential to sustain pluripotency. Yet, the molecular mechanisms by which this epigenetic landscape is transferred to progeny cells remain to be established. By mapping genomic enrichment of H3K4me3/H3K27me3 in pure populations of hESCs in G2, mitotic, and G1 phases of the cell cycle, we found striking variations in the levels of H3K4me3 through the G2-M-G1 transition. Analysis of a representative set of bivalent genes revealed that chromatin modifiers involved in H3K4 methylation/demethylation are recruited to bivalent gene promoters in a cell cycle-dependent fashion. Interestingly, bivalent genes enriched with H3K4me3 exclusively during mitosis undergo the strongest upregulation after induction of differentiation. Furthermore, the histone modification signature of genes that remain bivalent in differentiated cells resolves into a cell cycle-independent pattern after lineage commitment. These results establish a new dimension of chromatin regulation important in the maintenance of pluripotency. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Grandy, Rodrigo A.; Whitfield, Troy W.; Wu, Hai; Fitzgerald, Mark P.; VanOudenhove, Jennifer J.; Zaidi, Sayyed K.; Montecino, Martin A.; Lian, Jane B.; van Wijnen, André J.; Stein, Janet L.
2015-01-01
Stem cell phenotypes are reflected by posttranslational histone modifications, and this chromatin-related memory must be mitotically inherited to maintain cell identity through proliferative expansion. In human embryonic stem cells (hESCs), bivalent genes with both activating (H3K4me3) and repressive (H3K27me3) histone modifications are essential to sustain pluripotency. Yet, the molecular mechanisms by which this epigenetic landscape is transferred to progeny cells remain to be established. By mapping genomic enrichment of H3K4me3/H3K27me3 in pure populations of hESCs in G2, mitotic, and G1 phases of the cell cycle, we found striking variations in the levels of H3K4me3 through the G2-M-G1 transition. Analysis of a representative set of bivalent genes revealed that chromatin modifiers involved in H3K4 methylation/demethylation are recruited to bivalent gene promoters in a cell cycle-dependent fashion. Interestingly, bivalent genes enriched with H3K4me3 exclusively during mitosis undergo the strongest upregulation after induction of differentiation. Furthermore, the histone modification signature of genes that remain bivalent in differentiated cells resolves into a cell cycle-independent pattern after lineage commitment. These results establish a new dimension of chromatin regulation important in the maintenance of pluripotency. PMID:26644406
Wang, W; Huang, S; Hou, W; Liu, Y; Fan, Q; He, A; Wen, Y; Hao, J; Guo, X; Zhang, F
2017-10-01
Several genome-wide association studies (GWAS) of bone mineral density (BMD) have successfully identified multiple susceptibility genes, yet isolated susceptibility genes are often difficult to interpret biologically. The aim of this study was to unravel the genetic background of BMD at pathway level, by integrating BMD GWAS data with genome-wide expression quantitative trait loci (eQTLs) and methylation quantitative trait loci (meQTLs) data. We employed the GWAS datasets of BMD from the Genetic Factors for Osteoporosis Consortium (GEFOS), analysing patients' BMD. The areas studied included 32 735 femoral necks, 28 498 lumbar spines, and 8143 forearms. Genome-wide eQTLs (containing 923 021 eQTLs) and meQTLs (containing 683 152 unique methylation sites with local meQTLs) data sets were collected from recently published studies. Gene scores were first calculated by summary data-based Mendelian randomisation (SMR) software and meQTL-aligned GWAS results. Gene set enrichment analysis (GSEA) was then applied to identify BMD-associated gene sets with a predefined significance level of 0.05. We identified multiple gene sets associated with BMD in one or more regions, including relevant known biological gene sets such as the Reactome Circadian Clock (GSEA p-value = 1.0 × 10⁻⁴ for lumbar spine (LS) and 2.7 × 10⁻² for femoral neck BMD in eQTLs-based GSEA) and insulin-like growth factor receptor binding (GSEA p-value = 5.0 × 10⁻⁴ for femoral neck and 2.6 × 10⁻² for lumbar spine BMD in meQTLs-based GSEA). Our results provided novel clues for subsequent functional analysis of bone metabolism, and illustrated the benefit of integrating eQTLs and meQTLs data into pathway association analysis for genetic studies of complex human diseases. Cite this article: W. Wang, S. Huang, W. Hou, Y. Liu, Q. Fan, A. He, Y. Wen, J. Hao, X. Guo, F. Zhang. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density. 
Bone Joint Res 2017;6:572-576. © 2017 Wang et al.
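The GSEA step in this abstract takes per-gene scores and asks whether a gene set clusters at the top or bottom of the ranked list. The sketch below computes a Kolmogorov-Smirnov-like running-sum enrichment score in the general style of GSEA. It is a simplified illustration under stated assumptions: the gene names and scores are invented, the function name is hypothetical, and the abstract does not specify which GSEA implementation or weighting the authors used. Significance would normally be assessed by comparing this score with scores from permuted data, which is omitted here.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: gene names ordered from highest to lowest score
    scores:       per-gene scores in the same order
    gene_set:     set of gene names to test
    weight:       exponent applied to |score| for genes in the set
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(hits)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partly, but not fully, cover the ranked list")

    running, best = 0.0, 0.0
    for is_hit, w in zip(hits, hit_weights):
        # Step up at genes in the set, step down at all other genes.
        running += w / total_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of six genes with made-up scores.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, scores, {"g1", "g2"}))  # set at the top of the list
print(enrichment_score(genes, scores, {"g5", "g6"}))  # set at the bottom of the list
```

A positive score indicates that the set is concentrated among the highest-scoring genes, and a negative score indicates concentration among the lowest-scoring genes.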
Chowdhury, Nilotpal; Sapru, Shantanu
2015-01-01
Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. 
This methodology seems to yield new and interesting results and may be used as a tool to guide new research.
Chowdhury, Nilotpal; Sapru, Shantanu
2015-01-01
Introduction Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. Aim The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Methods Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Results Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. Conclusion To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. 
This methodology seems to yield new and interesting results and may be used as a tool to guide new research. PMID:26080057
Aslam, Muhammad Aamir; Schokker, Dirkjan; Groothuis, Ton G G; de Wit, Agnes A C; Smits, Mari A; Woelders, Henri
2015-06-01
Female birds have been shown to manipulate offspring sex ratio. However, mechanisms of sex ratio bias are not well understood. Reduced feed availability and change in body condition can affect the mass of eggs in birds that could lead to a skew in sex ratio. We employed feed restriction in laying chickens (Gallus gallus) to induce a decrease in body condition and egg mass using 45 chicken hens in treatment and control groups. Feed restriction led to an overall decline of egg mass. In the second period of treatment (Days 9-18) with more severe feed restriction and a steeper decline of egg mass, the sex ratio per hen (proportion of male eggs) had a significant negative association with mean egg mass per hen. Based on this association, two groups of hens were selected from feed restriction group, that is, hens producing male bias with low egg mass and hens producing female bias with high egg mass with overall sex ratios of 0.71 and 0.44 respectively. Genomewide transcriptome analysis on the germinal disks of F1 preovulatory follicles collected at the time of occurrence of meiosis-I was performed. We did not find significantly differentially expressed genes in these two groups of hens. However, gene set enrichment analysis showed that a number of cellular processes related to cell cycle progression, mitotic/meiotic apparatus, and chromosomal movement were enriched in female-biased hens or high mean egg mass as compared with male-biased hens or low mean egg mass. The differentially expressed gene sets may be involved in meiotic drive regulating sex ratio in the chicken. © 2015 by the Society for the Study of Reproduction, Inc.
Poussin, Carinne; Ibberson, Mark; Hall, Diana; Ding, Jun; Soto, Jamie; Abel, E Dale; Thorens, Bernard
2011-09-01
To identify metabolic pathways that may underlie susceptibility or resistance to high-fat diet-induced hepatic steatosis. We performed comparative transcriptomic analysis of the livers of A/J and C57Bl/6 mice, which are, respectively, resistant and susceptible to high-fat diet-induced hepatosteatosis and obesity. Mice from both strains were fed a normal chow or a high-fat diet for 2, 10, and 30 days, and transcriptomic data were analyzed by time-dependent gene set enrichment analysis. Biochemical analysis of mitochondrial respiration was performed to confirm the transcriptomic analysis. Time-dependent gene set enrichment analysis revealed a rapid, transient, and coordinate upregulation of 13 oxidative phosphorylation genes after initiation of high-fat diet feeding in the A/J, but not in the C57Bl/6, mouse livers. Biochemical analysis using liver mitochondria from both strains of mice confirmed a rapid increase by high-fat diet feeding of the respiration rate in A/J but not C57Bl/6 mice. Importantly, ATP production was the same in both types of mitochondria, indicating increased uncoupling of the A/J mitochondria. Together with previous data showing increased expression of mitochondrial β-oxidation genes in C57Bl/6 but not A/J mouse livers, our present study suggests that an important aspect of the adaptation of livers to high-fat diet feeding is to increase the activity of the oxidative phosphorylation chain and its uncoupling to dissipate the excess of incoming metabolic energy and to reduce the production of reactive oxygen species. The flexibility in oxidative phosphorylation activity may thus participate in the protection of A/J mouse livers against the initial damages induced by high-fat diet feeding that may lead to hepatosteatosis.
Wisecaver, Jennifer H; Borowsky, Alexander T; Tzin, Vered; Jander, Georg; Kliebenstein, Daniel J; Rokas, Antonis
2017-05-01
Plants produce diverse specialized metabolites (SMs), but the genes responsible for their production and regulation remain largely unknown, hindering efforts to tap plant pharmacopeia. Given that genes comprising SM pathways exhibit environmentally dependent coregulation, we hypothesized that genes within a SM pathway would form tight associations (modules) with each other in coexpression networks, facilitating their identification. To evaluate this hypothesis, we used 10 global coexpression data sets, each a meta-analysis of hundreds to thousands of experiments, across eight plant species to identify hundreds of coexpressed gene modules per data set. In support of our hypothesis, 15.3 to 52.6% of modules contained two or more known SM biosynthetic genes, and module genes were enriched in SM functions. Moreover, modules recovered many experimentally validated SM pathways, including all six known to form biosynthetic gene clusters (BGCs). In contrast, bioinformatically predicted BGCs (i.e., those lacking an associated metabolite) were no more coexpressed than the null distribution for neighboring genes. These results suggest that most predicted plant BGCs are not genuine SM pathways and argue that BGCs are not a hallmark of plant specialized metabolism. We submit that global gene coexpression is a rich, largely untapped resource for discovering the genetic basis and architecture of plant natural products. © 2017 American Society of Plant Biologists. All rights reserved.
Wang, James K. T.; Langfelder, Peter; Horvath, Steve; Palazzolo, Michael J.
2017-01-01
Huntington's disease (HD) is a progressive and autosomal dominant neurodegeneration caused by CAG expansion in the huntingtin gene (HTT), but the pathophysiological mechanism of mutant HTT (mHTT) remains unclear. To study HD using systems biological methodologies on all published data, we undertook the first comprehensive curation of two key PubMed HD datasets: perturbation genes that impact mHTT-driven endpoints and therefore are putatively linked causally to pathogenic mechanisms, and the protein interactome of HTT that reflects its biology. We perused PubMed articles containing co-citation of gene IDs and MeSH terms of interest to generate mechanistic gene sets for iterative enrichment analyses and rank ordering. The HD Perturbation database of 1,218 genes highly overlaps the HTT Interactome of 1,619 genes, suggesting links between normal HTT biology and mHTT pathology. These two HD datasets are enriched for protein networks of key genes underlying two mechanisms not previously implicated in HD nor in each other: exosome synaptic functions and homeostatic synaptic plasticity. Moreover, proteins, possibly including HTT, and miRNA detected in exosomes from a wide variety of sources also highly overlap the HD datasets, suggesting both mechanistic and biomarker links. Finally, the HTT Interactome highly intersects protein networks of pathogenic genes underlying Parkinson's, Alzheimer's and eight non-HD polyglutamine diseases, ALS, and spinal muscular atrophy. These protein networks in turn highly overlap the exosome and homeostatic synaptic plasticity gene sets. Thus, we hypothesize that HTT and other neurodegeneration pathogenic genes form a large interlocking protein network involved in exosome and homeostatic synaptic functions, particularly where the two mechanisms intersect. 
Mutant pathogenic proteins cause dysfunctions at distinct points in this network, each altering the two mechanisms in specific fashion that contributes to distinct disease pathologies, depending on the gene mutation and the cellular and biological context. This protein network is rich with drug targets, and exosomes may provide disease biomarkers, thus enabling drug discovery. All the curated datasets are made available for other investigators. Elucidating the roles of pathogenic neurodegeneration genes in exosome and homeostatic synaptic functions may provide a unifying framework for the age-dependent, progressive and tissue selective nature of multiple neurodegenerative diseases. PMID:28611571
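The claim that two gene sets "highly overlap" is usually backed by a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation with the Python standard library only. The set sizes 1,218 and 1,619 come from the abstract above; the universe size of 20,000 genes and the overlap of 300 genes are hypothetical placeholders, since the abstract reports neither, and this is not necessarily the test the authors used.

```python
from math import comb


def overlap_p_value(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of genes shared between a fixed set of size `set_a`
    and a random set of size `set_b`, both drawn from `universe` genes.
    """
    max_overlap = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Set sizes from the abstract; universe and overlap are ASSUMED values.
    universe, perturbation, interactome, shared = 20_000, 1_218, 1_619, 300
    expected = perturbation * interactome / universe
    print(f"overlap expected by chance: {expected:.1f} genes")
    print(f"P(overlap >= {shared}): {overlap_p_value(universe, perturbation, interactome, shared):.3g}")
```

The result depends strongly on the choice of universe (for example, all annotated genes versus all genes ever co-cited in PubMed), so that choice should be stated alongside any reported p-value.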
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
A human functional protein interaction network and its application to cancer data analysis
2010-01-01
Background One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system. Results We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. Conclusions We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases. PMID:20482850
Zhang, Jing; Carnduff, Lisa; Norman, Grant; Josey, Tyson; Wang, Yushan; Sawyer, Thomas W.; Martyniuk, Christopher J.; Langlois, Valerie S.
2014-01-01
With the wide adoption of explosive-dependent weaponry during military activities, traumatic brain injury (TBI) caused by blast-induced neurotrauma (BINT) has become a significant medical issue. Therefore, a robust and accessible biomarker system is in demand for effective and efficient TBI diagnosis. Such systems will also be beneficial to studies of TBI pathology. Here we propose the mammalian hair follicle as a potential candidate. An Advanced Blast Simulator (ABS) was developed to generate shock waves simulating traumatic conditions on the brains of a rat model. Microarray analysis was performed in hair follicles to identify the gene expression profiles that are associated with shock waves. Gene set enrichment analysis (GSEA) and sub-network enrichment analysis (SNEA) were used to identify cell processes and molecular signaling cascades affected by simulated bomb blasts. Enrichment analyses indicated that genes with altered expression levels were involved in central nervous system (CNS)/peripheral nervous system (PNS) responses as well as signal transduction including Ca2+, K+-transportation-dependent signaling, Toll-Like Receptor (TLR) signaling and Mitogen Activated Protein Kinase (MAPK) signaling cascades. Many of the pathways identified as affected by shock waves in the hair follicles have been previously reported to be TBI responsive in other organs such as brain and blood. The results suggest that the hair follicle shares some TBI-responsive molecular signatures with other tissues. Moreover, various TBI-associated diseases were identified as preferentially affected using a gene network approach, indicating that the hair follicle may be capable of reflecting comprehensive responses to TBI conditions. Accordingly, the present study demonstrates that the hair follicle is a potentially viable system for rapid and non-invasive TBI diagnosis. PMID:25136963
Jiang, Zhenhong; He, Fei; Zhang, Ziding
2017-07-01
Through large-scale transcriptional data analyses, we highlighted the importance of plant metabolism in plant immunity and identified 26 metabolic pathways that were frequently influenced by the infection of 14 different pathogens. Reprogramming of plant metabolism is a common phenomenon in plant defense responses. Currently, a large number of transcriptional profiles of infected tissues in Arabidopsis (Arabidopsis thaliana) have been deposited in public databases, which provides a great opportunity to understand the expression patterns of metabolic pathways during plant defense responses at the systems level. Here, we performed a large-scale transcriptome analysis based on 135 previously published expression samples, including 14 different pathogens, to explore the expression pattern of Arabidopsis metabolic pathways. Overall, metabolic genes are significantly changed in expression during plant defense responses. Upregulated metabolic genes are enriched on defense responses, and downregulated genes are enriched on photosynthesis, fatty acid and lipid metabolic processes. Gene set enrichment analysis (GSEA) identifies 26 frequently differentially expressed metabolic pathways (FreDE_Paths) that are differentially expressed in more than 60% of infected samples. These pathways are involved in the generation of energy, fatty acid and lipid metabolism as well as secondary metabolite biosynthesis. Clustering analysis based on the expression levels of these 26 metabolic pathways clearly distinguishes infected and control samples, further suggesting the importance of these metabolic pathways in plant defense responses. By comparing with FreDE_Paths from abiotic stresses, we find that the expression patterns of 26 FreDE_Paths from biotic stresses are more consistent across different infected samples. By investigating the expression correlation between transcriptional factors (TFs) and FreDE_Paths, we identify several notable relationships. 
Collectively, the current study will deepen our understanding of plant metabolism in plant immunity and provide new insights into disease-resistant crop improvement.
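The selection rule stated in the abstract above (a pathway counts as frequently differentially expressed when it is called in more than 60% of infected samples) can be sketched as a simple filter. The function name, the input layout (one list of per-sample boolean calls per pathway) and the example pathway names are illustrative assumptions; the abstract does not describe how the authors implemented this step.

```python
def frequent_pathways(calls: dict[str, list[bool]], min_fraction: float = 0.6) -> list[str]:
    """Return pathways called differentially expressed in MORE than
    `min_fraction` of samples (strict inequality, as in "more than 60%")."""
    selected = []
    for pathway, flags in calls.items():
        # Skip pathways with no samples to avoid dividing by zero.
        if flags and sum(flags) / len(flags) > min_fraction:
            selected.append(pathway)
    return selected


if __name__ == "__main__":
    # Hypothetical per-sample GSEA calls for three made-up pathways.
    example = {
        "pathway_A": [True, True, True, True, False],    # 80% of samples
        "pathway_B": [True, True, True, False, False],   # exactly 60%, excluded
        "pathway_C": [False, True, False, False, False],  # 20%
    }
    print(frequent_pathways(example))
```

The strict comparison means a pathway called in exactly 60% of samples is excluded; whether the original analysis treated that boundary the same way is not stated.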
Osterndorff-Kahanek, Elizabeth A.; Becker, Howard C.; Lopez, Marcelo F.; Farris, Sean P.; Tiwari, Gayatri R.; Nunez, Yury O.; Harris, R. Adron; Mayfield, R. Dayne
2015-01-01
Repeated ethanol exposure and withdrawal in mice increases voluntary drinking and represents an animal model of physical dependence. We examined time- and brain region-dependent changes in gene coexpression networks in amygdala (AMY), nucleus accumbens (NAC), prefrontal cortex (PFC), and liver after four weekly cycles of chronic intermittent ethanol (CIE) vapor exposure in C57BL/6J mice. Microarrays were used to compare gene expression profiles at 0, 8, and 120 hours following the last ethanol exposure. Each brain region exhibited a large number of differentially expressed genes (2,000-3,000) at the 0- and 8-hour time points, but fewer changes were detected at the 120-hour time point (400-600). Within each region, there was little gene overlap across time (~20%). All brain regions were significantly enriched with differentially expressed immune-related genes at the 8-hour time point. Weighted gene correlation network analysis identified modules that were highly enriched with differentially expressed genes at the 0- and 8-hour time points with virtually no enrichment at 120 hours. Modules enriched for both ethanol-responsive and cell-specific genes were identified in each brain region. These results indicate that chronic alcohol exposure causes global ‘rewiring’ of coexpression systems involving glial and immune signaling as well as neuronal genes. PMID:25803291
DOE Office of Scientific and Technical Information (OSTI.GOV)
Emms, David M.; Covshoff, Sarah; Hibberd, Julian M.
C4 photosynthesis is considered one of the most remarkable examples of evolutionary convergence in eukaryotes. However, it is unknown whether the evolution of C4 photosynthesis required the evolution of new genes. Genome-wide gene-tree species-tree reconciliation of seven monocot species that span two origins of C4 photosynthesis revealed that there was significant parallelism in the duplication and retention of genes coincident with the evolution of C4 photosynthesis in these lineages. Specifically, 21 orthologous genes were duplicated and retained independently in parallel at both C4 origins. Analysis of this gene cohort revealed that the set of parallel duplicated and retained genes is enriched for genes that are preferentially expressed in bundle sheath cells, the cell type in which photosynthesis was activated during C4 evolution. Moreover, functional analysis of the cohort of parallel duplicated genes identified SWEET-13 as a potential key transporter in the evolution of C4 photosynthesis in grasses, and provides new insight into the mechanism of phloem loading in these C4 species.
Gaponova, Anna V.; Deneka, Alexander Y.; Beck, Tim N.; Liu, Hanqing; Andrianov, Gregory; Nikonova, Anna S.; Nicolas, Emmanuelle; Einarson, Margret B.; Golemis, Erica A.; Serebriiskii, Ilya G.
2017-01-01
Ovarian, head and neck, and other cancers are commonly treated with cisplatin and other DNA damaging cytotoxic agents. Altered DNA damage response (DDR) contributes to resistance of these tumors to chemotherapies, some targeted therapies, and radiation. DDR involves multiple protein complexes and signaling pathways, some of which are evolutionarily ancient and involve protein orthologs conserved from yeast to humans. To identify new regulators of cisplatin-resistance in human tumors, we integrated high throughput and curated datasets describing yeast genes that regulate sensitivity to cisplatin and/or ionizing radiation. Next, we clustered highly validated genes based on chemogenomic profiling, and then mapped orthologs of these genes in expanded genomic networks for multiple metazoans, including humans. This approach identified an enriched candidate set of genes involved in the regulation of resistance to radiation and/or cisplatin in humans. Direct functional assessment of selected candidate genes using RNA interference confirmed their activity in influencing cisplatin resistance, degree of γH2AX focus formation and ATR phosphorylation, in ovarian and head and neck cancer cell lines, suggesting impaired DDR signaling as the driving mechanism. This work enlarges the set of genes that may contribute to chemotherapy resistance and provides a new contextual resource for interpreting next generation sequencing (NGS) genomic profiling of tumors. PMID:27863405
Kumar, Gulshan; Gupta, Khushboo; Pathania, Shivalika; Swarnkar, Mohit Kumar; Rattan, Usha Kumari; Singh, Gagandeep; Sharma, Ram Kumar; Singh, Anil Kumar
2017-01-01
The availability of sufficient chilling during bud dormancy plays an important role in the subsequent yield and quality of apple fruit, whereas insufficient chilling negatively impacts apple production. Transcriptome profiling during bud dormancy release and initial fruit set under low and high chill conditions was performed using RNA-seq. The comparatively high number of differentially expressed genes during bud break and fruit set under the high chill condition indicates that chilling availability was associated with transcriptional reorganization. The comparative analysis reveals the differential expression of genes involved in phytohormone metabolism, particularly for abscisic acid, gibberellic acid, ethylene, auxin and cytokinin. The expression of Dormancy Associated MADS-box, Flowering Locus C-like, Flowering Locus T-like and Terminal Flower 1-like genes was found to be modulated under differential chilling. The co-expression network analysis identified two high chill specific modules that were found to be enriched for “post-embryonic development” GO terms. The network analysis also identified hub genes including Early flowering 7, RAF10, ZEP4 and F-box, which may be involved in regulating chilling-mediated dormancy release and fruit set. The results of transcriptome and co-expression network analysis indicate that chilling availability majorly regulates phytohormone-related pathways and post-embryonic development during bud break. PMID:28198417
Stam, Remco; Scheikl, Daniela; Tellier, Aurélien
2016-06-02
Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Exploring of the molecular mechanism of rhinitis via bioinformatics methods
Song, Yufen; Yan, Zhaohui
2018-01-01
The aim of this study was to analyze gene expression profiles for exploring the function and regulatory network of differentially expressed genes (DEGs) in the pathogenesis of rhinitis by a bioinformatics method. The gene expression profile of GSE43523 was downloaded from the Gene Expression Omnibus database. The dataset contained 7 seasonal allergic rhinitis samples and 5 non-allergic normal samples. DEGs between rhinitis samples and normal samples were identified via the limma package of R. The WebGestalt database was used to identify enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs. The differentially co-expressed pairs of the DEGs were identified via the DCGL package in R, and the differential co-expression network was constructed based on these pairs. A protein-protein interaction (PPI) network of the DEGs was constructed based on the Search Tool for the Retrieval of Interacting Genes database. A total of 263 DEGs were identified in rhinitis samples compared with normal samples, including 125 downregulated ones and 138 upregulated ones. The DEGs were enriched in 7 KEGG pathways. A total of 308 differential co-expression gene pairs were obtained. A differential co-expression network was constructed, containing 212 nodes. In total, 148 PPI pairs of the DEGs were identified, and a PPI network was constructed based on these pairs. Bioinformatics methods could help us identify significant genes and pathways related to the pathogenesis of rhinitis. The steroid biosynthesis pathway and metabolic pathways might play important roles in the development of allergic rhinitis (AR). Genes such as CDC42 effector protein 5, solute carrier family 39 member A11 and PR/SET domain 10 might also be associated with the pathogenesis of AR, which provides references for the molecular mechanisms of AR. PMID:29257233
Li, Yiping; Li, Yanhong; Bai, Zhenjiang; Pan, Jian; Wang, Jian; Fang, Fang
2017-12-13
Sepsis represents a complex disease with the dysregulated inflammatory response and high mortality rate. The goal of this study was to identify potential transcriptomic markers in developing pediatric sepsis by a co-expression module analysis of the transcriptomic dataset. Using the R software and Bioconductor packages, we performed a weighted gene co-expression network analysis to identify co-expression modules significantly associated with pediatric sepsis. Functional interpretation (gene ontology and pathway analysis) and enrichment analysis with known transcription factors and microRNAs of the identified candidate modules were then performed. In modules significantly associated with sepsis, the intramodular analysis was further performed and "hub genes" were identified and validated by quantitative real-time PCR (qPCR) in this study. 15 co-expression modules in total were detected, and four modules ("midnight blue", "cyan", "brown", and "tan") were most significantly associated with pediatric sepsis and suggested as potential sepsis-associated modules. Gene ontology analysis and pathway analysis revealed that these four modules strongly associated with immune response. Three of the four sepsis-associated modules were also enriched with known transcription factors (false discovery rate-adjusted P < 0.05). Hub genes were identified in each of the four modules. Four of the identified hub genes (MYB proto-oncogene like 1, killer cell lectin like receptor G1, stomatin, and membrane spanning 4-domains A4A) were further validated to be differentially expressed between septic children and controls by qPCR. Four pediatric sepsis-associated co-expression modules were identified in this study. qPCR results suggest that hub genes in these modules are potential transcriptomic markers for pediatric sepsis diagnosis. These results provide novel insights into the pathogenesis of pediatric sepsis and promote the generation of diagnostic gene sets.
Al-Quraishy, Saleh; Dkhil, Mohamed A; Abdel-Baki, Abdel-Azeem S; Ghanjati, Foued; Erichsen, Lars; Santourlidis, Simeon; Wunderlich, Frank; Araúzo-Bravo, Marcos J
2017-05-01
Epigenetic mechanisms such as DNA methylation are increasingly recognized to be critical for vaccination efficacy and outcome of different infectious diseases, but corresponding information is scarcely available for host defense against malaria. In the experimental blood-stage malaria Plasmodium chabaudi, we investigate the possible effects of a blood-stage vaccine on DNA methylation of gene promoters in the liver, known as effector against blood-stage malaria, using DNA methylation microarrays. Naturally susceptible Balb/c mice acquire, by protective vaccination, the potency to survive P. chabaudi malaria and, concomitantly, modifications of constitutive DNA methylation of promoters of numerous genes in the liver; specifically, promoters of 256 genes are hyper(=up)- and 345 genes are hypo(=down)-methylated (p < 0.05). Protective vaccination also leads to changes in promoter DNA methylation upon challenge with P. chabaudi at peak parasitemia on day 8 post infection (p.i.), when 571 and 1013 gene promoters are up- and down-methylated, respectively, in relation to constitutive DNA methylation (p < 0.05). Gene set enrichment analyses reveal that both vaccination and P. chabaudi infections mainly modify promoters of those genes which are most statistically enriched with functions relating to regulation of transcription. Genes with down-methylated promoters encompass those encoding CX3CL1, GP130, and GATA2, known to be involved in monocyte recruitment, IL-6 trans-signaling, and onset of erythropoiesis, respectively. Our data suggest that vaccination may epigenetically improve parts of several effector functions of the liver against blood-stage malaria, such as recruitment of monocytes/macrophages to the liver, accelerated liver regeneration, and extramedullary hepatic erythropoiesis, thus leading to self-healing of otherwise lethal P. chabaudi blood-stage malaria.
Robinson, Gene E.; Jakobsson, Eric
2016-01-01
The emerging field of sociogenomics explores the relations between social behavior and genome structure and function. An important question is the extent to which associations between social behavior and gene expression are conserved among the Metazoa. Prior experimental work in an invertebrate model of social behavior, the honey bee, revealed distinct brain gene expression patterns in African and European honey bees, and within European honey bees with different behavioral phenotypes. The present work is a computational study of these previous findings in which we analyze, by orthology determination, the extent to which genes that are socially regulated in honey bees are conserved across the Metazoa. We found that the differentially expressed gene sets associated with alarm pheromone response, the difference between old and young bees, and the colony influence on soldier bees, are enriched in widely conserved genes, indicating that these differences have genomic bases shared with many other metazoans. By contrast, the sets of differentially expressed genes associated with the differences between African and European forager and guard bees are depleted in widely conserved genes, indicating that the genomic basis for this social behavior is relatively specific to honey bees. For the alarm pheromone response gene set, we found a particularly high degree of conservation with mammals, even though the alarm pheromone itself is bee-specific. Gene Ontology identification of human orthologs to the strongly conserved honey bee genes associated with the alarm pheromone response shows overrepresentation of protein metabolism, regulation of protein complex formation, and protein folding, perhaps associated with remodeling of critical neural circuits in response to alarm pheromone. We hypothesize that such remodeling may be an adaptation of social animals to process and respond appropriately to the complex patterns of conspecific communication essential for social organization. 
PMID:27359102
Differential Effect of Active Smoking on Gene Expression in Male and Female Smokers
Paul, Sunirmal; Amundson, Sally A
2015-01-01
Smoking is the second leading cause of preventable death in the United States. Cohort epidemiological studies have demonstrated that women are more vulnerable to cigarette-smoking-induced diseases than their male counterparts; however, the molecular basis of these differences has remained unknown. In this study, we explored whether gene expression patterns differ between male and female smokers, and how these patterns might reflect sex-specific responses to the stress of smoking. Using whole-genome microarray gene expression profiling, we found that a substantial number of oxidant-related genes were expressed in both male and female smokers; however, smoking-responsive genes did indeed differ greatly between male and female smokers. Gene set enrichment analysis (GSEA) against reference oncogenic signature gene sets identified a large number of oncogenic pathway gene sets that were significantly altered in female smokers compared to male smokers. In addition, functional annotation with Ingenuity Pathway Analysis (IPA) identified smoking-correlated genes associated with biological functions in male and female smokers that are directly relevant to well-known smoking-related pathologies. However, these relevant biological functions were strikingly overrepresented in female smokers compared to male smokers. IPA network analysis with the functional categories of immune and inflammatory response gene products suggested potential interactions between smoking response and female hormones. Our results demonstrate a striking dichotomy between male and female gene expression responses to smoking. This is the first genome-wide expression study to compare the sex-specific impacts of smoking at a molecular level and suggests a novel potential connection between sex hormone signaling and smoking-induced diseases in female smokers. PMID:25621181
Coregulation of FANCA and BRCA1 in human cells.
Haitjema, Anneke; Mol, Berber M; Kooi, Irsan E; Massink, Maarten Pg; Jørgensen, Jens Al; Rockx, Davy Ap; Rooimans, Martin A; de Winter, Johan P; Meijers-Heijboer, Hanne; Joenje, Hans; Dorsman, Josephine C
2014-01-01
Fanconi anemia (FA) is a genetically heterogeneous syndrome associated with increased cancer predisposition. The underlying genes govern the FA pathway which functions to protect the genome during the S-phase of the cell cycle. While upregulation of FA genes has been linked to chemotherapy resistance, little is known about their regulation in response to proliferative stimuli. The purpose of this study was to examine how FA genes are regulated, especially in relation to the cell cycle, in order to reveal their possible participation in biochemical networks. Expression of 14 FA genes was monitored in two human cell-cycle models and in two RB1/E2F pathway-associated primary cancers, retinoblastoma and basal breast cancer. In silico studies were performed to further evaluate coregulation and identify connected networks and diseases. Only FANCA was consistently induced over 2-fold; FANCF failed to exhibit any regulatory fluctuations. Two tools exploiting public data sets indicated coregulation of FANCA with BRCA1. Upregulation of FANCA and BRCA1 correlated with upregulation of E2F3. Genes coregulated with both FANCA and BRCA1 were enriched for MeSH-Term id(s) genomic instability, microcephaly, and Bloom syndrome, and enriched for the cellular component centrosome. The regulation of FA genes appears highly divergent. In RB1-linked tumors, upregulation of FA network genes was associated with reduced expression of FANCF. FANCA and BRCA1 may jointly act in a subnetwork - supporting vital function(s) at the subcellular level (centrosome) as well as at the level of embryonic development (mechanisms controlling head circumference).
FUN-L: gene prioritization for RNAi screens.
Lees, Jonathan G; Hériché, Jean-Karim; Morilla, Ian; Fernández, José M; Adler, Priit; Krallinger, Martin; Vilo, Jaak; Valencia, Alfonso; Ellenberg, Jan; Ranea, Juan A; Orengo, Christine
2015-06-15
Most biological processes remain only partially characterized, with many components still to be identified. Given that a whole genome usually cannot be tested in a functional assay, identifying the genes most likely to be of interest is of critical importance to avoid wasting resources. Given a set of known functionally related genes and using a state-of-the-art approach to data integration and mining, our Functional Lists (FUN-L) method provides a ranked list of candidate genes for testing. Validation of predictions from FUN-L with independent RNAi screens confirms that FUN-L-produced lists are enriched in genes with the expected phenotypes. In this article, we describe a website front end to FUN-L. The website is freely available to use at http://funl.org. © The Author 2015. Published by Oxford University Press.
Roberts, C H; Turino, C; Madrigal, J A; Marsh, S G E
2007-06-01
DNA enrichment by allele-specific hybridization (DEASH) was used as a means to isolate individual alleles of the killer cell immunoglobulin-like receptor (KIR2DL4) gene from heterozygous genomic DNA. Using long-template polymerase chain reaction (LT-PCR), the complete KIR2DL4 gene was amplified from a cell line that had previously been characterized for its KIR gene content by PCR using sequence-specific primers (PCR-SSP). The whole gene amplicons were sequenced and we identified two heterozygous positions in accordance with the predictions of the PCR-SSP. The amplicons were then hybridized to allele-specific, biotinylated oligonucleotide probes and through binding to streptavidin-coated beads, the targeted alleles were enriched. A second PCR amplified only the exonic regions of the enriched allele, and these were then sequenced in full. We show DEASH to be capable of enriching single alleles from a heterozygous PCR product, and through sequencing the enriched DNA, we are able to produce complete coding sequences of the KIR2DL4 alleles in accordance with the typing predicted by PCR-SSP.
Li, Chi-Ming; Guo, Meirong; Borczuk, Alain; Powell, Charles A.; Wei, Michelle; Thaker, Harshwardhan M.; Friedman, Richard; Klein, Ulf; Tycko, Benjamin
2002-01-01
Wilms’ tumor (WT) has been considered a prototype for arrested cellular differentiation in cancer, but previous studies have relied on selected markers. We have now performed an unbiased survey of gene expression in WTs using oligonucleotide microarrays. Statistical criteria identified 357 genes as differentially expressed between WTs and fetal kidneys. This set contained 124 matches to genes on a microarray used by Stuart and colleagues (Stuart RO, Bush KT, Nigam SK: Changes in global gene expression patterns during development and maturation of the rat kidney. Proc Natl Acad Sci USA 2001, 98:5649–5654) to establish genes with stage-specific expression in the developing rat kidney. Mapping between the two data sets showed that WTs systematically overexpressed genes corresponding to the earliest stage of metanephric development, and underexpressed genes corresponding to later stages. Automated clustering identified a smaller group of 27 genes that were highly expressed in WTs compared to fetal kidney and heterologous tumor and normal tissues. This signature set was enriched in genes encoding transcription factors. Four of these, PAX2, EYA1, HBF2, and HOXA11, are essential for cell survival and proliferation in early metanephric development, whereas others, including SIX1, MOX1, and SALL2, are predicted to act at this stage. SIX1 and SALL2 proteins were expressed in the condensing mesenchyme in normal human fetal kidneys, but were absent (SIX1) or reduced (SALL2) in cells at other developmental stages. These data imply that the blastema in WTs has progressed to the committed stage in the mesenchymal-epithelial transition, where it is partially arrested in differentiation. The WT-signature set also contained the Wnt receptor FZD7, the tumor antigen PRAME, the imprinted gene NNAT and the metastasis-associated transcription factor E1AF. PMID:12057921
Identification of the Key Genes and Pathways in Esophageal Carcinoma.
Su, Peng; Wen, Shiwang; Zhang, Yuefeng; Li, Yong; Xu, Yanzhao; Zhu, Yonggang; Lv, Huilai; Zhang, Fan; Wang, Mingbo; Tian, Ziqiang
2016-01-01
Objective. Esophageal carcinoma (EC) is a common gastrointestinal malignancy worldwide. This study aims to screen key genes and pathways in EC and to elucidate its underlying mechanisms. Methods. Five microarray datasets of EC were downloaded from Gene Expression Omnibus. Differentially expressed genes (DEGs) were screened by bioinformatics analysis. Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, and protein-protein interaction (PPI) network construction were performed to obtain the biological roles of DEGs in EC. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to verify the expression level of DEGs in EC. Results. A total of 1955 genes were filtered as DEGs in EC. The upregulated genes were significantly enriched in cell cycle and the downregulated genes were significantly enriched in endocytosis. The PPI network showed that CDK4 and CCT3 were hub proteins. The expression levels of 8 dysregulated DEGs, including CDK4, CCT3, THSD4, SIM2, MYBL2, CENPF, CDCA3, and CDKN3, were validated in EC compared to adjacent nontumor tissues, and the results matched the microarray analysis. Conclusion. The significant DEGs, including CDK4, CCT3, THSD4, and SIM2, may play key roles in the tumorigenesis and development of EC through involvement in cell cycle and endocytosis.
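The abstract above does not say which statistic was used for the GO and KEGG enrichment step. A common choice for this kind of over-representation analysis is the upper-tail hypergeometric test, sketched below under that assumption. The function name, argument names, and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe  -- number of genes in the background (all genes tested)
    annotated -- background genes annotated to the term or pathway
    selected  -- number of genes in the list of interest (e.g. DEGs)
    overlap   -- genes in the list that are annotated to the term
    """
    max_overlap = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy counts (hypothetical): 10 background genes, 4 annotated to a term,
    # 3 selected genes, 2 of which carry the annotation.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice one p-value is computed per term, and the resulting set of p-values is then corrected for multiple testing; that step is omitted here.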
Gerrard, Gareth; Valgañón, Mikel; Foong, Hui En; Kasperaviciute, Dalia; Iskander, Deena; Game, Laurence; Müller, Michael; Aitman, Timothy J; Roberts, Irene; de la Fuente, Josu; Foroni, Letizia; Karadimitris, Anastasios
2013-08-01
Diamond-Blackfan anaemia (DBA) is caused by inactivating mutations in ribosomal protein (RP) genes, with mutations in 13 of the 80 RP genes accounting for 50-60% of cases. The remaining 40-50% of cases may harbour mutations in one of the remaining RP genes, but the very low frequencies make conventional genetic screening challenging. We therefore applied custom enrichment technology combined with high-throughput sequencing to screen all 80 RP genes. Using this approach, we identified and validated inactivating mutations in 15/17 (88%) DBA patients. Target enrichment combined with high-throughput sequencing is a robust and improved methodology for the genetic diagnosis of DBA. © 2013 John Wiley & Sons Ltd.
Orgeur, Mickael; Martens, Marvin; Leonte, Georgeta; Nassari, Sonya; Bonnin, Marie-Ange; Börno, Stefan T; Timmermann, Bernd; Hecht, Jochen; Duprez, Delphine; Stricker, Sigmar
2018-03-29
Connective tissues support organs and play crucial roles in development, homeostasis and fibrosis, yet our understanding of their formation is still limited. To gain insight into the molecular mechanisms of connective tissue specification, we selected five zinc-finger transcription factors - OSR1, OSR2, EGR1, KLF2 and KLF4 - based on their expression patterns and/or known involvement in connective tissue subtype differentiation. RNA-seq and ChIP-seq profiling of chick limb micromass cultures revealed a set of common genes regulated by all five transcription factors, which we describe as a connective tissue core expression set. This common core was enriched with genes associated with axon guidance and myofibroblast signature, including fibrosis-related genes. In addition, each transcription factor regulated a specific set of signalling molecules and extracellular matrix components. This suggests a concept whereby local molecular niches can be created by the expression of specific transcription factors impinging on the specification of local microenvironments. The regulatory network established here identifies common and distinct molecular signatures of limb connective tissue subtypes, provides novel insight into the signalling pathways governing connective tissue specification, and serves as a resource for connective tissue development. © 2018. Published by The Company of Biologists Ltd.
Improving information retrieval in functional analysis.
Rodriguez, Juan C; González, Germán A; Fresno, Cristóbal; Llera, Andrea S; Fernández, Elmer A
2016-12-01
Transcriptome analysis is essential to understand the mechanisms regulating key biological processes and functions. The first step usually consists of identifying candidate genes; to find out which pathways are affected by those genes, however, functional analysis (FA) is mandatory. The most frequently used strategies for this purpose are Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology. Several statistical methods have been developed and compared in terms of computational efficiency and/or statistical appropriateness. However, whether their results are similar or complementary, the sensitivity to parameter settings, or possible bias in the analyzed terms has not been addressed so far. Here, two GSEA and four SEA methods and their parameter combinations were evaluated in six datasets by comparing two breast cancer subtypes with well-known differences in genetic background and patient outcomes. We show that GSEA and SEA lead to different results depending on the chosen statistic, model and/or parameters. Both approaches provide complementary results from a biological perspective. Hence, an Integrative Functional Analysis (IFA) tool is proposed to improve information retrieval in FA. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required, since the best SEA/GSEA alternatives are integrated. IFA utility was demonstrated by evaluating four prostate cancer and the TCGA breast cancer microarray datasets, which showed its biological generalization capabilities. Copyright © 2016 Elsevier Ltd. All rights reserved.
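To make the distinction drawn above concrete: SEA-type methods test a pre-selected gene list for over-representation, whereas GSEA-type methods score a gene set against the full ranked gene list. The sketch below illustrates the second idea with an unweighted, Kolmogorov-Smirnov-style running sum. It is a simplification for illustration only, not the implementation of any method evaluated in the study; the function name and gene identifiers are hypothetical, and significance estimation (for example by permutation) is not shown.

```python
def running_sum_es(ranked_genes, gene_set):
    """Unweighted KS-style enrichment score for one gene set.

    Walk down the ranked list, stepping up at each gene in the set and
    down at each gene outside it; return the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step sizes are chosen so the walk returns to zero at the end.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical ranking
    print(running_sum_es(ranked, {"g1", "g2"}))    # set at the top of the list
    print(running_sum_es(ranked, {"g5", "g6"}))    # set at the bottom of the list
```

A positive score indicates that the set is concentrated near the top of the ranking and a negative score that it is concentrated near the bottom.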
Evans, Melissa L.; Hori, Tiago S.; Rise, Matthew L.; Fleming, Ian A.
2015-01-01
Captive rearing programs (hatcheries) are often used in conservation and management efforts for at-risk salmonid fish populations. However, hatcheries typically rear juveniles in environments that contrast starkly with natural conditions, which may lead to phenotypic and/or genetic changes that adversely affect the performance of juveniles upon their release to the wild. Environmental enrichment has been proposed as a mechanism to improve the efficacy of population restoration efforts from captive-rearing programs; in this study, we examine the influence of environmental enrichment during embryo and yolk-sac larval rearing on the transcriptome of Atlantic salmon (Salmo salar). Full siblings were reared in either a hatchery environment devoid of structure or an environment enriched with gravel substrate. At the end of endogenous feeding by juveniles, we examined patterns of gene transcript abundance in head tissues using the cGRASP-designed Agilent 4×44K microarray. Significance analysis of microarrays (SAM) indicated that 808 genes were differentially transcribed between the rearing environments and a total of 184 gene ontological (GO) terms were over- or under-represented in this gene list, several associated with mitosis/cell cycle and muscle and heart development. There were also pronounced differences among families in the degree of transcriptional response to rearing environment enrichment, suggesting that gene-by-environment effects, possibly related to parental origin, could influence the efficacy of enrichment interventions. PMID:25742646
The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution
Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.
2010-01-01
To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049
2010-01-01
using linker-mediated PCR as described previously (25). Amplified DNA was labeled and hybridized in triplicate by NimbleGen Systems, Inc., to their human...leading edge analysis (37) of these gene sets identified TGFβ-induced SMAD3 direct target genes (Supplementary Table S5) as enriched in SOX4 target...3.06E11 PAX5 Paired box 2.07E10 WHN Forkhead 2.94E10 SMAD3 SMAD 1.82E09 SMAD4 SMAD 3.33E09 MYC MYC 6.25E09 NFKAPPAB NF-κB 2.95E08 LEF1/TCF1 LEF
Gardiner, Laura-Jayne; Bansept-Basler, Pauline; Olohan, Lisa; Joynson, Ryan; Brenchley, Rachel; Hall, Neil; O'Sullivan, Donal M; Hall, Anthony
2016-08-01
Previously, we extended the utility of mapping-by-sequencing by combining it with sequence capture and mapping sequence data to pseudo-chromosomes that were organized using wheat-Brachypodium synteny. This, with a bespoke haplotyping algorithm, enabled us to map the flowering time locus in the diploid wheat Triticum monococcum L., identifying a set of deleted genes (Gardiner et al., 2014). Here, we develop this combination of gene enrichment and sliding window mapping-by-synteny analysis to map the Yr6 locus for yellow stripe rust resistance in hexaploid wheat. A 110 Mb NimbleGen capture probe set was used to enrich and sequence a doubled haploid mapping population of hexaploid wheat derived from an Avalon and Cadenza cross. The Yr6 locus was identified by mapping to the POPSEQ chromosomal pseudomolecules using a bespoke pipeline and algorithm (Chapman et al., 2015). Furthermore, the same locus was identified using newly developed pseudo-chromosome sequences as a mapping reference that are based on the genic sequence used for sequence enrichment. The pseudo-chromosomes allow us to demonstrate the application of mapping-by-sequencing to even poorly defined polyploid genomes where chromosomes are incomplete and sub-genome assemblies are collapsed. This analysis uniquely enabled us to: compare wheat genome annotations; identify the Yr6 locus, defining a smaller genic region than was previously possible; associate the interval with one wheat sub-genome; and increase the density of associated SNP markers. Finally, we built the pipeline in iPlant, making it a user-friendly community resource for phenotype mapping. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
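The abstract above does not describe the bespoke sliding-window algorithm in detail. As a generic illustration of the sliding-window idea only, the sketch below averages a per-marker score over fixed-size windows along an ordered pseudo-chromosome and reports the window with the highest mean. The function names, the choice of score, and the toy values are all assumptions and do not come from the study.

```python
def sliding_window_means(values, window, step=1):
    """Mean of `values` over consecutive windows of `window` markers."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


def peak_window(values, window, step=1):
    """Return (start_index, mean) of the window with the highest mean."""
    means = sliding_window_means(values, window, step)
    if not means:
        raise ValueError("window is longer than the marker series")
    best = max(range(len(means)), key=means.__getitem__)
    return best * step, means[best]


if __name__ == "__main__":
    # Hypothetical per-marker scores ordered along a pseudo-chromosome,
    # e.g. the frequency of one parental allele in a selected pool.
    scores = [0.5, 0.5, 0.6, 0.9, 1.0, 1.0, 0.7, 0.5]
    print(peak_window(scores, window=3))
```

Smoothing over a window reduces the influence of individual noisy markers, at the cost of resolution; the window and step sizes would need to be tuned to marker density.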