GO-based functional dissimilarity of gene sets.
Díaz-Díaz, Norberto; Aguilar-Ruiz, Jesús S
2011-09-01
The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes and a quantification of functional coherence can helps to clarify the role of a group of genes working together. To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Results show that GFD performs robustly when applied to gene set of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.
Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin
2011-04-14
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.
Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K
2011-01-01
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
Ficklin, Stephen P; Luo, Feng; Feltus, F Alex
2010-09-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis.
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Verheijen, Mark H G; Posthuma, Danielle
2015-11-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis.
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Scharf, J M; Pauls, D L; Yu, D; Illmann, C; Osiecki, L; Neale, B M; Mathews, C A; Reus, V I; Lowe, T L; Freimer, N B; Cox, N J; Davis, L K; Rouleau, G A; Chouinard, S; Dion, Y; Girard, S; Cath, D C; Posthuma, D; Smit, J H; Heutink, P; King, R A; Fernandez, T; Leckman, J F; Sandor, P; Barr, C L; McMahon, W; Lyon, G; Leppert, M; Morgan, J; Weiss, R; Grados, M A; Singer, H; Jankovic, J; Tischfield, J A; Heiman, G A; Verheijen, Mark H G; Posthuma, Danielle
2015-01-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis. PMID:25735483
Functional Abstraction as a Method to Discover Knowledge in Gene Ontologies
Ultsch, Alfred; Lötsch, Jörn
2014-01-01
Computational analyses of functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology knowledge bases by means of over-representation analysis (ORA). Its result represents the specific knowledge of the functionality of the gene set. However, the specific ontology typically consists of many terms and relationships, hindering the understanding of the ‘main story’. We developed a methodology to identify a comprehensibly small number of GO terms as “headlines” of the specific ontology allowing to understand all central aspects of the roles of the involved genes. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and is abstract enough for human comprehension. This method exceeds the classical approaches at ORA abstraction and by focusing on information rather than decorrelation of GO terms, it directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in a gene set plays a role. This is the necessary means to interpret complex Gene Ontology results thus strengthening the role of functional genomics in biomarker and drug discovery. PMID:24587272
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
2017-01-01
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.
Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew
2012-08-08
Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
BRAIN NETWORKS. Correlated gene expression supports synchronous activity in brain networks.
Richiardi, Jonas; Altmann, Andre; Milazzo, Anna-Clare; Chang, Catie; Chakravarty, M Mallar; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Flor, Herta; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Lemaître, Hervé; Mann, Karl F; Martinot, Jean-Luc; Nees, Frauke; Paus, Tomáš; Pausova, Zdenka; Rietschel, Marcella; Robbins, Trevor W; Smolka, Michael N; Spanagel, Rainer; Ströhle, Andreas; Schumann, Gunter; Hawrylycz, Mike; Poline, Jean-Baptiste; Greicius, Michael D
2015-06-12
During rest, brain activity is synchronized between different regions widely distributed throughout the brain, forming functional networks. However, the molecular mechanisms supporting functional connectivity remain undefined. We show that functional brain networks defined with resting-state functional magnetic resonance imaging can be recapitulated by using measures of correlated gene expression in a post mortem brain tissue data set. The set of 136 genes we identify is significantly enriched for ion channels. Polymorphisms in this set of genes significantly affect resting-state functional connectivity in a large sample of healthy adolescents. Expression levels of these genes are also significantly associated with axonal connectivity in the mouse. The results provide convergent, multimodal evidence that resting-state functional networks correlate with the orchestrated activity of dozens of genes linked to ion channel activity and synaptic function. Copyright © 2015, American Association for the Advancement of Science.
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
Yang, Xinan Holly; Li, Meiyi; Wang, Bin; Zhu, Wanqi; Desgardin, Aurelie; Onel, Kenan; de Jong, Jill; Chen, Jianjun; Chen, Luonan; Cunningham, John M
2015-03-24
Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-sets signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses complex disease acute myeloid leukemia (AML) as a research platform. We introduced an adjustable "soft threshold" to a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. We identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells in parallel; both mark favorable-prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03 respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We successfully validated the prognosis of both signatures in two independent cohorts of 91 and 242 patients respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08). The proposed algorithms and computational framework will harness systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about AML and other complex diseases.
Raychaudhuri, Soumya; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David; Sklar, Pamela; Purcell, Shaun; Daly, Mark J.
2010-01-01
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package. PMID:20838587
A critical assessment of Mus musculus gene function prediction using integrated genomic evidence
Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P
2008-01-01
Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-01
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-11
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.
2016-01-01
ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; Taylor, Ronald C.; Weisenhorn, Pamela; Olson, Robert D.; Stevens, Rick L.; Rocha, Miguel; Rocha, Isabel; Best, Aaron A.; DeJongh, Matthew; Tintle, Nathan L.; Parrello, Bruce; Overbeek, Ross; Henry, Christopher S.
2016-01-01
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain. PMID:27933038
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or priori-defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide an valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
Determining Semantically Related Significant Genes.
Taha, Kamal
2014-01-01
GO relation embodies some aspects of existence dependency. If GO term xis existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene sets enrichment. However, most of these methods overlook the structural dependencies between GO terms in GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term xcannot be existence-dependent on GO term y, if x- and y- have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
Computation and application of tissue-specific gene set weights.
Frost, H Robert
2018-04-06
Gene set testing, or pathway analysis, has become a critical tool for the analysis of highdimensional genomic data. Although the function and activity of many genes and higher-level processes is tissue-specific, gene set testing is typically performed in a tissue agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissuespecific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses is publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets. rob.frost@dartmouth.edu.
An improved method for functional similarity analysis of genes based on Gene Ontology.
Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia
2016-12-23
Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make errorprone judgments especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by means of considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures on the experiments of functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .
Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.
Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M
2011-11-01
Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes
Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.
2005-01-01
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
The Gene Set Builder: collation, curation, and distribution of sets of genes
Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W
2005-01-01
Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of genes in a user-friendly environment. The application can be accessed via . PMID:16371163
Brown, David A; Di Cerbo, Vincenzo; Feldmann, Angelika; Ahn, Jaewoo; Ito, Shinsuke; Blackledge, Neil P; Nakayama, Manabu; McClellan, Michael; Dimitrova, Emilia; Turberfield, Anne H; Long, Hannah K; King, Hamish W; Kriaucionis, Skirmantas; Schermelleh, Lothar; Kutateladze, Tatiana G; Koseki, Haruhiko; Klose, Robert J
2017-09-05
Chromatin modifications and the promoter-associated epigenome are important for the regulation of gene expression. However, the mechanisms by which chromatin-modifying complexes are targeted to the appropriate gene promoters in vertebrates and how they influence gene expression have remained poorly defined. Here, using a combination of live-cell imaging and functional genomics, we discover that the vertebrate SET1 complex is targeted to actively transcribed gene promoters through CFP1, which engages in a form of multivalent chromatin reading that involves recognition of non-methylated DNA and histone H3 lysine 4 trimethylation (H3K4me3). CFP1 defines SET1 complex occupancy on chromatin, and its multivalent interactions are required for the SET1 complex to place H3K4me3. In the absence of CFP1, gene expression is perturbed, suggesting that normal targeting and function of the SET1 complex are central to creating an appropriately functioning vertebrate promoter-associated epigenome. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Zhu, Xinyu; Ma, Hong; Chen, Zhiduan
2011-03-09
Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improve genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were early integrated into primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.
Xu, Yuantao; Wu, Guizhi; Hao, Baohai; Chen, Lingling; Deng, Xiuxin; Xu, Qiang
2015-11-23
With the availability of rapidly increasing number of genome and transcriptome sequences, lineage-specific genes (LSGs) can be identified and characterized. Like other conserved functional genes, LSGs play important roles in biological evolution and functions. Two set of citrus LSGs, 296 citrus-specific genes (CSGs) and 1039 orphan genes specific to sweet orange, were identified by comparative analysis between the sweet orange genome sequences and 41 genomes and 273 transcriptomes. With the two sets of genes, gene structure and gene expression pattern were investigated. On average, both the CSGs and orphan genes have fewer exons, shorter gene length and higher GC content when compared with those evolutionarily conserved genes (ECs). Expression profiling indicated that most of the LSGs expressed in various tissues of sweet orange and some of them exhibited distinct temporal and spatial expression patterns. Particularly, the orphan genes were preferentially expressed in callus, which is an important pluripotent tissue of citrus. Besides, part of the CSGs and orphan genes expressed responsive to abiotic stress, indicating their potential functions during interaction with environment. This study identified and characterized two sets of LSGs in citrus, dissected their sequence features and expression patterns, and provided valuable clues for future functional analysis of the LSGs in sweet orange.
Functionally Enigmatic Genes: A Case Study of the Brain Ignorome
Pandey, Ashutosh K.; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W.
2014-01-01
What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed—the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases—ELMOD1, TMEM88B, and DZANK1—we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes. PMID:24523945
Functionally enigmatic genes: a case study of the brain ignorome.
Pandey, Ashutosh K; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W
2014-01-01
What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed--the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum--a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases--ELMOD1, TMEM88B, and DZANK1--we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes.
Consistency of gene starts among Burkholderia genomes
2011-01-01
Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling; Wang, Xianhui; Kang, Le
2017-06-01
The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain-containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. © The Authors 2017. Published by Oxford University Press.
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling
2017-01-01
Abstract The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351
Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.
Zhu, Xinyu; Chen, Caoyi; Wang, Baohua
2012-04-01
Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we made a comprehensive evolutionary analysis on this gene family from 13 major representatives of green plants. A novel clustering (here named as cpTrx clade), which included the III-1, III-2, and III-4 orthologous groups, previously resolved was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were early integrated into primordial SET-PostSET domain organization of cpTrx clade. We suggested that the PostSET domain was lost in some members of III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from the Physcomitrella patens (moss) were identified, and supposedly, their parental genes have been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.
Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su
2015-01-01
It has been reported that several brain diseases can be treated as transnosological manner implicating possible common molecular basis under those diseases. However, molecular level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when it used typical methods depending on differentially expressed genes. In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identified various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. This study introduces possible common molecular bases for several brain diseases, which open the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.
2015-01-01
Background It has been reported that several brain diseases can be treated as transnosological manner implicating possible common molecular basis under those diseases. However, molecular level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when it used typical methods depending on differentially expressed genes. Results In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identified various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions This study introduces possible common molecular bases for several brain diseases, which open the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779
Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis
2012-01-01
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
Disentangling the multigenic and pleiotropic nature of molecular function
2015-01-01
Background Biological processes at the molecular level are usually represented by molecular interaction networks. Function is organised and modularity identified based on network topology, however, this approach often fails to account for the dynamic and multifunctional nature of molecular components. For example, a molecule engaging in spatially or temporally independent functions may be inappropriately clustered into a single functional module. To capture biologically meaningful sets of interacting molecules, we use experimentally defined pathways as spatial/temporal units of molecular activity. Results We defined functional profiles of Saccharomyces cerevisiae based on a minimal set of Gene Ontology terms sufficient to represent each pathway's genes. The Gene Ontology terms were used to annotate 271 pathways, accounting for pathway multi-functionality and gene pleiotropy. Pathways were then arranged into a network, linked by shared functionality. Of the genes in our data set, 44% appeared in multiple pathways performing a diverse set of functions. Linking pathways by overlapping functionality revealed a modular network with energy metabolism forming a sparse centre, surrounded by several denser clusters comprised of regulatory and metabolic pathways. Signalling pathways formed a relatively discrete cluster connected to the centre of the network. Genetic interactions were enriched within the clusters of pathways by a factor of 5.5, confirming the organisation of our pathway network is biologically significant. Conclusions Our representation of molecular function according to pathway relationships enables analysis of gene/protein activity in the context of specific functional roles, as an alternative to typical molecule-centric graph-based methods. The pathway network demonstrates the cooperation of multiple pathways to perform biological processes and organises pathways into functionally related clusters with interdependent outcomes. PMID:26678917
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
Carlson, Kimberly A.; Gardner, Kylee; Pashaj, Anjeza; Carlson, Darby J.; Yu, Fang; Eudy, James D.; Zhang, Chi; Harshman, Lawrence G.
2015-01-01
Aging is a complex process characterized by a steady decline in an organism's ability to perform life-sustaining tasks. In the present study, two cages of approximately 12,000 mated Drosophila melanogaster females were used as a source of RNA from individuals sampled frequently as a function of age. A linear model for microarray data method was used for the microarray analysis to adjust for the box effect; it identified 1,581 candidate aging genes. Cluster analyses using a self-organizing map algorithm on the 1,581 significant genes identified gene expression patterns across different ages. Genes involved in immune system function and regulation, chorion assembly and function, and metabolism were all significantly differentially expressed as a function of age. The temporal pattern of data indicated that gene expression related to aging is affected relatively early in life span. In addition, the temporal variance in gene expression in immune function genes was compared to a random set of genes. There was an increase in the variance of gene expression within each cohort, which was not observed in the set of random genes. This observation is compatible with the hypothesis that D. melanogaster immune function genes lose control of gene expression as flies age. PMID:26090231
2009-01-01
Background The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126
Estimation of gene induction enables a relevance-based ranking of gene sets.
Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens
2009-07-01
In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshhold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.
Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray
2004-01-01
One of the key challenges of microarray studies is to derive biological insights from the unprecedented quatities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describes the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TDFIDF weighting scheme, but only 23 og 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
Biological interpretation of genome-wide association studies using predicted gene functions.
Pers, Tune H; Karjalainen, Juha M; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R; Yang, Jian; Lui, Julian C; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S N; Hirschhorn, Joel N; Franke, Lude
2015-01-19
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.
Freed, Nikki E; Bumann, Dirk; Silander, Olin K
2016-09-06
Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
Pathway Distiller - multisource biological pathway consolidation
2012-01-01
Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636
Pathway Distiller - multisource biological pathway consolidation.
Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong
2012-01-01
One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments.
Uddin, Raihan; Singh, Shiva M.
2017-01-01
As humans age many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in “learning and memory” related functions and pathways. Subsequent differential network analysis of this “learning and memory” module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known function of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they provide a new insight and generate new hypotheses into the molecular mechanisms responsible for age associated learning impairment, including spatial learning. PMID:29066959
Uddin, Raihan; Singh, Shiva M
2017-01-01
As humans age many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in "learning and memory" related functions and pathways. Subsequent differential network analysis of this "learning and memory" module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known function of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they provide a new insight and generate new hypotheses into the molecular mechanisms responsible for age associated learning impairment, including spatial learning.
Towards an informative mutant phenotype for every bacterial gene
Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.; ...
2014-08-11
Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, inmore » Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness.« less
Biological interpretation of genome-wide association studies using predicted gene functions
Pers, Tune H.; Karjalainen, Juha M.; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R.; Yang, Jian; Lui, Julian C.; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K.; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S.N.; Hirschhorn, Joel N.; Franke, Lude
2015-01-01
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes. PMID:25597830
Robustness, evolvability, and the logic of genetic regulation.
Payne, Joshua L; Moore, Jason H; Wagner, Andreas
2014-01-01
In gene regulatory circuits, the expression of individual genes is commonly modulated by a set of regulating gene products, which bind to a gene's cis-regulatory region. This region encodes an input-output function, referred to as signal-integration logic, that maps a specific combination of regulatory signals (inputs) to a particular expression state (output) of a gene. The space of all possible signal-integration functions is vast and the mapping from input to output is many-to-one: For the same set of inputs, many functions (genotypes) yield the same expression output (phenotype). Here, we exhaustively enumerate the set of signal-integration functions that yield identical gene expression patterns within a computational model of gene regulatory circuits. Our goal is to characterize the relationship between robustness and evolvability in the signal-integration space of regulatory circuits, and to understand how these properties vary between the genotypic and phenotypic scales. Among other results, we find that the distributions of genotypic robustness are skewed, so that the majority of signal-integration functions are robust to perturbation. We show that the connected set of genotypes that make up a given phenotype are constrained to specific regions of the space of all possible signal-integration functions, but that as the distance between genotypes increases, so does their capacity for unique innovations. In addition, we find that robust phenotypes are (i) evolvable, (ii) easily identified by random mutation, and (iii) mutationally biased toward other robust phenotypes. We explore the implications of these latter observations for mutation-based evolution by conducting random walks between randomly chosen source and target phenotypes. We demonstrate that the time required to identify the target phenotype is independent of the properties of the source phenotype.
Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong
2015-01-01
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.
Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong
2015-01-01
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.
Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, inmore » Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness.« less
Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren
2018-01-01
Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
Gundersen, Gregory W; Jones, Matthew R; Rouillard, Andrew D; Kou, Yan; Monteiro, Caroline D; Feldmann, Axel S; Hu, Kevin S; Ma'ayan, Avi
2015-09-15
Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, to extract and analyze differentially expressed genes from GEO requires significant computational skills. Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and FireFox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. avi.maayan@mssm.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Multiconstrained gene clustering based on generalized projections
2010-01-01
Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple pieces of constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints from different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints from different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions. PMID:20356386
Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt
2014-06-01
In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.
Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne
2004-06-01
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
2009-01-01
Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443
Robustness, Evolvability, and the Logic of Genetic Regulation
Moore, Jason H.; Wagner, Andreas
2014-01-01
In gene regulatory circuits, the expression of individual genes is commonly modulated by a set of regulating gene products, which bind to a gene’s cis-regulatory region. This region encodes an input-output function, referred to as signal-integration logic, that maps a specific combination of regulatory signals (inputs) to a particular expression state (output) of a gene. The space of all possible signal-integration functions is vast and the mapping from input to output is many-to-one: for the same set of inputs, many functions (genotypes) yield the same expression output (phenotype). Here, we exhaustively enumerate the set of signal-integration functions that yield idential gene expression patterns within a computational model of gene regulatory circuits. Our goal is to characterize the relationship between robustness and evolvability in the signal-integration space of regulatory circuits, and to understand how these properties vary between the genotypic and phenotypic scales. Among other results, we find that the distributions of genotypic robustness are skewed, such that the majority of signal-integration functions are robust to perturbation. We show that the connected set of genotypes that make up a given phenotype are constrained to specific regions of the space of all possible signal-integration functions, but that as the distance between genotypes increases, so does their capacity for unique innovations. In addition, we find that robust phenotypes are (i) evolvable, (ii) easily identified by random mutation, and (iii) mutationally biased toward other robust phenotypes. We explore the implications of these latter observations for mutation-based evolution by conducting random walks between randomly chosen source and target phenotypes. We demonstrate that the time required to identify the target phenotype is independent of the properties of the source phenotype. PMID:23373974
GSCALite: A Web Server for Gene Set Cancer Analysis.
Liu, Chun-Jie; Hu, Fei-Fei; Xia, Mengxuan; Han, Leng; Zhang, Qiong; Guo, An-Yuan
2018-05-22
The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer is usually the result of a set of genes and the signal of a single gene could be covered by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules. (i) Differential expression in tumor vs normal, and the survival analysis; (ii) Genomic variations and their survival analysis; (iii) Gene expression associated cancer pathway activity; (iv) miRNA regulatory network for genes; (v) Drug sensitivity for genes; (vi) Normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene set in cancer and drug sensitivity correlation, which will be of broad utilities to cancer researchers. GSCALite is available on http://bioinfo.life.hust.edu.cn/web/GSCALite/. guoay@hust.edu.cn or zhangqiong@hust.edu.cn. Supplementary data are available at Bioinformatics online.
Extensive complementarity between gene function prediction methods.
Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran
2016-12-01
The number of sequenced genomes rises steadily but we still lack the knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementary, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/ CONTACT: fran.supek@irb.hrSupplementary information: Supplementary materials are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Turning publicly available gene expression data into discoveries using gene set context analysis.
Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai
2016-01-08
Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Assessment of the reliability of protein-protein interactions and protein function prediction.
Deng, Minghua; Sun, Fengzhu; Chen, Ting
2003-01-01
As more and more high-throughput protein-protein interaction data are collected, the task of estimating the reliability of different data sets becomes increasingly important. In this paper, we present our study of two groups of protein-protein interaction data, the physical interaction data and the protein complex data, and estimate the reliability of these data sets using three different measurements: (1) the distribution of gene expression correlation coefficients, (2) the reliability based on gene expression correlation coefficients, and (3) the accuracy of protein function predictions. We develop a maximum likelihood method to estimate the reliability of protein interaction data sets according to the distribution of correlation coefficients of gene expression profiles of putative interacting protein pairs. The results of the three measurements are consistent with each other. The MIPS protein complex data have the highest mean gene expression correlation coefficients (0.256) and the highest accuracy in predicting protein functions (70% sensitivity and specificity), while Ito's Yeast two-hybrid data have the lowest mean (0.041) and the lowest accuracy (15% sensitivity and specificity). Uetz's data are more reliable than Ito's data in all three measurements, and the TAP protein complex data are more reliable than the HMS-PCI data in all three measurements as well. The complex data sets generally perform better in function predictions than do the physical interaction data sets. Proteins in complexes are shown to be more highly correlated in gene expression. The results confirm that the components of a protein complex can be assigned to functions that the complex carries out within a cell. There are three interaction data sets different from the above two groups: the genetic interaction data, the in-silico data and the syn-express data. Their capability of predicting protein functions generally falls between that of the Y2H data and that of the MIPS protein complex data. The supplementary information is available at the following Web site: http://www-hto.usc.edu/-msms/AssessInteraction/.
Guidelines for the functional annotation of microRNAs using the Gene Ontology
D'Eustachio, Peter; Smith, Jennifer R.; Zampetaki, Anna
2016-01-01
MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). PMID:26917558
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.
Subhash, Santhilal; Kanduri, Chandrasekhar
2016-09-13
High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from the source databases such as KEGG, Reactome or Gene Ontology etc. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations), that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two of published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need for installation of more dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.
Dwivedi, Bhakti; Kowalski, Jeanne
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.
Dwivedi, Bhakti
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010
Gene function in early mouse embryonic stem cell differentiation
Sene, Kagnew Hailesellasse; Porter, Christopher J; Palidwor, Gareth; Perez-Iratxeta, Carolina; Muro, Enrique M; Campbell, Pearl A; Rudnicki, Michael A; Andrade-Navarro, Miguel A
2007-01-01
Background Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. Results We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. Conclusion Our analysis profiles for the first time gene expression at a very early stage of mESC differentiation, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database. PMID:17394647
Discovery of error-tolerant biclusters from noisy gene expression data.
Gupta, Rohit; Rao, Navneet; Kumar, Vipin
2011-11-24
An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small bicliusters due to their top-down approach; inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produce biclusters as their result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as they only work with binary or boolean attributes, their application on gene-expression data require transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. For the first problem, two real-valued S.Cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from ET-bicluster approach not only recover larger set of genes as compared to those obtained from RAP approach but also have higher functional coherence as evaluated using the GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters as estimated by using two randomization tests, reveal that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued Breast Cancer microarray gene-expression data sets and evaluate the biomarkers obtained using MSigDB gene sets. The results obtained for both the problems: functional module discovery and biomarkers discovery, clearly signifies the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.
Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian
2017-01-01
Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.
Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian
2017-01-01
Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes. PMID:28729848
Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.
Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou
2016-01-01
For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.
A Guide to the PLAZA 3.0 Plant Comparative Genomic Database.
Vandepoele, Klaas
2017-01-01
PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .
Differential Effect of Active Smoking on Gene Expression in Male and Female Smokers
Paul, Sunirmal; Amundson, Sally A
2015-01-01
Smoking is the second leading cause of preventable death in the United States. Cohort epidemiological studies have demonstrated that women are more vulnerable to cigarette-smoking induced diseases than their male counterparts, however, the molecular basis of these differences has remained unknown. In this study, we explored if there were differences in the gene expression patterns between male and female smokers, and how these patterns might reflect different sex-specific responses to the stress of smoking. Using whole genome microarray gene expression profiling, we found that a substantial number of oxidant related genes were expressed in both male and female smokers, however, smoking-responsive genes did indeed differ greatly between male and female smokers. Gene set enrichment analysis (GSEA) against reference oncogenic signature gene sets identified a large number of oncogenic pathway gene-sets that were significantly altered in female smokers compared to male smokers. In addition, functional annotation with Ingenuity Pathway Analysis (IPA) identified smoking-correlated genes associated with biological functions in male and female smokers that are directly relevant to well-known smoking related pathologies. However, these relevant biological functions were strikingly overrepresented in female smokers compared to male smokers. IPA network analysis with the functional categories of immune and inflammatory response gene products suggested potential interactions between smoking response and female hormones. Our results demonstrate a striking dichotomy between male and female gene expression responses to smoking. This is the first genome-wide expression study to compare the sex-specific impacts of smoking at a molecular level and suggests a novel potential connection between sex hormone signaling and smoking-induced diseases in female smokers. PMID:25621181
Microgravity and Immunity: Changes in Lymphocyte Gene Expression
NASA Technical Reports Server (NTRS)
Risin, D.; Pellis, N. R.; Ward, N. E.; Risin, S. A.
2006-01-01
Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, affects signal transduction mechanisms, as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with antiCD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity. In the second set - activated T-cells after culturing for 24 hours in 1g and MMG were exposed three hours before harvesting to a secondary activation stimulus (PHA) thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. In the first set of experiments MMG exposure resulted in altered expression of 89 genes, 10 of them were up-regulated and 79 down-regulated. In the second set, changes in expression were revealed in 85 genes, 20 were up-regulated and 65 were down-regulated. The analysis revealed that significant numbers of MGS genes are associated with signal transduction and apoptotic pathways. Interestingly, the majority of genes that responded by up- or down-regulation in the alternative sets of experiments were not the same, possibly reflecting different functional states of the examined T-lymphocyte populations. The responder genes (MGSG) might play an essential role in adaptation to MG and/or be responsible for pathologic changes encountered in Space and thus represent potential targets for molecular-based countermeasures
Kringel, Dario; Lippmann, Catharina; Parnham, Michael J; Kalso, Eija; Ultsch, Alfred; Lötsch, Jörn
2018-06-19
Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain. Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine-learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analyzed by means of computational functional genomics in the Gene Ontology knowledgebase. Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signaling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research. The present computational functional genomics-based approach provided a computational systems-biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine-learned methods provide innovative approaches to knowledge discovery from previous evidence. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
shRNA-Induced Gene Knockdown In Vivo to Investigate Neutrophil Function.
Basit, Abdul; Tang, Wenwen; Wu, Dianqing
2016-01-01
To silence genes in neutrophils efficiently, we exploited the RNA interference and developed an shRNA-based gene knockdown technique. This method involves transfection of mouse bone marrow-derived hematopoietic stem cells with retroviral vector carrying shRNA directed at a specific gene. Transfected stem cells are then transplanted into irradiated wild-type mice. After engraftment of stem cells, the transplanted mice have two sets of circulating neutrophils. One set has a gene of interest knocked down while the other set has full complement of expressed genes. This efficient technique provides a unique way to directly compare the response of neutrophils with a knocked-down gene to that of neutrophils with the full complement of expressed genes in the same environment.
A human functional protein interaction network and its application to cancer data analysis
2010-01-01
Background One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system. Results We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. Conclusions We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases. PMID:20482850
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses at various tissues, respectively. We tested differential gene expression by empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also presented that 12.8% and 59.0% pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% pairwise studies had independent expression association for genes, but no studies were observed significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to the gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes at different tissues.
Lubin, Johnathan W; Tucey, Timothy M; Lundblad, Victoria
2018-01-01
A leading objective in biology is to identify the complete set of activities that each gene performs in vivo In this study, we have asked whether a genetic approach can provide an efficient means of achieving this goal, through the identification and analysis of a comprehensive set of separation-of-function ( sof - ) mutations in a gene. Toward this goal, we have subjected the Saccharomyces cerevisiae EST1 gene, which encodes a regulatory subunit of telomerase, to intensive mutagenesis (with an average coverage of one mutation for every 4.5 residues), using strategies that eliminated those mutations that disrupted protein folding/stability. The resulting set of sof - mutations defined four biochemically distinct activities for the Est1 telomerase protein: two temporally separable steps in telomerase holoenzyme assembly, a telomerase recruitment activity, and a fourth newly discovered regulatory function. Although biochemically distinct, impairment of each of these four different activities nevertheless conferred a common phenotype (critically short telomeres) comparable to that of an est1 -∆ null strain. This highlights the limitations of gene deletions, even for nonessential genes; we suggest that employing a representative set of sof - mutations for each gene in future high- and low-throughput investigations will provide deeper insights into how proteins interact inside the cell. Copyright © 2018 by the Genetics Society of America.
Learning contextual gene set interaction networks of cancer with condition specificity
2013-01-01
Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches to build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, of which each gene set share coherent transcriptional patterns across a subset of samples, termed contextual gene set. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only the results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further investigations. Conclusions The analysis on the contextual gene sets and characterization of networks of interaction composed of these sets discovered distinct functional differences underlying various types of cancer. The results show that our method successfully reveals many subtype-specific regions in the identified maps of biological contexts, which well represent biological functions that can be connected to specific subtypes. PMID:23418942
Marshall, Elaine; Lowrey, Jacqueline; MacPherson, Sheila; Maybin, Jacqueline A.; Collins, Frances; Critchley, Hilary O. D.
2011-01-01
Context: The endometrium is a multicellular, steroid-responsive tissue that undergoes dynamic remodeling every menstrual cycle in preparation for implantation and, in absence of pregnancy, menstruation. Androgen receptors are present in the endometrium. Objective: The objective of the study was to investigate the impact of androgens on human endometrial stromal cells (hESC). Design: Bioinformatics was used to identify an androgen-regulated gene set and processes associated with their function. Regulation of target genes and impact of androgens on cell function were validated using primary hESC. Setting: The study was conducted at the University Research Institute. Patients: Endometrium was collected from women with regular menses; tissues were used for recovery of cells, total mRNA, or protein and for immunohistochemistry. Results: A new endometrial androgen target gene set (n = 15) was identified. Bioinformatics revealed 12 of these genes interacted in one pathway and identified an association with control of cell survival. Dynamic androgen-dependent changes in expression of the gene set were detected in hESC with nine significantly down-regulated at 2 and/or 8 h. Treatment of hESC with dihydrotestosterone reduced staurosporine-induced apoptosis and cell migration/proliferation. Conclusions: Rigorous in silico analysis resulted in identification of a group of androgen-regulated genes expressed in human endometrium. Pathway analysis and functional assays suggest androgen-dependent changes in gene expression may have a significant impact on stromal cell proliferation, migration, and survival. These data provide the platform for further studies on the role of circulatory or local androgens in the regulation of endometrial function and identify androgens as candidates in the pathogenesis of common endometrial disorders including polycystic ovarian syndrome, cancer, and endometriosis. PMID:21865353
Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J
2007-09-21
Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
Maere, Steven; Heymans, Karel; Kuiper, Martin
2005-08-15
The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set on the GO hierarchy, and takes advantage of Cytoscape's versatile visualization environment to produce an intuitive and customizable visual representation of the results.
Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J
2006-01-01
One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
Li, Jun; Riehle, Michelle M; Zhang, Yan; Xu, Jiannong; Oduol, Frederick; Gomez, Shawn M; Eiglmeier, Karin; Ueberheide, Beatrix M; Shabanowitz, Jeffrey; Hunt, Donald F; Ribeiro, José MC; Vernick, Kenneth D
2006-01-01
Background Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector. Results We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation was assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download. Conclusion Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms. PMID:16569258
Properties of genes essential for mouse development
Kabir, Mitra; Barradas, Ana; Tzotzos, George T.; Hentges, Kathryn E.
2017-01-01
Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Experimental data curated manually identified 1301 essential genes and 3451 viable genes. Very many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be: long, highly expressed, old and evolutionarily conserved. These genes tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, cell differentiation and embryonic development. Viable genes tend to encode: membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development. PMID:28562614
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2011-01-01
Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
Wu, Xiao-nan; Shi, Tao-tao; He, Yao-hui; Wang, Fei-fei; Sang, Rui; Ding, Jian-cheng; Zhang, Wen-juan; Shu, Xing-yi; Shen, Hai-feng; Yi, Jia; Gao, Xiang; Liu, Wen
2017-01-01
Yin Yang 1 (YY1) is a multifunctional DNA-binding transcription factor shown to be critical in a variety of biological processes, and its activity and function have been shown to be regulated by multitude of mechanisms, which include but are not limited to post-translational modifications (PTMs), its associated proteins and cellular localization. YY2, the paralog of YY1 in mouse and human, has been proposed to function redundantly or oppositely in a context-specific manner compared with YY1. Despite its functional importance, how YY2’s DNA-binding activity and function are regulated, particularly by PTMs, remains completely unknown. Here we report the first PTM with functional characterization on YY2, namely lysine 247 monomethylation (K247me1), which was found to be dynamically regulated by SET7/9 and LSD1 both in vitro and in cultured cells. Functional study revealed that SET7/9-mediated YY2 methylation regulated its DNA-binding activity in vitro and in association with chromatin examined by chromatin immunoprecipitation coupled with sequencing (ChIP-seq) in cultured cells. Knockout of YY2, SET7/9 or LSD1 by CRISPR (clustered, regularly interspaced, short palindromic repeats)/Cas9-mediated gene editing followed by RNA sequencing (RNA-seq) revealed that a subset of genes was positively regulated by YY2 and SET7/9, but negatively regulated by LSD1, which were enriched with genes involved in cell proliferation regulation. Importantly, YY2-regulated gene transcription, cell proliferation and tumor growth were dependent, at least partially, on YY2 K247 methylation. Finally, somatic mutations on YY2 found in cancer, which are in close proximity to K247, altered its methylation, DNA-binding activity and gene transcription it controls. Our findings revealed the first PTM with functional implications imposed on YY2 protein, and linked YY2 methylation with its biological functions. PMID:29098080
Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter
2014-01-01
Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS-box genes for the development of gymnosperms. This study is the first that provides a comprehensive overview of MADS-box genes in conifers and thus will provide a framework for future work on MADS-box genes in seed plants. PMID:24854168
Methodology for the inference of gene function from phenotype data.
Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A
2014-12-12
Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J
2009-01-01
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifsmore » in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.« less
Ficklin, Stephen P; Feltus, Frank Alex
2013-01-01
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach, to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value < = 0.001) from genome-wide association studies, both covering over 230 different rice traits were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serve as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.
Ficklin, Stephen P.; Feltus, Frank Alex
2013-01-01
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach, to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value < = 0.001) from genome-wide association studies, both covering over 230 different rice traits were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serve as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666
Radiation Quality Effects on Transcriptome Profiles in 3-d Cultures After Particle Irradiation
NASA Technical Reports Server (NTRS)
Patel, Z. S.; Kidane, Y. H.; Huff, J. L.
2014-01-01
In this work, we evaluate the differential effects of low- and high-LET radiation on 3-D organotypic cultures in order to investigate radiation quality impacts on gene expression and cellular responses. Reducing uncertainties in current risk models requires new knowledge on the fundamental differences in biological responses (the so-called radiation quality effects) triggered by heavy ion particle radiation versus low-LET radiation associated with Earth-based exposures. We are utilizing novel 3-D organotypic human tissue models that provide a format for study of human cells within a realistic tissue framework, thereby bridging the gap between 2-D monolayer culture and animal models for risk extrapolation to humans. To identify biological pathway signatures unique to heavy ion particle exposure, functional gene set enrichment analysis (GSEA) was used with whole transcriptome profiling. GSEA has been used extensively as a method to garner biological information in a variety of model systems but has not been commonly used to analyze radiation effects. It is a powerful approach for assessing the functional significance of radiation quality-dependent changes from datasets where the changes are subtle but broad, and where single gene based analysis using rankings of fold-change may not reveal important biological information. We identified 45 statistically significant gene sets at 0.05 q-value cutoff, including 14 gene sets common to gamma and titanium irradiation, 19 gene sets specific to gamma irradiation, and 12 titanium-specific gene sets. Common gene sets largely align with DNA damage, cell cycle, early immune response, and inflammatory cytokine pathway activation. The top gene set enriched for the gamma- and titanium-irradiated samples involved KRAS pathway activation and genes activated in TNF-treated cells, respectively. Another difference noted for the high-LET samples was an apparent enrichment in gene sets involved in cycle cycle/mitotic control. It is plausible that the enrichment in these particular pathways results from the complex DNA damage resulting from high-LET exposure where repair processes are not completed during the same time scale as the less complex damage resulting from low-LET radiation.
Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka)
2011-01-01
Background There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. Results We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16K microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. Conclusions This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes. PMID:22136247
Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka).
Pavey, Scott A; Sutherland, Ben J G; Leong, Jong; Robb, Adrienne; von Schalburg, Kris; Hamon, Troy R; Koop, Ben F; Nielsen, Jennifer L
2011-12-02
There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16 k microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes.
Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean
2006-01-01
Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810
Goedbloed, D J; Czypionka, T; Altmüller, J; Rodriguez, A; Küpfer, E; Segev, O; Blaustein, L; Templeton, A R; Nolte, A W; Steinfartz, S
2017-12-01
The utilization of similar habitats by different species provides an ideal opportunity to identify genes underlying adaptation and acclimatization. Here, we analysed the gene expression of two closely related salamander species: Salamandra salamandra in Central Europe and Salamandra infraimmaculata in the Near East. These species inhabit similar habitat types: 'temporary ponds' and 'permanent streams' during larval development. We developed two species-specific gene expression microarrays, each targeting over 12 000 transcripts, including an overlapping subset of 8331 orthologues. Gene expression was examined for systematic differences between temporary ponds and permanent streams in larvae from both salamander species to establish gene sets and functions associated with these two habitat types. Only 20 orthologues were associated with a habitat in both species, but these orthologues did not show parallel expression patterns across species more than expected by chance. Functional annotation of a set of 106 genes with the highest effect size for a habitat suggested four putative gene function categories associated with a habitat in both species: cell proliferation, neural development, oxygen responses and muscle capacity. Among these high effect size genes was a single orthologue (14-3-3 protein zeta/YWHAZ) that was downregulated in temporary ponds in both species. The emergence of four gene function categories combined with a lack of parallel expression of orthologues (except 14-3-3 protein zeta) suggests that parallel habitat adaptation or acclimatization by larvae from S. salamandra and S. infraimmaculata to temporary ponds and permanent streams is mainly realized by different genes with a converging functionality.
The structure of a gene co-expression network reveals biological functions underlying eQTLs.
Villa-Vialaneix, Nathalie; Liaubet, Laurence; Laurent, Thibault; Cherel, Pierre; Gamot, Adrien; SanCristobal, Magali
2013-01-01
What are the commonalities between genes, whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations, and to infer knowledge for yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this lead to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
Constraint and Contingency in Multifunctional Gene Regulatory Circuits
Payne, Joshua L.; Wagner, Andreas
2013-01-01
Gene regulatory circuits drive the development, physiology, and behavior of organisms from bacteria to humans. The phenotypes or functions of such circuits are embodied in the gene expression patterns they form. Regulatory circuits are typically multifunctional, forming distinct gene expression patterns in different embryonic stages, tissues, or physiological states. Any one circuit with a single function can be realized by many different regulatory genotypes. Multifunctionality presumably constrains this number, but we do not know to what extent. We here exhaustively characterize a genotype space harboring millions of model regulatory circuits and all their possible functions. As a circuit's number of functions increases, the number of genotypes with a given number of functions decreases exponentially but can remain very large for a modest number of functions. However, the sets of circuits that can form any one set of functions becomes increasingly fragmented. As a result, historical contingency becomes widespread in circuits with many functions. Whether a circuit can acquire an additional function in the course of its evolution becomes increasingly dependent on the function it already has. Circuits with many functions also become increasingly brittle and sensitive to mutation. These observations are generic properties of a broad class of circuits and independent of any one circuit genotype or phenotype. PMID:23762020
Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology
2010-01-01
Background In the literature, there are fruitful algorithmic approaches for identification functional modules in protein-protein interactions (PPI) networks. Because of accumulation of large-scale interaction data on multiple organisms and non-recording interaction data in the existing PPI database, it is still emergent to design novel computational techniques that can be able to correctly and scalably analyze interaction data sets. Indeed there are a number of large scale biological data sets providing indirect evidence for protein-protein interaction relationships. Results The main aim of this paper is to present a prior knowledge based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. Higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) to encode the functional pairs into the existing PPI networks; and (ii) to use these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. Topology-based modularity metric and complex annotation in MIPs will be used to evaluate the identified functional modules by these two approaches. Conclusions The experimental results on Yeast PPI networks and GO have shown that the prior knowledge based learning methods perform better than the existing algorithms. PMID:21172053
Almeida, Luciana O; Neto, Marinaldo P C; Sousa, Lucas O; Tannous, Maryna A; Curti, Carlos; Leopoldino, Andreia M
2017-04-18
Epigenetic modifications are essential in the control of normal cellular processes and cancer development. DNA methylation and histone acetylation are major epigenetic modifications involved in gene transcription and abnormal events driving the oncogenic process. SET protein accumulates in many cancer types, including head and neck squamous cell carcinoma (HNSCC); SET is a member of the INHAT complex that inhibits gene transcription associating with histones and preventing their acetylation. We explored how SET protein accumulation impacts on the regulation of gene expression, focusing on DNA methylation and histone acetylation. DNA methylation profile of 24 tumour suppressors evidenced that SET accumulation decreased DNA methylation in association with loss of 5-methylcytidine, formation of 5-hydroxymethylcytosine and increased TET1 levels, indicating an active DNA demethylation mechanism. However, the expression of some suppressor genes was lowered in cells with high SET levels, suggesting that loss of methylation is not the main mechanism modulating gene expression. SET accumulation also downregulated the expression of 32 genes of a panel of 84 transcription factors, and SET directly interacted with chromatin at the promoter of the downregulated genes, decreasing histone acetylation. Gene expression analysis after cell treatment with 5-aza-2'-deoxycytidine (5-AZA) and Trichostatin A (TSA) revealed that histone acetylation reversed transcription repression promoted by SET. These results suggest a new function for SET in the regulation of chromatin dynamics. In addition, TSA diminished both SET protein levels and SET capability to bind to gene promoter, suggesting that administration of epigenetic modifier agents could be efficient to reverse SET phenotype in cancer.
Pazos Obregón, Flavio; Papalardo, Cecilia; Castro, Sebastián; Guerberoff, Gustavo; Cantera, Rafael
2015-09-15
Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, just a few hundreds of them are known so far. In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others. Using statistical and biological criteria to analyze and combine the predictions, we obtained a gene catalogue that is highly enriched in genes of relevance for Drosophila synapse assembly and function but still not recognized as such. The utility of our approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.
A global interaction network maps a wiring diagram of cellular function
Costanzo, Michael; VanderSluis, Benjamin; Koch, Elizabeth N.; Baryshnikova, Anastasia; Pons, Carles; Tan, Guihong; Wang, Wen; Usaj, Matej; Hanchard, Julia; Lee, Susan D.; Pelechano, Vicent; Styles, Erin B.; Billmann, Maximilian; van Leeuwen, Jolanda; van Dyk, Nydia; Lin, Zhen-Yuan; Kuzmin, Elena; Nelson, Justin; Piotrowski, Jeff S.; Srikumar, Tharan; Bahr, Sondra; Chen, Yiqun; Deshpande, Raamesh; Kurat, Christoph F.; Li, Sheena C.; Li, Zhijian; Usaj, Mojca Mattiazzi; Okada, Hiroki; Pascoe, Natasha; Luis, Bryan-Joseph San; Sharifpoor, Sara; Shuteriqi, Emira; Simpkins, Scott W.; Snider, Jamie; Suresh, Harsha Garadi; Tan, Yizhao; Zhu, Hongwei; Malod-Dognin, Noel; Janjic, Vuk; Przulj, Natasa; Troyanskaya, Olga G.; Stagljar, Igor; Xia, Tian; Ohya, Yoshikazu; Gingras, Anne-Claude; Raught, Brian; Boutros, Michael; Steinmetz, Lars M.; Moore, Claire L.; Rosebrock, Adam P.; Caudy, Amy A.; Myers, Chad L.; Andrews, Brenda; Boone, Charles
2017-01-01
We generated a global genetic interaction network for Saccharomyces cerevisiae, constructing over 23 million double mutants, identifying ~550,000 negative and ~350,000 positive genetic interactions. This comprehensive network maps genetic interactions for essential gene pairs, highlighting essential genes as densely connected hubs. Genetic interaction profiles enabled assembly of a hierarchical model of cell function, including modules corresponding to protein complexes and pathways, biological processes, and cellular compartments. Negative interactions connected functionally related genes, mapped core bioprocesses, and identified pleiotropic genes, whereas positive interactions often mapped general regulatory connections among gene pairs, rather than shared functionality. The global network illustrates how coherent sets of genetic interactions connect protein complex and pathway modules to map a functional wiring diagram of the cell. PMID:27708008
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/∼sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Improving information retrieval in functional analysis.
Rodriguez, Juan C; González, Germán A; Fresno, Cristóbal; Llera, Andrea S; Fernández, Elmer A
2016-12-01
Transcriptome analysis is essential to understand the mechanisms regulating key biological processes and functions. The first step usually consists of identifying candidate genes; to find out which pathways are affected by those genes, however, functional analysis (FA) is mandatory. The most frequently used strategies for this purpose are Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology. Several statistical methods have been developed and compared in terms of computational efficiency and/or statistical appropriateness. However, whether their results are similar or complementary, the sensitivity to parameter settings, or possible bias in the analyzed terms has not been addressed so far. Here, two GSEA and four SEA methods and their parameter combinations were evaluated in six datasets by comparing two breast cancer subtypes with well-known differences in genetic background and patient outcomes. We show that GSEA and SEA lead to different results depending on the chosen statistic, model and/or parameters. Both approaches provide complementary results from a biological perspective. Hence, an Integrative Functional Analysis (IFA) tool is proposed to improve information retrieval in FA. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required, since the best SEA/GSEA alternatives are integrated. IFA utility was demonstrated by evaluating four prostate cancer and the TCGA breast cancer microarray datasets, which showed its biological generalization capabilities. Copyright © 2016 Elsevier Ltd. All rights reserved.
SET1A/COMPASS and shadow enhancers in the regulation of homeotic gene expression
Cao, Kaixiang; Collings, Clayton K.; Marshall, Stacy A.; Morgan, Marc A.; Rendleman, Emily J.; Wang, Lu; Sze, Christie C.; Sun, Tianjiao; Bartom, Elizabeth T.; Shilatifard, Ali
2017-01-01
The homeotic (Hox) genes are highly conserved in metazoans, where they are required for various processes in development, and misregulation of their expression is associated with human cancer. In the developing embryo, Hox genes are activated sequentially in time and space according to their genomic position within Hox gene clusters. Accumulating evidence implicates both enhancer elements and noncoding RNAs in controlling this spatiotemporal expression of Hox genes, but disentangling their relative contributions is challenging. Here, we identify two cis-regulatory elements (E1 and E2) functioning as shadow enhancers to regulate the early expression of the HoxA genes. Simultaneous deletion of these shadow enhancers in embryonic stem cells leads to impaired activation of HoxA genes upon differentiation, while knockdown of a long noncoding RNA overlapping E1 has no detectable effect on their expression. Although MLL/COMPASS (complex of proteins associated with Set1) family of histone methyltransferases is known to activate transcription of Hox genes in other contexts, we found that individual inactivation of the MLL1-4/COMPASS family members has little effect on early Hox gene activation. Instead, we demonstrate that SET1A/COMPASS is required for full transcriptional activation of multiple Hox genes but functions independently of the E1 and E2 cis-regulatory elements. Our results reveal multiple regulatory layers for Hox genes to fine-tune transcriptional programs essential for development. PMID:28487406
Hsiao, Tzu-Hung; Chiu, Yu-Chiao; Hsu, Pei-Yin; Lu, Tzu-Pin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Huang, Tim H.-M.; Chuang, Eric Y.; Chen, Yidong
2016-01-01
Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaption to variations of clinical cohorts. In simulated datasets MAGIC achieved greatly improved computation efficiency and overall superior performance than the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. MATLAB implementation of MAGIC is available for academic uses at https://github.com/chiuyc/MAGIC. PMID:26972162
Combining multiple tools outperforms individual methods in gene set enrichment analyses.
Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E
2017-02-01
Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of genes set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/ . The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/ . monther.alhamdoosh@csl.com.au mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Larsen, P. E.; Trivedi, G.; Sreedasyam, A.
2010-07-06
Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derivedmore » from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.« less
Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.
Park, Inho; Lee, Kwang H; Lee, Doheon
2010-06-15
Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.
The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanism of action. Two genes in the eight gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publically available gene expression array datasets. Read the abstract
Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P
2008-01-01
Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951
Identification of functional elements and regulatory circuits by Drosophila modENCODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.
2010-12-22
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- andmore » tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. The functions of {approx}40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length proteincoding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3-inch untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi)mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.« less
Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D
2013-10-01
The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
ADGO: analysis of differentially expressed gene sets using composite GO annotation.
Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun
2006-09-15
Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.
Akiyama, Kenji; Kurotani, Atsushi; Iida, Kei; Kuromori, Takashi; Shinozaki, Kazuo; Sakurai, Tetsuya
2014-01-01
Arabidopsis thaliana is one of the most popular experimental plants. However, only 40% of its genes have at least one experimental Gene Ontology (GO) annotation assigned. Systematic observation of mutant phenotypes is an important technique for elucidating gene functions. Indeed, several large-scale phenotypic analyses have been performed and have generated phenotypic data sets from many Arabidopsis mutant lines and overexpressing lines, which are freely available online. Since each Arabidopsis mutant line database uses individual phenotype expression, the differences in the structured term sets used by each database make it difficult to compare data sets and make it impossible to search across databases. Therefore, we obtained publicly available information for a total of 66,209 Arabidopsis mutant lines, including loss-of-function (RATM and TARAPPER) and gain-of-function (AtFOX and OsFOX) lines, and integrated the phenotype data by mapping the descriptions onto Plant Ontology (PO) and Phenotypic Quality Ontology (PATO) terms. This approach made it possible to manage the four different phenotype databases as one large data set. Here, we report a publicly accessible web-based database, the RIKEN Arabidopsis Genome Encyclopedia II (RARGE II; http://rarge-v2.psc.riken.jp/), in which all of the data described in this study are included. Using the database, we demonstrated consistency (in terms of protein function) with a previous study and identified the presumed function of an unknown gene. We provide examples of AT1G21600, which is a subunit in the plastid-encoded RNA polymerase complex, and AT5G56980, which is related to the jasmonic acid signaling pathway.
ISAAC - InterSpecies Analysing Application using Containers.
Baier, Herbert; Schultz, Jörg
2014-01-15
Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large scale analysis. Mostly, they search for enrichments of specific features. But, these tools do not allow an explorative walk through different views and to change the gene lists according to newly upcoming stories. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology based translation of existing gene sets. As todays research usually is performed in larger teams and consortia, ISAAC provides group based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE based design, the implementation of new modules is straight forward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.
Soneson, Charlotte; Fontes, Magnus
2012-01-01
Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.
Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter
2014-11-01
MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11-14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14-16 Type II MADS-box genes. The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS-box genes for the development of gymnosperms. This study is the first that provides a comprehensive overview of MADS-box genes in conifers and thus will provide a framework for future work on MADS-box genes in seed plants. © The Author 2014. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Strategies to explore functional genomics data sets in NCBI's GEO database.
Wilhite, Stephen E; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database
Wilhite, Stephen E.; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
USDA-ARS?s Scientific Manuscript database
The Gram-negative bacterium Erwinia amylovora (Burrill) Winslow. et al., causal agent of fire blight disease in pome fruit trees, encodes a type three secretion system (T3SS) that functions to translocate effector proteins into plant cells that collectively function to suppress host defenses and ena...
Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B
2015-08-01
The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C
2017-12-06
Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. The comparison between region and gene based results indicated that the detected expressed genes by gene-based and region-based CDFs show high consistency and regions based results allows us to detection of changes in transcript formation.
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge
Wagner, Florian
2015-01-01
Method Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. Results I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets. PMID:26575370
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.
Wagner, Florian
2015-01-01
Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanism of action. Two genes in the eight gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publically available gene expression array datasets. Read the abstract
Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans
Moretti, S.; Davydov, I.I.; Excoffier, L.
2017-01-01
Abstract Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.
van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism-Caenorhabditis elegans-has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
Different functional classes of genes are characterized by different compositional properties.
D'Onofrio, Giuseppe; Ghosh, Tapash Chandra; Saccone, Salvatore
2007-12-22
A compositional analysis on a set of human genes classified in several functional classes was performed. We found out that the GC3, i.e. the GC level at the third codon positions, of the genes involved in cellular metabolism was significantly higher than those involved in information storage and processing. Analyses of human/Xenopus ortologous genes showed that: (i) the GC3 increment of the genes involved in cellular metabolism was significantly higher than those involved in information storage and processing; and (ii) a strong correlation between the GC3 and the corresponding GCi, i.e. the GC level of introns, was found in each functional class. The non-randomness of the GC increments favours the selective hypothesis of gene/genome evolution.
Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.
Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian
2018-02-23
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate the given gene products carrying out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To combat with this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash firstly measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.
Functional clustering of time series gene expression data by Granger causality
2012-01-01
Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425
2010-01-01
Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. Conclusions The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction. PMID:21062474
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main proteins isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melangaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Guselnikov, S.V.; Grayfer, L.; De Jesús Andino, F.; Rogozin, I.B.; Robert, J.; Taranin, A.V.
2015-01-01
The ITAM-bearing transmembrane signaling subunits (TSS) are indispensable components of activating leukocyte receptor complexes. The TSS-encoding genes map to paralogous chromosomal regions, which are thought to arise from ancient genome tetraploidization(s). To assess a possible role of tetraploidization in the TSS evolution, we studied TSS and other functionally linked genes in the amphibian species Xenopus laevis whose genome was duplicated about 40 MYR ago. We found that X. laevis has retained a duplicated set of sixteen TSS genes, all except one being transcribed. Furthermore, duplicated TCRα loci and genes encoding TSS-coupling protein kinases have also been retained. No clear evidence for functional divergence of the TSS paralogs was obtained from gene expression and sequence analyses. We suggest that the main factor of maintenance of duplicated TSS genes in X. laevis was a protein dosage effect and that this effect might have facilitated the TSS set expansion in early vertebrates. PMID:26170006
Promoting gene expression in plants by permissive histone lysine methylation
Millar, Tony; Finnegan, E Jean
2009-01-01
Plants utilize sophisticated epigenetic regulatory mechanisms to coordinate changes in gene expression during development and in response to environmental stimuli. Epigenetics refers to the modification of DNA and chromatin associated proteins, which affect gene expression and cell function, without changing the DNA sequence. Such modifications are inherited through mitosis, and in rare instances through meiosis, although it can be reversible and thus regulatory. Epigenetic modifications are controlled by groups of proteins, such as the family of histone lysine methytransferases (HKMTs). The catalytic core known as the SET domain encodes HKMT activity and either promotes or represses gene expression. A large family of SET domain proteins is present in Arabidopsis where there is growing evidence that two classes of these genes are involved in promoting gene expression in a diverse range of developmental processes. This review will focus on the function of these two classes and the processes that they control, highlighting the huge potential this regulatory mechanism has in plants. PMID:19816124
Diverse Antibiotic Resistance Genes in Dairy Cow Manure
Wichmann, Fabienne; Udikovic-Kolic, Nikolina; Andrew, Sheila; Handelsman, Jo
2014-01-01
ABSTRACT Application of manure from antibiotic-treated animals to crops facilitates the dissemination of antibiotic resistance determinants into the environment. However, our knowledge of the identity, diversity, and patterns of distribution of these antibiotic resistance determinants remains limited. We used a new combination of methods to examine the resistome of dairy cow manure, a common soil amendment. Metagenomic libraries constructed with DNA extracted from manure were screened for resistance to beta-lactams, phenicols, aminoglycosides, and tetracyclines. Functional screening of fosmid and small-insert libraries identified 80 different antibiotic resistance genes whose deduced protein sequences were on average 50 to 60% identical to sequences deposited in GenBank. The resistance genes were frequently found in clusters and originated from a taxonomically diverse set of species, suggesting that some microorganisms in manure harbor multiple resistance genes. Furthermore, amid the great genetic diversity in manure, we discovered a novel clade of chloramphenicol acetyltransferases. Our study combined functional metagenomics with third-generation PacBio sequencing to significantly extend the roster of functional antibiotic resistance genes found in animal gut bacteria, providing a particularly broad resource for understanding the origins and dispersal of antibiotic resistance genes in agriculture and clinical settings. PMID:24757214
Utility and Limitations of Using Gene Expression Data to Identify Functional Associations
Peng, Cheng; Shiu, Shin-Han
2016-01-01
Gene co-expression has been widely used to hypothesize gene function through guilt-by association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles and the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impacts the ability to recover functional associations between genes using Arabidopsis thaliana as an example. Some datasets are more informative in capturing coordinated expression profiles and larger data sets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets. PMID:27935950
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre FR; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2016-01-01
Objective Genome-wide association (GWA) studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Approaches and Results Employing pathways (gene sets) from Reactome, we carried out a two-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CADGWAS data sets (9,889 cases/11,089 controls), nominally significant gene-sets were tested for replication in a meta-analysis of 9 additional studies (15,502 cases/55,730 controls) from the CARDIoGRAM Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication p<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix integrity, innate immunity, axon guidance, and signaling by PDRF, NOTCH, and the TGF-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (e.g. semaphorin regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared to random networks (p<0.001). Network centrality analysis (‘degree’ and ‘betweenness’) further identified genes (e.g. NCAM1, FYN, FURIN etc.) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. Conclusions These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. PMID:25977570
Fuzzy measures on the Gene Ontology for gene product similarity.
Popescu, Mihail; Keller, James M; Mitchell, Joyce A
2006-01-01
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three proteins families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
Maize GO annotation—methods, evaluation, and review (maize-GAMER)
USDA-ARS?s Scientific Manuscript database
We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...
Genetic Variation at the N-acetyltransferase (NAT) Genes in Global Populations
Functional variability at the N-acetyltransferase (NAT) genes is associated with adverse drug reactions and cancer susceptibility in humans. Previous studies of small sets of ethnic groups have indicated that the NAT genes have high levels of amino acid variation that differ in f...
Salnikova, Lyubov E; Smelaya, Tamara V; Golubev, Arkadiy M; Rubanovich, Alexander V; Moroz, Viktor V
2013-11-01
This study was conducted to establish the possible contribution of functional gene polymorphisms in detoxification/oxidative stress and vascular remodeling pathways to community-acquired pneumonia (CAP) susceptibility in the case-control study (350 CAP patients, 432 control subjects) and to predisposition to the development of CAP complications in the prospective study. All subjects were genotyped for 16 polymorphic variants in the 14 genes of xenobiotics detoxification CYP1A1, AhR, GSTM1, GSTT1, ABCB1, redox-status SOD2, CAT, GCLC, and vascular homeostasis ACE, AGT, AGTR1, NOS3, MTHFR, VEGFα. Risk of pulmonary complications (PC) in the single locus analysis was associated with CYP1A1, GCLC and AGTR1 genes. Extra PC (toxic shock syndrome and myocarditis) were not associated with these genes. We evaluated gene-gene interactions using multi-factor dimensionality reduction, and cumulative gene risk score approaches. The final model which included >5 risk alleles in the CYP1A1 (rs2606345, rs4646903, rs1048943), GCLC, AGT, and AGTR1 genes was associated with pleuritis, empyema, acute respiratory distress syndrome, all PC and acute respiratory failure (ARF). We considered CYP1A1, GCLC, AGT, AGTR1 gene set using Set Distiller mode implemented in GeneDecks for discovering gene-set relations via the degree of sharing descriptors within a given gene set. N-acetylcysteine and oxygen were defined by Set Distiller as the best descriptors for the gene set associated in the present study with PC and ARF. Results of the study are in line with literature data and suggest that genetically determined oxidative stress exacerbation may contribute to the progression of lung inflammation.
Dittmar, W James; McIver, Lauren; Michalak, Pawel; Garner, Harold R; Valdez, Gregorio
2014-07-01
The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Nonell, Lara; Puigdecanet, Eulàlia; Astier, Laura; Solé, Francesc; Bayes-Genis, Antoni
2013-01-01
Molecular mechanisms associated with pathophysiological changes in ventricular remodelling due to myocardial infarction (MI) remain poorly understood. We analyzed changes in gene expression by microarray technology in porcine myocardial tissue at 1, 4, and 6 weeks post-MI. MI was induced by coronary artery ligation in 9 female pigs (30–40 kg). Animals were randomly sacrificed at 1, 4, or 6 weeks post-MI (n = 3 per group) and 3 healthy animals were also included as control group. Total RNA from myocardial samples was hybridized to GeneChip® Porcine Genome Arrays. Functional analysis was obtained with the Ingenuity Pathway Analysis (IPA) online tool. Validation of microarray data was performed by quantitative real-time PCR (qRT-PCR). More than 8,000 different probe sets showed altered expression in the remodelling myocardium at 1, 4, or 6 weeks post-MI. Ninety-seven percent of altered transcripts were detected in the infarct core and 255 probe sets were differentially expressed in the remote myocardium. Functional analysis revealed 28 genes de-regulated in the remote myocardial region in at least one of the three temporal analyzed stages, including genes associated with heart failure (HF), systemic sclerosis and coronary artery disease. In the infarct core tissue, eight major time-dependent gene expression patterns were recognized among 4,221 probe sets commonly altered over time. Altered gene expression of ACVR2B, BID, BMP2, BMPR1A, LMNA, NFKBIA, SMAD1, TGFB3, TNFRSF1A, and TP53 were further validated. The clustering of similar expression patterns for gene products with related function revealed molecular footprints, some of them described for the first time, which elucidate changes in biological processes at different stages after MI. PMID:23372767
A chronological expression profile of gene activity during embryonic mouse brain development.
Goggolidou, P; Soneji, S; Powles-Glover, N; Williams, D; Sethi, S; Baban, D; Simon, M M; Ragoussis, I; Norris, D P
2013-12-01
The brain is a functionally complex organ, the patterning and development of which are key to adult health. To help elucidate the genetic networks underlying mammalian brain patterning, we conducted detailed transcriptional profiling during embryonic development of the mouse brain. A total of 2,400 genes were identified as showing differential expression between three developmental stages. Analysis of the data identified nine gene clusters to demonstrate analogous expression profiles. A significant group of novel genes of as yet undiscovered biological function were detected as being potentially relevant to brain development and function, in addition to genes that have previously identified roles in the brain. Furthermore, analysis for genes that display asymmetric expression between the left and right brain hemispheres during development revealed 35 genes as putatively asymmetric from a combined data set. Our data constitute a valuable new resource for neuroscience and neurodevelopment, exposing possible functional associations between genes, including novel loci, and encouraging their further investigation in human neurological and behavioural disorders.
Benitez, Cecil M.; Qu, Kun; Sugiyama, Takuya; Pauerstein, Philip T.; Liu, Yinghua; Tsai, Jennifer; Gu, Xueying; Ghodasara, Amar; Arda, H. Efsun; Zhang, Jiajing; Dekker, Joseph D.; Tucker, Haley O.; Chang, Howard Y.; Kim, Seung K.
2014-01-01
The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus. PMID:25330008
Haralambieva, Iana H.; Oberg, Ann L.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Grill, Diane E.; Middha, Sumit; Bot, Brian M.; Wang, Vivian W.; Smith, David I.; Jacobson, Robert M.; Poland, Gregory A.
2013-01-01
Immune responses to current rubella vaccines demonstrate significant inter-individual variability. We performed mRNA-Seq profiling on PBMCs from high and low antibody responders to rubella vaccination to delineate transcriptional differences upon viral stimulation. Generalized linear models were used to assess the per gene fold change (FC) for stimulated versus unstimulated samples or the interaction between outcome and stimulation. Model results were evaluated by both FC and p-value. Pathway analysis and self-contained gene set tests were performed for assessment of gene group effects. Of 17,566 detected genes, we identified 1,080 highly significant differentially expressed genes upon viral stimulation (p<1.00E−15, FDR<1.00E−14), including various immune function and inflammation-related genes, genes involved in cell signaling, cell regulation and transcription, and genes with unknown function. Analysis by immune outcome and stimulation status identified 27 genes (p≤0.0006 and FDR≤0.30) that responded differently to viral stimulation in high vs. low antibody responders, including major histocompatibility complex (MHC) class I genes (HLA-A, HLA-B and B2M with p = 0.0001, p = 0.0005 and p = 0.0002, respectively), and two genes related to innate immunity and inflammation (EMR3 and MEFV with p = 1.46E−08 and p = 0.0004, respectively). Pathway and gene set analysis also revealed transcriptional differences in antigen presentation and innate/inflammatory gene sets and pathways between high and low responders. Using mRNA-Seq genome-wide transcriptional profiling, we identified antigen presentation and innate/inflammatory genes that may assist in explaining rubella vaccine-induced immune response variations. Such information may provide new scientific insights into vaccine-induced immunity useful in rational vaccine development and immune response monitoring. PMID:23658707
Genome-wide characterization of mammalian promoters with distal enhancer functions.
Dao, Lan T M; Galindo-Albarrán, Ariel O; Castro-Mondragon, Jaime A; Andrieu-Soler, Charlotte; Medina-Rivera, Alejandra; Souaid, Charbel; Charbonnier, Guillaume; Griffon, Aurélien; Vanhille, Laurent; Stephen, Tharshana; Alomairi, Jaafar; Martin, David; Torres, Magali; Fernandez, Nicolas; Soler, Eric; van Helden, Jacques; Puthier, Denis; Spicuglia, Salvatore
2017-07-01
Gene expression in mammals is precisely regulated by the combination of promoters and gene-distal regulatory regions, known as enhancers. Several studies have suggested that some promoters might have enhancer functions. However, the extent of this type of promoters and whether they actually function to regulate the expression of distal genes have remained elusive. Here, by exploiting a high-throughput enhancer reporter assay, we unravel a set of mammalian promoters displaying enhancer activity. These promoters have distinct genomic and epigenomic features and frequently interact with other gene promoters. Extensive CRISPR-Cas9 genomic manipulation demonstrated the involvement of these promoters in the cis regulation of expression of distal genes in their natural loci. Our results have important implications for the understanding of complex gene regulation in normal development and disease.
Rough set soft computing cancer classification and network: one stone, two birds.
Zhang, Yue
2010-07-15
Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article.
Engert, Christoph G; Droste, Rita; van Oudenaarden, Alexander; Horvitz, H Robert
2018-04-01
To better understand the tissue-specific regulation of chromatin state in cell-fate determination and animal development, we defined the tissue-specific expression of all 36 C. elegans presumptive lysine methyltransferase (KMT) genes using single-molecule fluorescence in situ hybridization (smFISH). Most KMTs were expressed in only one or two tissues. The germline was the tissue with the broadest KMT expression. We found that the germline-expressed C. elegans protein SET-17, which has a SET domain similar to that of the PRDM9 and PRDM7 SET-domain proteins, promotes fertility by regulating gene expression in primary spermatocytes. SET-17 drives the transcription of spermatocyte-specific genes from four genomic clusters to promote spermatid development. SET-17 is concentrated in stable chromatin-associated nuclear foci at actively transcribed msp (major sperm protein) gene clusters, which we term msp locus bodies. Our results reveal the function of a PRDM9/7-family SET-domain protein in spermatocyte transcription. We propose that the spatial intranuclear organization of chromatin factors might be a conserved mechanism in tissue-specific control of transcription.
Jin, Yulan; Sharma, Ashok; Bai, Shan; Davis, Colleen; Liu, Haitao; Hopkins, Diane; Barriga, Kathy; Rewers, Marian; She, Jin-Xiong
2014-07-01
There is tremendous scientific and clinical value to further improving the predictive power of autoantibodies because autoantibody-positive (AbP) children have heterogeneous rates of progression to clinical diabetes. This study explored the potential of gene expression profiles as biomarkers for risk stratification among 104 AbP subjects from the Diabetes Autoimmunity Study in the Young (DAISY) using a discovery data set based on microarray and a validation data set based on real-time RT-PCR. The microarray data identified 454 candidate genes with expression levels associated with various type 1 diabetes (T1D) progression rates. RT-PCR analyses of the top-27 candidate genes confirmed 5 genes (BACH2, IGLL3, EIF3A, CDC20, and TXNDC5) associated with differential progression and implicated in lymphocyte activation and function. Multivariate analyses of these five genes in the discovery and validation data sets identified and confirmed four multigene models (BI, ICE, BICE, and BITE, with each letter representing a gene) that consistently stratify high- and low-risk subsets of AbP subjects with hazard ratios >6 (P < 0.01). The results suggest that these genes may be involved in T1D pathogenesis and potentially serve as excellent gene expression biomarkers to predict the risk of progression to clinical diabetes for AbP subjects. © 2014 by the American Diabetes Association.
Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano
2011-04-11
In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while, Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies, addressing the same biological question to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced region as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.
NEAT: an efficient network enrichment analysis test.
Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C
2016-09-05
Network enrichment analysis is a powerful method, which allows to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as the one of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).
[The application of gene expression programming in the diagnosis of heart disease].
Dai, Wenbin; Zhang, Yuntao; Gao, Xingyu
2009-02-01
GEP (Gene expression programming) is a new genetic algorithm, and it has been proved to be excellent in function finding. In this paper, for the purpose of setting up a diagnostic model, GEP is used to deal with the data of heart disease. Eight variables, Sex, Chest pain, Blood pressure, Angina, Peak, Slope, Colored vessels and Thal, are picked out of thirteen variables to form a classified function. This function is used to predict a forecasting set of 100 samples, and the accuracy is 87%. Other algorithms such as SVM (Support vector machine) are applied to the same data and the forecasting results show that GEP is better than other algorithms.
Romand, Raymond; Ripp, Raymond; Poidevin, Laetitia; Boeglin, Marcel; Geffers, Lars; Dollé, Pascal; Poch, Olivier
2015-01-01
An in situ hybridization (ISH) study was performed on 2000 murine genes representing around 10% of the protein-coding genes present in the mouse genome using data generated by the EURExpress consortium. This study was carried out in 25 tissues of late gestation embryos (E14.5), with a special emphasis on the developing ear and on five distinct developing sensory organs, including the cochlea, the vestibular receptors, the sensory retina, the olfactory organ, and the vibrissae follicles. The results obtained from an analysis of more than 11,000 micrographs have been integrated in a newly developed knowledgebase, called ImAnno. In addition to managing the multilevel micrograph annotations performed by human experts, ImAnno provides public access to various integrated databases and tools. Thus, it facilitates the analysis of complex ISH gene expression patterns, as well as functional annotation and interaction of gene sets. It also provides direct links to human pathways and diseases. Hierarchical clustering of expression patterns in the 25 tissues revealed three main branches corresponding to tissues with common functions and/or embryonic origins. To illustrate the integrative power of ImAnno, we explored the expression, function and disease traits of the sensory epithelia of the five presumptive sensory organs. The study identified 623 genes (out of 2000) concomitantly expressed in the five embryonic epithelia, among which many (∼12%) were involved in human disorders. Finally, various multilevel interaction networks were characterized, highlighting differential functional enrichments of directly or indirectly interacting genes. These analyses exemplify an under-represention of "sensory" functions in the sensory gene set suggests that E14.5 is a pivotal stage between the developmental stage and the functional phase that will be fully reached only after birth.
Long noncoding RNAs as enhancers of gene expression.
Ørom, U A; Derrien, T; Guigo, R; Shiekhattar, R
2010-01-01
The human genome contains thousands of long noncoding RNAs (ncRNAs) transcribed from diverse genomic locations. A large set of long ncRNAs is transcribed independent of protein-coding genes. We have used the GENCODE annotation of the human genome to identify 3019 long ncRNAs expressed in various human cell lines and tissue. This set of long ncRNAs responds to differentiation signals in primary human keratinocytes and is coexpressed with important regulators of keratinocyte development. Depletion of a number of these long ncRNAs leads to the repression of specific genes in their surrounding locus, supportive of an activating function for ncRNAs. Using reporter assays, we confirmed such activating function and show that such transcriptional enhancement is mediated through the long ncRNA transcripts. Our studies show that long ncRNAs exhibit functions similar to classically defined enhancers, through an RNA-dependent mechanism.
Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes
Kurokawa, Ken; Itoh, Takehiko; Kuwahara, Tomomi; Oshima, Kenshiro; Toh, Hidehiro; Toyoda, Atsushi; Takami, Hideto; Morita, Hidetoshi; Sharma, Vineet K.; Srivastava, Tulika P.; Taylor, Todd D.; Noguchi, Hideki; Mori, Hiroshi; Ogura, Yoshitoshi; Ehrlich, Dusko S.; Itoh, Kikuji; Takagi, Toshihisa; Sakaki, Yoshiyuki; Hayashi, Tetsuya; Hattori, Masahira
2007-01-01
Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes. PMID:17916580
2012-01-01
Background Single nucleotide polymorphism (SNP) validation and large-scale genotyping are required to maximize the use of DNA sequence variation and determine the functional relevance of candidate genes for complex stress tolerance traits through genetic association in rice. We used the bead array platform-based Illumina GoldenGate assay to validate and genotype SNPs in a select set of stress-responsive genes to understand their functional relevance and study the population structure in rice. Results Of the 384 putative SNPs assayed, we successfully validated and genotyped 362 (94.3%). Of these 325 (84.6%) showed polymorphism among the 91 rice genotypes examined. Physical distribution, degree of allele sharing, admixtures and introgression, and amino acid replacement of SNPs in 263 abiotic and 62 biotic stress-responsive genes provided clues for identification and targeted mapping of trait-associated genomic regions. We assessed the functional and adaptive significance of validated SNPs in a set of contrasting drought tolerant upland and sensitive lowland rice genotypes by correlating their allelic variation with amino acid sequence alterations in catalytic domains and three-dimensional secondary protein structure encoded by stress-responsive genes. We found a strong genetic association among SNPs in the nine stress-responsive genes with upland and lowland ecological adaptation. Higher nucleotide diversity was observed in indica accessions compared with other rice sub-populations based on different population genetic parameters. The inferred ancestry of 16% among rice genotypes was derived from admixed populations with the maximum between upland aus and wild Oryza species. Conclusions SNPs validated in biotic and abiotic stress-responsive rice genes can be used in association analyses to identify candidate genes and develop functional markers for stress tolerance in rice. PMID:22921105
Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang
2016-01-01
Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis. PMID:27527159
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.
Zheng, Qi; Wang, Xiu-Jie
2008-07-01
Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although there have been a lot of software with GO-related analysis functions, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purpose. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), therefore, provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) One unique feature of GOEAST is to allow cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
Obayashi, Takeshi; Kinoshita, Kengo
2010-05-01
Gene coexpression analyses are a powerful method to predict the function of genes and/or to identify genes that are functionally related to query genes. The basic idea of gene coexpression analyses is that genes with similar functions should have similar expression patterns under many different conditions. This approach is now widely used by many experimental researchers, especially in the field of plant biology. In this review, we will summarize recent successful examples obtained by using our gene coexpression database, ATTED-II. Specifically, the examples will describe the identification of new genes, such as the subunits of a complex protein, the enzymes in a metabolic pathway and transporters. In addition, we will discuss the discovery of a new intercellular signaling factor and new regulatory relationships between transcription factors and their target genes. In ATTED-II, we provide two basic views of gene coexpression, a gene list view and a gene network view, which can be used as guide gene approach and narrow-down approach, respectively. In addition, we will discuss the coexpression effectiveness for various types of gene sets.
Lee, Chai-Jin; Kang, Dongwon; Lee, Sangseon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun
2018-05-25
Determining functions of a gene requires time consuming, expensive biological experiments. Scientists can speed up this experimental process if the literature information and biological networks can be adequately provided. In this paper, we present a web-based information system that can perform in silico experiments of computationally testing hypothesis on the function of a gene. A hypothesis that is specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data to template networks. Then, an in silico experiment is to test how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes path from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three data sets, such as E2f1, Lrrk2, and Dicer1 knockout data sets. We were able to re-produce gene functions reported in the original research papers. In addition, we comprehensively tested with all disease names in MalaCards as hypothesis to show the effectiveness of our system. Our in silico experiment system can be very useful in suggesting biological mechanisms which can be further tested in vivo or in vitro. http://biohealth.snu.ac.kr/software/insilico/. Copyright © 2018 Elsevier Inc. All rights reserved.
Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides
Geiselman, Gina M; Ito, Masakazu; Mondo, Stephen J; Reilly, Morgann C; Cheng, Ya-Fang; Bauer, Stefan; Grigoriev, Igor V; Gladden, John M; Simmons, Blake A; Brem, Rachel B
2018-01-01
The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. These results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi. PMID:29521624
An integrated analysis of genes and functional pathways for aggression in human and rodent models.
Zhang-James, Yanli; Fernàndez-Castillo, Noèlia; Hess, Jonathan L; Malki, Karim; Glatt, Stephen J; Cormand, Bru; Faraone, Stephen V
2018-06-01
Human genome-wide association studies (GWAS), transcriptome analyses of animal models, and candidate gene studies have advanced our understanding of the genetic architecture of aggressive behaviors. However, each of these methods presents unique limitations. To generate a more confident and comprehensive view of the complex genetics underlying aggression, we undertook an integrated, cross-species approach. We focused on human and rodent models to derive eight gene lists from three main categories of genetic evidence: two sets of genes identified in GWAS studies, four sets implicated by transcriptome-wide studies of rodent models, and two sets of genes with causal evidence from online Mendelian inheritance in man (OMIM) and knockout (KO) mice reports. These gene sets were evaluated for overlap and pathway enrichment to extract their similarities and differences. We identified enriched common pathways such as the G-protein coupled receptor (GPCR) signaling pathway, axon guidance, reelin signaling in neurons, and ERK/MAPK signaling. Also, individual genes were ranked based on their cumulative weights to quantify their importance as risk factors for aggressive behavior, which resulted in 40 top-ranked and highly interconnected genes. The results of our cross-species and integrated approach provide insights into the genetic etiology of aggression.
Naaijen, J; Bralten, J; Poelmans, G; Glennon, J C; Franke, B; Buitelaar, J K
2017-01-10
Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of single gene or single variant associations was significant on their own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD.
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypothesis in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypothesis over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of microarrays raw data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypothesis, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.
Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J
2017-04-01
Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value <0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10 -5 ; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10 -5 ; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on Alcoholism.
Jaiswal, Deepika; Jezek, Meagan; Quijote, Jeremiah; Lum, Joanna; Choi, Grace; Kulkarni, Rushmie; Park, DoHwan; Green, Erin M.
2017-01-01
The conserved yeast histone methyltransferase Set1 targets H3 lysine 4 (H3K4) for mono, di, and trimethylation and is linked to active transcription due to the euchromatic distribution of these methyl marks and the recruitment of Set1 during transcription. However, loss of Set1 results in increased expression of multiple classes of genes, including genes adjacent to telomeres and middle sporulation genes, which are repressed under normal growth conditions because they function in meiotic progression and spore formation. The mechanisms underlying Set1-mediated gene repression are varied, and still unclear in some cases, although repression has been linked to both direct and indirect action of Set1, associated with noncoding transcription, and is often dependent on the H3K4me2 mark. We show that Set1, and particularly the H3K4me2 mark, are implicated in repression of a subset of middle sporulation genes during vegetative growth. In the absence of Set1, there is loss of the DNA-binding transcriptional regulator Sum1 and the associated histone deacetylase Hst1 from chromatin in a locus-specific manner. This is linked to increased H4K5ac at these loci and aberrant middle gene expression. These data indicate that, in addition to DNA sequence, histone modification status also contributes to proper localization of Sum1. Our results also show that the role for Set1 in middle gene expression control diverges as cells receive signals to undergo meiosis. Overall, this work dissects an unexplored role for Set1 in gene-specific repression, and provides important insights into a new mechanism associated with the control of gene expression linked to meiotic differentiation. PMID:29066473
Genome-Wide Detection and Analysis of Multifunctional Genes
Pritykin, Yuri; Ghersi, Dario; Singh, Mona
2015-01-01
Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
Di Narzo, Antonio Fabio; Kozlenkov, Alexey; Roussos, Panos; Hao, Ke; Hurd, Yasmin; Lewis, David A.; Sibille, Etienne; Siever, Larry J.; Koonin, Eugene; Dracheva, Stella
2014-01-01
Editing of the pre-mRNA for the serotonin receptor 2C (5-HT2CR) by site-specific adenosine deamination (A-to-I pre-mRNA editing) substantially increases the functional plasticity of this key neurotransmitter receptor and is thought to contribute to homeostatic mechanisms in neurons. 5-HT2CR mRNA editing generates up to 24 different receptor isoforms. The extent of editing correlates with 5-HT2CR functional activity: more highly edited isoforms exhibit the least function. Altered 5-HT2CR editing has been reported in postmortem brains of suicide victims. We report a comparative analysis of the connections among 5-HT2CR editing, genome-wide gene expression and DNA methylation in suicide victims, individuals with major depressive disorder and non-psychiatric controls. The results confirm previous findings of an overrepresentation of highly edited mRNA variants (which encode hypoactive 5-HT2CR receptors) in the brains of suicide victims. A large set of genes for which the expression level is associated with editing was detected. This signature set of editing-associated genes is significantly enriched for genes that are involved in synaptic transmission, genes that are preferentially expressed in neurons, and genes whose expression is correlated with the level of DNA methylation. Notably, we report that the link between 5-HT2CR editing and gene expression is disrupted in suicide victims. The results suggest that the postulated homeostatic function of 5-HT2CR editing is dysregulated in individuals who committed suicide. PMID:24781207
2014-05-16
native uncharacterized genes for characterized genes from Bacillus subtilis , that is presented in a constitutive expression module. If the B... subtilis gene containing M. mycoides mutant is viable than the function of the conserved hypothetical gene is the same as the input B. subtilis gene...Characterized genes from B. subtilis were swapped with similar, but not so similar as to be clearly the same, essential genes from M. mycoides. The B. subtilis
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
The phenotypic manifestations of rare genic CNVs in autism spectrum disorder
Merikangas, A K; Segurado, R; Heron, E A; Anney, R J L; Paterson, A D; Cook, E H; Pinto, D; Scherer, S W; Szatmari, P; Gill, M; Corvin, A P; Gallagher, L
2015-01-01
Significant evidence exists for the association between copy number variants (CNVs) and Autism Spectrum Disorder (ASD); however, most of this work has focused solely on the diagnosis of ASD. There is limited understanding of the impact of CNVs on the ‘sub-phenotypes' of ASD. The objective of this paper is to evaluate associations between CNVs in differentially brain expressed (DBE) genes or genes previously implicated in ASD/intellectual disability (ASD/ID) and specific sub-phenotypes of ASD. The sample consisted of 1590 cases of European ancestry from the Autism Genome Project (AGP) with a diagnosis of an ASD and at least one rare CNV impacting any gene and a core set of phenotypic measures, including symptom severity, language impairments, seizures, gait disturbances, intelligence quotient (IQ) and adaptive function, as well as paternal and maternal age. Classification analyses using a non-parametric recursive partitioning method (random forests) were employed to define sets of phenotypic characteristics that best classify the CNV-defined groups. There was substantial variation in the classification accuracy of the two sets of genes. The best variables for classification were verbal IQ for the ASD/ID genes, paternal age at birth for the DBE genes and adaptive function for de novo CNVs. CNVs in the ASD/ID list were primarily associated with communication and language domains, whereas CNVs in DBE genes were related to broader manifestations of adaptive function. To our knowledge, this is the first study to examine the associations between sub-phenotypes and CNVs genome-wide in ASD. This work highlights the importance of examining the diverse sub-phenotypic manifestations of CNVs in ASD, including the specific features, comorbid conditions and clinical correlates of ASD that comprise underlying characteristics of the disorder. PMID:25421404
The phenotypic manifestations of rare genic CNVs in autism spectrum disorder.
Merikangas, A K; Segurado, R; Heron, E A; Anney, R J L; Paterson, A D; Cook, E H; Pinto, D; Scherer, S W; Szatmari, P; Gill, M; Corvin, A P; Gallagher, L
2015-11-01
Significant evidence exists for the association between copy number variants (CNVs) and Autism Spectrum Disorder (ASD); however, most of this work has focused solely on the diagnosis of ASD. There is limited understanding of the impact of CNVs on the 'sub-phenotypes' of ASD. The objective of this paper is to evaluate associations between CNVs in differentially brain expressed (DBE) genes or genes previously implicated in ASD/intellectual disability (ASD/ID) and specific sub-phenotypes of ASD. The sample consisted of 1590 cases of European ancestry from the Autism Genome Project (AGP) with a diagnosis of an ASD and at least one rare CNV impacting any gene and a core set of phenotypic measures, including symptom severity, language impairments, seizures, gait disturbances, intelligence quotient (IQ) and adaptive function, as well as paternal and maternal age. Classification analyses using a non-parametric recursive partitioning method (random forests) were employed to define sets of phenotypic characteristics that best classify the CNV-defined groups. There was substantial variation in the classification accuracy of the two sets of genes. The best variables for classification were verbal IQ for the ASD/ID genes, paternal age at birth for the DBE genes and adaptive function for de novo CNVs. CNVs in the ASD/ID list were primarily associated with communication and language domains, whereas CNVs in DBE genes were related to broader manifestations of adaptive function. To our knowledge, this is the first study to examine the associations between sub-phenotypes and CNVs genome-wide in ASD. This work highlights the importance of examining the diverse sub-phenotypic manifestations of CNVs in ASD, including the specific features, comorbid conditions and clinical correlates of ASD that comprise underlying characteristics of the disorder.
Clifford, Jennifer C; Buchanan, Alex; Vining, Oliver; Kidarsa, Teresa A; Chang, Jeff H; McPhail, Kerry L; Loper, Joyce E
2016-10-01
Bacteria can be both highly communicative and highly competitive in natural habitats and antibiotics are thought to play a role in both of these processes. The soil bacterium Pseudomonas protegens Pf-5 produces a spectrum of antibiotics, two of which, pyoluteorin and 2,4-diacetylphloroglucinol (DAPG), function in intracellular and intercellular communication, both as autoinducers of their own production. Here, we demonstrate that phloroglucinol, an intermediate in DAPG biosynthesis, can serve as an intercellular signal influencing the expression of pyoluteorin biosynthesis genes, the production of pyoluteorin, and inhibition of Pythium ultimum, a phytopathogenic oomycete sensitive to pyoluteorin. Through analysis of RNAseq data sets, we show that phloroglucinol had broad effects on the transcriptome of Pf-5, significantly altering the transcription of more than two hundred genes. The effects of nanomolar versus micromolar concentrations of phloroglucinol differed both quantitatively and qualitatively, influencing the expression of distinct sets of genes or having opposite effects on transcript abundance of certain genes. Therefore, our results support the concept of hormesis, a phenomenon associated with signalling molecules that elicit distinct responses at different concentrations. Phloroglucinol is the first example of an intermediate of antibiotic biosynthesis that functions as a chemical messenger influencing gene expression in P. protegens. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.
Broad-Enrich: functional interpretation of large sets of broad genomic regions.
Cavalcante, Raymond G; Lee, Chee; Welch, Ryan P; Patil, Snehal; Weymouth, Terry; Scott, Laura J; Sartor, Maureen A
2014-09-01
Functional enrichment testing facilitates the interpretation of Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. http://broad-enrich.med.umich.edu for Web version and R package. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Using expression genetics to study the neurobiology of ethanol and alcoholism.
Farris, Sean P; Wolen, Aaron R; Miles, Michael F
2010-01-01
Recent simultaneous progress in human and animal model genetics and the advent of microarray whole genome expression profiling have produced prodigious data sets on genetic loci, potential candidate genes, and differential gene expression related to alcoholism and ethanol behaviors. Validated target genes or gene networks functioning in alcoholism are still of meager proportions. Genetical genomics, which combines genetic analysis of both traditional phenotypes and whole genome expression data, offers a potential methodology for characterizing brain gene networks functioning in alcoholism. This chapter will describe concepts, approaches, and recent findings in the field of genetical genomics as it applies to alcohol research. Copyright 2010 Elsevier Inc. All rights reserved.
Global Mapping of the Yeast Genetic Interaction Network
NASA Astrophysics Data System (ADS)
Tong, Amy Hin Yan; Lesage, Guillaume; Bader, Gary D.; Ding, Huiming; Xu, Hong; Xin, Xiaofeng; Young, James; Berriz, Gabriel F.; Brost, Renee L.; Chang, Michael; Chen, YiQun; Cheng, Xin; Chua, Gordon; Friesen, Helena; Goldberg, Debra S.; Haynes, Jennifer; Humphries, Christine; He, Grace; Hussein, Shamiza; Ke, Lizhu; Krogan, Nevan; Li, Zhijian; Levinson, Joshua N.; Lu, Hong; Ménard, Patrice; Munyana, Christella; Parsons, Ainslie B.; Ryan, Owen; Tonikian, Raffi; Roberts, Tania; Sdicu, Anne-Marie; Shapiro, Jesse; Sheikh, Bilal; Suter, Bernhard; Wong, Sharyl L.; Zhang, Lan V.; Zhu, Hongwei; Burd, Christopher G.; Munro, Sean; Sander, Chris; Rine, Jasper; Greenblatt, Jack; Peter, Matthias; Bretscher, Anthony; Bell, Graham; Roth, Frederick P.; Brown, Grant W.; Andrews, Brenda; Bussey, Howard; Boone, Charles
2004-02-01
A genetic interaction network containing ~1000 genes and ~4000 interactions was mapped by crossing mutations in 132 different query genes into a set of ~4700 viable gene yeast deletion mutants and scoring the double mutant progeny for fitness defects. Network connectivity was predictive of function because interactions often occurred among functionally related genes, and similar patterns of interactions tended to identify components of the same pathway. The genetic network exhibited dense local neighborhoods; therefore, the position of a gene on a partially mapped network is predictive of other genetic interactions. Because digenic interactions are common in yeast, similar networks may underlie the complex genetics associated with inherited phenotypes in other organisms.
Shimoni, Yishai
2018-02-01
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.
2018-01-01
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes. PMID:29470520
Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses
Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto
2012-01-01
The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875
Singh, Anuradha; Mantri, Shrikant; Sharma, Monica; Chaudhury, Ashok; Tuli, Rakesh; Roy, Joy
2014-01-16
The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions. The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers.
2014-01-01
Background The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Results Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions. Conclusions The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers. PMID:24433256
Gaffoor, Iffa; Brown, Daren W.; Plattner, Ron; Proctor, Robert H.; Qi, Weihong; Trail, Frances
2005-01-01
Polyketides are a class of secondary metabolites that exhibit a vast diversity of form and function. In fungi, these compounds are produced by large, multidomain enzymes classified as type I polyketide synthases (PKSs). In this study we identified and functionally disrupted 15 PKS genes from the genome of the filamentous fungus Gibberella zeae. Five of these genes are responsible for producing the mycotoxins zearalenone, aurofusarin, and fusarin C and the black perithecial pigment. A comprehensive expression analysis of the 15 genes revealed diverse expression patterns during grain colonization, plant colonization, sexual development, and mycelial growth. Expression of one of the PKS genes was not detected under any of 18 conditions tested. This is the first study to genetically characterize a complete set of PKS genes from a single organism. PMID:16278459
Bao, Weier; Greenwold, Matthew J; Sawyer, Roger H
2017-11-01
Gene co-expression network analysis has been a research method widely used in systematically exploring gene function and interaction. Using the Weighted Gene Co-expression Network Analysis (WGCNA) approach to construct a gene co-expression network using data from a customized 44K microarray transcriptome of chicken epidermal embryogenesis, we have identified two distinct modules that are highly correlated with scale or feather development traits. Signaling pathways related to feather development were enriched in the traditional KEGG pathway analysis and functional terms relating specifically to embryonic epidermal development were also enriched in the Gene Ontology analysis. Significant enrichment annotations were discovered from customized enrichment tools such as Modular Single-Set Enrichment Test (MSET) and Medical Subject Headings (MeSH). Hub genes in both trait-correlated modules showed strong specific functional enrichment toward epidermal development. Also, regulatory elements, such as transcription factors and miRNAs, were targeted in the significant enrichment result. This work highlights the advantage of this methodology for functional prediction of genes not previously associated with scale- and feather trait-related modules.
Chen, Wei-Hua; Lu, Guanting; Chen, Xiao; Zhao, Xing-Ming; Bork, Peer
2017-01-04
OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Polyploidization altered gene functions in cotton (Gossypium spp.)
USDA-ARS?s Scientific Manuscript database
Cotton fibers are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has been known for a long time that a large set of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across the At and ...
A functional polymorphism of the TNF-{alpha} gene that is associated with type 2 DM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Susa, Shinji; Daimon, Makoto; Sakabe, Jun-Ichi
2008-05-09
To examine the association of the tumor necrosis factor-{alpha} (TNF-{alpha}) gene region with type 2 diabetes (DM), 11 single-nucleotide polymorphisms (SNPs) of the region were analyzed. The initial study using a sample set (148 cases vs. 227 controls) showed a significant association of the SNP IVS1G + 123A of the TNF-{alpha} gene with DM (p = 0.0056). Multiple logistic regression analysis using an enlarged sample set (225 vs. 716) revealed the significant association of the SNP with DM independently of any clinical traits examined (OR: 1.49, p = 0.014). The functional relevance of the SNP were examined by the electrophoreticmore » mobility shift assays using nuclear extracts from the U937 and NIH3T3 cells and luciferase assays in these cells with Simian virus 40 promoter- and TNF-{alpha} promoter-reporter gene constructs. The functional analyses showed that YY1 transcription factor bound allele-specifically to the SNP region and, the IVS1 + 123A allele had an increase in luciferase expression compared with the G allele.« less
Partitioning of functional gene expression data using principal points.
Kim, Jaehee; Kim, Haseong
2017-10-12
DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data since they are considered as independent realizations of a stochastic process. This process requires appropriate models to identify patterns of gene functions. The partitioning of the functional data can find homogeneous subgroups of entities for the massive genes within the inherent biological networks. Therefor it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent partitioning method of functional coefficients for individual expression profiles based on the orthonormal basis system. A principal points based functional partitioning method is proposed for time-course gene expression data. The method explores the relationship between genes using Legendre coefficients as principal points to extract the features of gene functions. Our proposed method provides high connectivity in connectedness after clustering for simulated data and finds a significant subsets of genes with the increased connectivity. Our approach has comparative advantages that fewer coefficients are used from the functional data and self-consistency of principal points for partitioning. As real data applications, we are able to find partitioned genes through the gene expressions found in budding yeast data and Escherichia coli data. The proposed method benefitted from the use of principal points, dimension reduction, and choice of orthogonal basis system as well as provides appropriately connected genes in the resulting subsets. We illustrate our method by applying with each set of cell-cycle-regulated time-course yeast genes and E. coli genes. The proposed method is able to identify highly connected genes and to explore the complex dynamics of biological systems in functional genomics.
Rough Set Soft Computing Cancer Classification and Network: One Stone, Two Birds
Zhang, Yue
2010-01-01
Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article. PMID:20706619
Yao, Heng; Wang, Xiaoxuan; Chen, Pengcheng; Hai, Ling; Jin, Kang; Yao, Lixia; Mao, Chuanzao; Chen, Xin
2018-05-01
An advanced functional understanding of omics data is important for elucidating the design logic of physiological processes in plants and effectively controlling desired traits in plants. We present the latest versions of the Predicted Arabidopsis Interactome Resource (PAIR) and of the gene set linkage analysis (GSLA) tool, which enable the interpretation of an observed transcriptomic change (differentially expressed genes [DEGs]) in Arabidopsis ( Arabidopsis thaliana ) with respect to its functional impact for biological processes. PAIR version 5.0 integrates functional association data between genes in multiple forms and infers 335,301 putative functional interactions. GSLA relies on this high-confidence inferred functional association network to expand our perception of the functional impacts of an observed transcriptomic change. GSLA then interprets the biological significance of the observed DEGs using established biological concepts (annotation terms), describing not only the DEGs themselves but also their potential functional impacts. This unique analytical capability can help researchers gain deeper insights into their experimental results and highlight prospective directions for further investigation. We demonstrate the utility of GSLA with two case studies in which GSLA uncovered how molecular events may have caused physiological changes through their collective functional influence on biological processes. Furthermore, we showed that typical annotation-enrichment tools were unable to produce similar insights to PAIR/GSLA. The PAIR version 5.0-inferred interactome and GSLA Web tool both can be accessed at http://public.synergylab.cn/pair/. © 2018 American Society of Plant Biologists. All Rights Reserved.
2014-08-15
characterized genes from Bacillus subtilis , that is presented in a constitutive expression module. If the B. subtilis gene containing M. mycoides mutant is...essential gene MMYC_0361 with the rlmH gene from Bacillus subtilis . Mycoplasma mycoides containing the B. subtilis rlmH was viable. This tells us the...viable than the function of the conserved hypothetical gene is the same as the input B. subtilis gene. Table of Contents: Section
Discovering monotonic stemness marker genes from time-series stem cell microarray data.
Wang, Hsei-Wei; Sun, Hsing-Jen; Chang, Ting-Yu; Lo, Hung-Hao; Cheng, Wei-Chung; Tseng, George C; Lin, Chin-Teng; Chang, Shing-Jyh; Pal, Nikhil; Chung, I-Fang
2015-01-01
Identification of genes with ascending or descending monotonic expression patterns over time or stages of stem cells is an important issue in time-series microarray data analysis. We propose a method named Monotonic Feature Selector (MFSelector) based on a concept of total discriminating error (DEtotal) to identify monotonic genes. MFSelector considers various time stages in stage order (i.e., Stage One vs. other stages, Stages One and Two vs. remaining stages and so on) and computes DEtotal of each gene. MFSelector can successfully identify genes with monotonic characteristics. We have demonstrated the effectiveness of MFSelector on two synthetic data sets and two stem cell differentiation data sets: embryonic stem cell neurogenesis (ESCN) and embryonic stem cell vasculogenesis (ESCV) data sets. We have also performed extensive quantitative comparisons of the three monotonic gene selection approaches. Some of the monotonic marker genes such as OCT4, NANOG, BLBP, discovered from the ESCN dataset exhibit consistent behavior with that reported in other studies. The role of monotonic genes found by MFSelector in either stemness or differentiation is validated using information obtained from Gene Ontology analysis and other literature. We justify and demonstrate that descending genes are involved in the proliferation or self-renewal activity of stem cells, while ascending genes are involved in differentiation of stem cells into variant cell lineages. We have developed a novel system, easy to use even with no pre-existing knowledge, to identify gene sets with monotonic expression patterns in multi-stage as well as in time-series genomics matrices. The case studies on ESCN and ESCV have helped to get a better understanding of stemness and differentiation. The novel monotonic marker genes discovered from a data set are found to exhibit consistent behavior in another independent data set, demonstrating the utility of the proposed method. The MFSelector R function and data sets can be downloaded from: http://microarray.ym.edu.tw/tools/MFSelector/.
Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.; ...
2015-03-27
Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termedmore » as ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.
Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termedmore » as ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.« less
Meta genome-wide network from functional linkages of genes in human gut microbial ecosystems.
Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue
2013-03-01
The human gut microbial ecosystem (HGME) exerts an important influence on the human health. In recent researches, meta-genomics provided deep insights into the HGME in terms of gene contents, metabolic processes and genome constitutions of meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes regardless of their genome origins when considering the co-evolution properties of genes. By analyzing these coupled genes, we showed some basic properties of HGME significantly associated with each other, and further constructed a protein interaction map of human gut meta-genome to discover some functional modules that may relate with essential metabolic processes. Compared with other studies, our method provides a new idea to extract basic function elements from meta-genome systems and investigate complex microbial environment by associating its biological traits with co-evolutionary fingerprints encoded in it.
DFP: a Bioconductor package for fuzzy profile identification and gene reduction of microarray data
Glez-Peña, Daniel; Álvarez, Rodrigo; Díaz, Fernando; Fdez-Riverola, Florentino
2009-01-01
Background Expression profiling assays done by using DNA microarray technology generate enormous data sets that are not amenable to simple analysis. The greatest challenge in maximizing the use of this huge amount of data is to develop algorithms to interpret and interconnect results from different genes under different conditions. In this context, fuzzy logic can provide a systematic and unbiased way to both (i) find biologically significant insights relating to meaningful genes, thereby removing the need for expert knowledge in preliminary steps of microarray data analyses and (ii) reduce the cost and complexity of later applied machine learning techniques being able to achieve interpretable models. Results DFP is a new Bioconductor R package that implements a method for discretizing and selecting differentially expressed genes based on the application of fuzzy logic. DFP takes advantage of fuzzy membership functions to assign linguistic labels to gene expression levels. The technique builds a reduced set of relevant genes (FP, Fuzzy Pattern) able to summarize and represent each underlying class (pathology). A last step constructs a biased set of genes (DFP, Discriminant Fuzzy Pattern) by intersecting existing fuzzy patterns in order to detect discriminative elements. In addition, the software provides new functions and visualisation tools that summarize achieved results and aid in the interpretation of differentially expressed genes from multiple microarray experiments. Conclusion DFP integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It has the advantage that its parameters are highly configurable, facilitating the discovery of biologically relevant connections between sets of genes belonging to different pathologies. This information makes it possible to automatically filter irrelevant genes thereby reducing the large volume of data supplied by microarray experiments. Based on these contributions GENECBR, a successful tool for cancer diagnosis using microarray datasets, has recently been released. PMID:19178723
DFP: a Bioconductor package for fuzzy profile identification and gene reduction of microarray data.
Glez-Peña, Daniel; Alvarez, Rodrigo; Díaz, Fernando; Fdez-Riverola, Florentino
2009-01-29
Expression profiling assays done by using DNA microarray technology generate enormous data sets that are not amenable to simple analysis. The greatest challenge in maximizing the use of this huge amount of data is to develop algorithms to interpret and interconnect results from different genes under different conditions. In this context, fuzzy logic can provide a systematic and unbiased way to both (i) find biologically significant insights relating to meaningful genes, thereby removing the need for expert knowledge in preliminary steps of microarray data analyses and (ii) reduce the cost and complexity of later applied machine learning techniques being able to achieve interpretable models. DFP is a new Bioconductor R package that implements a method for discretizing and selecting differentially expressed genes based on the application of fuzzy logic. DFP takes advantage of fuzzy membership functions to assign linguistic labels to gene expression levels. The technique builds a reduced set of relevant genes (FP, Fuzzy Pattern) able to summarize and represent each underlying class (pathology). A last step constructs a biased set of genes (DFP, Discriminant Fuzzy Pattern) by intersecting existing fuzzy patterns in order to detect discriminative elements. In addition, the software provides new functions and visualisation tools that summarize achieved results and aid in the interpretation of differentially expressed genes from multiple microarray experiments. DFP integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It has the advantage that its parameters are highly configurable, facilitating the discovery of biologically relevant connections between sets of genes belonging to different pathologies. This information makes it possible to automatically filter irrelevant genes thereby reducing the large volume of data supplied by microarray experiments. Based on these contributions GENECBR, a successful tool for cancer diagnosis using microarray datasets, has recently been released.
Identification and Functional Analysis of Healing Regulators in Drosophila
Álvarez-Fernández, Carmen; Tamirisa, Srividya; Prada, Federico; Chernomoretz, Ariel; Podhajcer, Osvaldo; Blanco, Enrique; Martín-Blanco, Enrique
2015-01-01
Wound healing is an essential homeostatic mechanism that maintains the epithelial barrier integrity after tissue damage. Although we know the overall steps in wound healing, many of the underlying molecular mechanisms remain unclear. Genetically amenable systems, such as wound healing in Drosophila imaginal discs, do not model all aspects of the repair process. However, they do allow the less understood aspects of the healing response to be explored, e.g., which signal(s) are responsible for initiating tissue remodeling? How is sealing of the epithelia achieved? Or, what inhibitory cues cancel the healing machinery upon completion? Answering these and other questions first requires the identification and functional analysis of wound specific genes. A variety of different microarray analyses of murine and humans have identified characteristic profiles of gene expression at the wound site, however, very few functional studies in healing regulation have been carried out. We developed an experimentally controlled method that is healing-permissive and that allows live imaging and biochemical analysis of cultured imaginal discs. We performed comparative genome-wide profiling between Drosophila imaginal cells actively involved in healing versus their non-engaged siblings. Sets of potential wound-specific genes were subsequently identified. Importantly, besides identifying and categorizing new genes, we functionally tested many of their gene products by genetic interference and overexpression in healing assays. This non-saturated analysis defines a relevant set of genes whose changes in expression level are functionally significant for proper tissue repair. Amongst these we identified the TCP1 chaperonin complex as a key regulator of the actin cytoskeleton essential for the wound healing response. There is promise that our newly identified wound-healing genes will guide future work in the more complex mammalian wound healing response. PMID:25647511
HD CAG-correlated gene expression changes support a simple dominant gain of function
Jacobsen, Jessie C.; Gregory, Gillian C.; Woda, Juliana M.; Thompson, Morgan N.; Coser, Kathryn R.; Murthy, Vidya; Kohane, Isaac S.; Gusella, James F.; Seong, Ihn Sik; MacDonald, Marcy E.; Shioda, Toshi; Lee, Jong-Min
2011-01-01
Huntington's disease is initiated by the expression of a CAG repeat-encoded polyglutamine region in full-length huntingtin, with dominant effects that vary continuously with CAG size. The mechanism could involve a simple gain of function or a more complex gain of function coupled to a loss of function (e.g. dominant negative-graded loss of function). To distinguish these alternatives, we compared genome-wide gene expression changes correlated with CAG size across an allelic series of heterozygous CAG knock-in mouse embryonic stem (ES) cell lines (HdhQ20/7, HdhQ50/7, HdhQ91/7, HdhQ111/7), to genes differentially expressed between Hdhex4/5/ex4/5 huntingtin null and wild-type (HdhQ7/7) parental ES cells. The set of 73 genes whose expression varied continuously with CAG length had minimal overlap with the 754-member huntingtin-null gene set but the two were not completely unconnected. Rather, the 172 CAG length-correlated pathways and 238 huntingtin-null significant pathways clustered into 13 shared categories at the network level. A closer examination of the energy metabolism and the lipid/sterol/lipoprotein metabolism categories revealed that CAG length-correlated genes and huntingtin-null-altered genes either were different members of the same pathways or were in unique, but interconnected pathways. Thus, varying the polyglutamine size in full-length huntingtin produced gene expression changes that were distinct from, but related to, the effects of lack of huntingtin. These findings support a simple gain-of-function mechanism acting through a property of the full-length huntingtin protein and point to CAG-correlative approaches to discover its effects. Moreover, for therapeutic strategies based on huntingtin suppression, our data highlight processes that may be more sensitive to the disease trigger than to decreased huntingtin levels. PMID:21536587
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
The modest beginnings of one genome project.
Kaback, David B
2013-06-01
One of the top things on a geneticist's wish list has to be a set of mutants for every gene in their particular organism. Such a set was produced for the yeast, Saccharomyces cerevisiae near the end of the 20th century by a consortium of yeast geneticists. However, the functional genomic analysis of one chromosome, its smallest, had already begun more than 25 years earlier as a project that was designed to define most or all of that chromosome's essential genes by temperature-sensitive lethal mutations. When far fewer than expected genes were uncovered, the relatively new field of molecular cloning enabled us and indeed, the entire community of yeast researchers to approach this problem more definitively. These studies ultimately led to cloning, genomic sequencing, and the production and phenotypic analysis of the entire set of knockout mutations for this model organism as well as a better concept of what defines an essential function, a wish fulfilled that enables this model eukaryote to continue at the forefront of research in modern biology.
Bag, Susmita; Ramaiah, Sudha; Anbarasu, Anand
2015-01-07
Network study on genes and proteins offers functional basics of the complexity of gene and protein, and its interacting partners. The gene fatty acid-binding protein 4 (fabp4) is found to be highly expressed in adipose tissue, and is one of the most abundant proteins in mature adipocytes. Our investigations on functional modules of fabp4 provide useful information on the functional genes interacting with fabp4, their biochemical properties and their regulatory functions. The present study shows that there are eight set of candidate genes: acp1, ext2, insr, lipe, ostf1, sncg, usp15, and vim that are strongly and functionally linked up with fabp4. Gene ontological analysis of network modules of fabp4 provides an explicit idea on the functional aspect of fabp4 and its interacting nodes. The hierarchal mapping on gene ontology indicates gene specific processes and functions as well as their compartmentalization in tissues. The fabp4 along with its interacting genes are involved in lipid metabolic activity and are integrated in multi-cellular processes of tissues and organs. They also have important protein/enzyme binding activity. Our study elucidated disease-associated nsSNP prediction for fabp4 and it is interesting to note that there are four rsID׳s (rs1051231, rs3204631, rs140925685 and rs141169989) with disease allelic variation (T104P, T126P, G27D and G90V respectively). On the whole, our gene network analysis presents a clear insight about the interactions and functions associated with fabp4 gene network. Copyright © 2014 Elsevier Ltd. All rights reserved.
Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie
2011-09-12
Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
2011-01-01
Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
Floris, Matteo; Orsini, Massimiliano; Thanaraj, Thangavel Alphonse
2008-10-02
It is often the case that mammalian genes are alternatively spliced; the resulting alternate transcripts often encode protein isoforms that differ in amino acid sequences. Changes among the protein isoforms can alter the cellular properties of proteins. The effect can range from a subtle modulation to a complete loss of function. (i) We examined human splice-mediated protein isoforms (as extracted from a manually curated data set, and from a computationally predicted data set) for differences in the annotation for protein signatures (Pfam domains and PRINTS fingerprints) and we characterized the differences & their effects on protein functionalities. An important question addressed relates to the extent of protein isoforms that may lack any known function in the cell. (ii) We present a database that reports differences in protein signatures among human splice-mediated protein isoform sequences. (i) Characterization: The work points to distinct sets of alternatively spliced genes with varying degrees of annotation for the splice-mediated protein isoforms. Protein molecular functions seen to be often affected are those that relate to: binding, catalytic, transcription regulation, structural molecule, transporter, motor, and antioxidant; and the processes that are often affected are nucleic acid binding, signal transduction, and protein-protein interactions. Signatures are often included/excluded and truncated in length among protein isoforms; truncation is seen as the predominant type of change. Analysis points to the following novel aspects: (a) Analysis using data from the manually curated Vega indicates that one in 8.9 genes can lead to a protein isoform of no "known" function; and one in 18 expressed protein isoforms can be such an "orphan" isoform; the corresponding numbers as seen with computationally predicted ASD data set are: one in 4.9 genes and one in 9.8 isoforms. (b) When swapping of signatures occurs, it is often between those of same functional classifications. (c) Pfam domains can occur in varying lengths, and PRINTS fingerprints can occur with varying number of constituent motifs among isoforms - since such a variation is seen in large number of genes, it could be a general mechanism to modulate protein function. (ii) The reported resource (at http://www.bioinformatica.crs4.org/tools/dbs/splivap/) provides the community ability to access data on splice-mediated protein isoforms (with value-added annotation such as association with diseases) through changes in protein signatures.
Serial analysis of gene expression (SAGE) in bovine trypanotolerance: preliminary results
2003-01-01
In Africa, trypanosomosis is a tsetse-transmitted disease which represents the most important constraint to livestock production. Several indigenous West African taurine (Bos taurus) breeds, such as the Longhorn (N'Dama) cattle are well known to control trypanosome infections. This genetic ability named "trypanotolerance" results from various biological mechanisms under multigenic control. The methodologies used so far have not succeeded in identifying the complete pool of genes involved in trypanotolerance. New post genomic biotechnologies such as transcriptome analyses are efficient in characterising the pool of genes involved in the expression of specific biological functions. We used the serial analysis of gene expression (SAGE) technique to construct, from Peripheral Blood Mononuclear Cells of an N'Dama cow, 2 total mRNA transcript libraries, at day 0 of a Trypanosoma congolense experimental infection and at day 10 post-infection, corresponding to the peak of parasitaemia. Bioinformatic comparisons in the bovine genomic databases allowed the identification of 187 up- and down- regulated genes, EST and unknown functional genes. Identification of the genes involved in trypanotolerance will allow to set up specific microarray sets for further metabolic and pharmacological studies and to design field marker-assisted selection by introgression programmes. PMID:12927079
Serial analysis of gene expression (SAGE) in bovine trypanotolerance: preliminary results.
Berthier, David; Quéré, Ronan; Thevenon, Sophie; Belemsaga, Désiré; Piquemal, David; Marti, Jacques; Maillard, Jean-Charles
2003-01-01
In Africa, trypanosomosis is a tsetse-transmitted disease which represents the most important constraint to livestock production. Several indigenous West African taurine Bos taurus) breeds, such as the Longhorn (N'Dama) cattle are well known to control trypanosome infections. This genetic ability named "trypanotolerance" results from various biological mechanisms under multigenic control. The methodologies used so far have not succeeded in identifying the complete pool of genes involved in trypanotolerance. New post genomic biotechnologies such as transcriptome analyses are efficient in characterising the pool of genes involved in the expression of specific biological functions. We used the serial analysis of gene expression (SAGE) technique to construct, from Peripheral Blood Mononuclear Cells of an N'Dama cow, 2 total mRNA transcript libraries, at day 0 of a Trypanosoma congolense experimental infection and at day 10 post-infection, corresponding to the peak of parasitaemia. Bioinformatic comparisons in the bovine genomic databases allowed the identification of 187 up- and down- regulated genes, EST and unknown functional genes. Identification of the genes involved in trypanotolerance will allow to set up specific microarray sets for further metabolic and pharmacological studies and to design field marker-assisted selection by introgression programmes.
Homologues of CsLOB1 in citrus function as disease susceptibility genes in citrus canker.
Zhang, Junli; Huguet-Tapia, Jose Carlos; Hu, Yang; Jones, Jeffrey; Wang, Nian; Liu, Sanzhen; White, Frank F
2017-08-01
The lateral organ boundary domain (LBD) genes encode a group of plant-specific proteins that function as transcription factors in the regulation of plant growth and development. Citrus sinensis lateral organ boundary 1 (CsLOB1) is a member of the LBD family and functions as a disease susceptibility gene in citrus bacterial canker (CBC). Thirty-four LBD members have been identified from the Citrus sinensis genome. We assessed the potential for additional members of LBD genes in citrus to function as surrogates for CsLOB1 in CBC, and compared host gene expression on induction of different LBD genes. Using custom-designed transcription activator-like (TAL) effectors, two members of the same clade as CsLOB1, named CsLOB2 and CsLOB3, were found to be capable of functioning similarly to CsLOB1 in CBC. RNA sequencing and quantitative reverse transcription-polymerase chain reaction analyses revealed a set of cell wall metabolic genes that are associated with CsLOB1, CsLOB2 and CsLOB3 expression and may represent downstream genes involved in CBC. © 2016 BSPP AND JOHN WILEY & SONS LTD.
Identification of the Core Set of Carbon-Associated Genes in a Bioenergy Grassland Soil
Howe, Adina; Yang, Fan; Williams, Ryan J.; ...
2016-11-17
Despite the central role of soil microbial communities in global carbon (C) cycling, little is known about soil microbial community structure and even less about their metabolic pathways. Efforts to characterize soil communities often focus on identifying differences in gene content across environmental gradients, but an alternative question is what genes are similar in soils. These genes may indicate critical species or potential functions that are required in all soils. Here we identified the “core” set of C cycling sequences widely present in multiple soil metagenomes from a fertilized prairie (FP). Of 226,887 sequences associated with known enzymes involved inmore » the synthesis, metabolism, and transport of carbohydrates, 843 were identified to be consistently prevalent across four replicate soil metagenomes. This core metagenome was functionally and taxonomically diverse, representing five enzyme classes and 99 enzyme families within the CAZy database. Though it only comprised 0.4% of all CAZy-associated genes identified in FP metagenomes, the core was found to be comprised of functions similar to those within cumulative soils. The FP CAZy-associated core sequences were present in multiple publicly available soil metagenomes and most similar to soils sharing geographic proximity. As a result, in soil ecosystems, where high diversity remains a key challenge for metagenomic investigations, these core genes represent a subset of critical functions necessary for carbohydrate metabolism, which can be targeted to evaluate important C fluxes in these and other similar soils.« less
VitisExpDB: a database resource for grape functional genomics.
Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L
2008-02-28
The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of approximately 20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.
VitisExpDB: A database resource for grape functional genomics
Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L
2008-01-01
Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of ~20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website . PMID:18307813
Naaijen, J; Bralten, J; Poelmans, G; Faraone, Stephen; Asherson, Philip; Banaschewski, Tobias; Buitelaar, Jan; Franke, Barbara; P Ebstein, Richard; Gill, Michael; Miranda, Ana; D Oades, Robert; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph; Sonuga-Barke, Edmund; Anney, Richard; Mulas, Fernando; Steinhausen, Hans-Christoph; Glennon, J C; Franke, B; Buitelaar, J K
2017-01-01
Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of single gene or single variant associations was significant on their own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD. PMID:28072412
Thévenin, Annelyse; Ein-Dor, Liat; Ozery-Flato, Michal; Shamir, Ron
2014-09-01
Genomes undergo changes in organization as a result of gene duplications, chromosomal rearrangements and local mutations, among other mechanisms. In contrast to prokaryotes, in which genes of a common function are often organized in operons and reside contiguously along the genome, most eukaryotes show much weaker clustering of genes by function, except for few concrete functional groups. We set out to check systematically if there is a relation between gene function and gene organization in the human genome. We test this question for three types of functional groups: pairs of interacting proteins, complexes and pathways. We find a significant concentration of functional groups both in terms of their distance within the same chromosome and in terms of their dispersal over several chromosomes. Moreover, using Hi-C contact map of the tendency of chromosomal segments to appear close in the 3D space of the nucleus, we show that members of the same functional group that reside on distinct chromosomes tend to co-localize in space. The result holds for all three types of functional groups that we tested. Hence, the human genome shows substantial concentration of functional groups within chromosomes and across chromosomes in space. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Windhorst, Dafna A; Mileva-Seitz, Viara R; Rippe, Ralph C A; Tiemeier, Henning; Jaddoe, Vincent W V; Verhulst, Frank C; van IJzendoorn, Marinus H; Bakermans-Kranenburg, Marian J
2016-08-01
In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome-wide environmental interaction (GWEI) studies in the search for genetic variation underlying individual differences in behavior. Genetic variants in 12 autosomal dopaminergic genes were available in an ethnically homogenous part of a population-based cohort. Harsh parenting was assessed with maternal (n = 1881) and paternal (n = 1710) reports at age 3. Externalizing behavior was assessed with the Child Behavior Checklist (CBCL) at age 5 (71 ± 3.7 months). We conducted gene-set analyses of the association between variation in dopaminergic genes and externalizing behavior, stratified for harsh parenting. The association was statistically significant or approached significance for children without harsh parenting experiences, but was absent in the group with harsh parenting. Similarly, significant associations between single genes and externalizing behavior were only found in the group without harsh parenting. Effect sizes in the groups with and without harsh parenting did not differ significantly. Gene-environment interaction tests were conducted for individual genetic variants, resulting in two significant interaction effects (rs1497023 and rs4922132) after correction for multiple testing. Our findings are suggestive of G × E interplay, with associations between dopamine genes and externalizing behavior present in children without harsh parenting, but not in children with harsh parenting experiences. Harsh parenting may overrule the role of genetic factors in externalizing behavior. Gene-based and gene-set analyses offer promising new alternatives to analyses focusing on single candidate polymorphisms when examining the interplay between genetic and environmental factors.
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2015-07-01
Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDRF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphoring-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001). Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.
Primer sets for cloning the human repertoire of T cell Receptor Variable regions.
Boria, Ilenia; Cotella, Diego; Dianzani, Irma; Santoro, Claudio; Sblattero, Daniele
2008-08-29
Amplification and cloning of naïve T cell Receptor (TR) repertoires or antigen-specific TR is crucial to shape immune response and to develop immuno-based therapies. TR variable (V) regions are encoded by several genes that recombine during T cell development. The cloning of expressed genes as large diverse libraries from natural sources relies upon the availability of primers able to amplify as many V genes as possible. Here, we present a list of primers computationally designed on all functional TR V and J genes listed in the IMGT, the ImMunoGeneTics information system. The list consists of unambiguous or degenerate primers suitable to theoretically amplify and clone the entire TR repertoire. We show that it is possible to selectively amplify and clone expressed TR V genes in one single RT-PCR step and from as little as 1000 cells. This new primer set will facilitate the creation of more diverse TR libraries than has been possible using currently available primer sets.
Analysis of functional importance of binding sites in the Drosophila gap gene network model.
Kozlov, Konstantin; Gursky, Vitaly V; Kulakovskiy, Ivan V; Dymova, Arina; Samsonova, Maria
2015-01-01
The statistical thermodynamics based approach provides a promising framework for construction of the genotype-phenotype map in many biological systems. Among important aspects of a good model connecting the DNA sequence information with that of a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor bindings sites. As the model may predict different levels of the functional importance of specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and concentration dependent regulatory interaction between gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs in comparison to previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. The assumption about the dual interaction between hb and Kr leads to the most consistent modeling results, but, on the other hand, may obscure existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.
WormQTLHD—a web database for linking human disease to natural variation data in C. elegans
van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism—Caenorhabditis elegans—has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionary conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated to phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915
Lu, Jianguo; Peatman, Eric; Tang, Haibao; Lewis, Joshua; Liu, Zhanjiang
2012-06-15
Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Here, using a Markov clustering algorithm directed approach we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting their origin of independent and continuous duplication. This is particularly true for the zebrafish genome. Further analysis of the duplicated gene sets indicated that a significant portion of duplicated genes in the zebrafish genome were of recent, lineage-specific duplication events. Most strikingly, a subset of duplicated genes is enriched among the recently duplicated genes involved in immune or sensory response pathways. Such findings demonstrated the significance of continuous gene duplication as well as that of whole genome duplication in the course of genome evolution.
Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.
The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted functionmore » in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.« less
Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides
Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.; ...
2018-03-09
The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted functionmore » in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.« less
Dimerization of a Viral SET Protein Endows its Function
DOE Office of Scientific and Technical Information (OSTI.GOV)
H Wei; M Zhou
Histone modifications are regarded as the most indispensible phenomena in epigenetics. Of these modifications, lysine methylation is of the greatest complexity and importance as site- and state-specific lysine methylation exerts a plethora of effects on chromatin structure and gene transcription. Notably, paramecium bursaria chlorella viruses encode a conserved SET domain methyltransferase, termed vSET, that functions to suppress host transcription by methylating histone H3 at lysine 27 (H3K27), a mark for eukaryotic gene silencing. Unlike mammalian lysine methyltransferases (KMTs), vSET functions only as a dimer, but the underlying mechanism has remained elusive. In this study, we demonstrate that dimeric vSET operatesmore » with negative cooperativity between the two active sites and engages in H3K27 methylation one site at a time. New atomic structures of vSET in the free form and a ternary complex with S-adenosyl homocysteine and a histone H3 peptide and biochemical analyses reveal the molecular origin for the negative cooperativity and explain the substrate specificity of H3K27 methyltransferases. Our study suggests a 'walking' mechanism, by which vSET acts all by itself to globally methylate host H3K27, which is accomplished by the mammalian EZH2 KMT only in the context of the Polycomb repressive complex.« less
Di Narzo, Antonio Fabio; Kozlenkov, Alexey; Roussos, Panos; Hao, Ke; Hurd, Yasmin; Lewis, David A; Sibille, Etienne; Siever, Larry J; Koonin, Eugene; Dracheva, Stella
2014-09-15
Editing of the pre-mRNA for the serotonin receptor 2C (5-HT2CR) by site-specific adenosine deamination (A-to-I pre-mRNA editing) substantially increases the functional plasticity of this key neurotransmitter receptor and is thought to contribute to homeostatic mechanisms in neurons. 5-HT2CR mRNA editing generates up to 24 different receptor isoforms. The extent of editing correlates with 5-HT2CR functional activity: more highly edited isoforms exhibit the least function. Altered 5-HT2CR editing has been reported in postmortem brains of suicide victims. We report a comparative analysis of the connections among 5-HT2CR editing, genome-wide gene expression and DNA methylation in suicide victims, individuals with major depressive disorder and non-psychiatric controls. The results confirm previous findings of an overrepresentation of highly edited mRNA variants (which encode hypoactive 5-HT2CR receptors) in the brains of suicide victims. A large set of genes for which the expression level is associated with editing was detected. This signature set of editing-associated genes is significantly enriched for genes that are involved in synaptic transmission, genes that are preferentially expressed in neurons, and genes whose expression is correlated with the level of DNA methylation. Notably, we report that the link between 5-HT2CR editing and gene expression is disrupted in suicide victims. The results suggest that the postulated homeostatic function of 5-HT2CR editing is dysregulated in individuals who committed suicide. Published by Oxford University Press 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Dynamics of cellular level function and regulation derived from murine expression array data.
de Bivort, Benjamin; Huang, Sui; Bar-Yam, Yaneer
2004-12-21
A major open question of systems biology is how genetic and molecular components interact to create phenotypes at the cellular level. Although much recent effort has been dedicated to inferring effective regulatory influences within small networks of genes, the power of microarray bioinformatics has yet to be used to determine functional influences at the cellular level. In all cases of data-driven parameter estimation, the number of model parameters estimable from a set of data is strictly limited by the size of that set. Rather than infer parameters describing the detailed interactions of just a few genes, we chose a larger-scale investigation so that the cumulative effects of all gene interactions could be analyzed to identify the dynamics of cellular-level function. By aggregating genes into large groups with related behaviors (megamodules), we were able to determine the effective aggregate regulatory influences among 12 major gene groups in murine B lymphocytes over a variety of time steps. Intriguing observations about the behavior of cells at this high level of abstraction include: (i) a medium-term critical global transcriptional dependence on ATP-generating genes in the mitochondria, (ii) a longer-term dependence on glycolytic genes, (iii) the dual role of chromatin-reorganizing genes in transcriptional activation and repression, (iv) homeostasis-favoring influences, (v) the indication that, as a group, G protein-mediated signals are not concentration-dependent in their influence on target gene expression, and (vi) short-term-activating/long-term-repressing behavior of the cell-cycle system that reflects its oscillatory behavior.
A Compendium of Canine Normal Tissue Gene Expression
Chen, Qing-Rong; Wen, Xinyu; Khan, Javed; Khanna, Chand
2011-01-01
Background Our understanding of disease is increasingly informed by changes in gene expression between normal and abnormal tissues. The release of the canine genome sequence in 2005 provided an opportunity to better understand human health and disease using the dog as clinically relevant model. Accordingly, we now present the first genome-wide, canine normal tissue gene expression compendium with corresponding human cross-species analysis. Methodology/Principal Findings The Affymetrix platform was utilized to catalogue gene expression signatures of 10 normal canine tissues including: liver, kidney, heart, lung, cerebrum, lymph node, spleen, jejunum, pancreas and skeletal muscle. The quality of the database was assessed in several ways. Organ defining gene sets were identified for each tissue and functional enrichment analysis revealed themes consistent with known physio-anatomic functions for each organ. In addition, a comparison of orthologous gene expression between matched canine and human normal tissues uncovered remarkable similarity. To demonstrate the utility of this dataset, novel canine gene annotations were established based on comparative analysis of dog and human tissue selective gene expression and manual curation of canine probeset mapping. Public access, using infrastructure identical to that currently in use for human normal tissues, has been established and allows for additional comparisons across species. Conclusions/Significance These data advance our understanding of the canine genome through a comprehensive analysis of gene expression in a diverse set of tissues, contributing to improved functional annotation that has been lacking. Importantly, it will be used to inform future studies of disease in the dog as a model for human translational research and provides a novel resource to the community at large. PMID:21655323
Tumor-stroma interactions a trademark for metastasis.
Morales, Monica; Planet, Evarist; Arnal-Estape, Anna; Pavlovic, Milica; Tarragona, Maria; Gomis, Roger R
2011-10-01
We aimed to unravel genes that are significantly associated with metastasis in order to identify functions that support disseminated disease. We identify genes associated with metastasis and verify its clinical correlations using publicly available primary tumor expression profile data sets. We used facilities in R and Bioconductor (GSEA). Specific data structures and functions were imported. Our results show that genes associated with metastasis in primary tumor enriched for pathways associated with immune infiltration or cytokine-cytokine receptor interaction. As an example, we focus on the enrichment of TGFBR2 and TGF|X A set of communication tools capital for tumor-stroma interactions that define metastasis to the lung and support bone colonization. We showed that tumor-stroma communication through cytokine-cytokine receptor interaction pathway is selected in primary tumors with high risk of relapse. High levels of these factors support systemic instigation of the far metastatic nest as well as local metastatic-specific functions that provide solid ground for metastatic development. Copyright © 2011 Elsevier Ltd. All rights reserved.
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
ABSTRACT The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps-sets of genes coordinately involved in key biological processes-with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Luo, Yonglun; Blechingberg, Jenny; Fernandes, Ana Miguel; Li, Shengting; Fryland, Tue; Børglum, Anders D; Bolund, Lars; Nielsen, Anders Lade
2015-11-14
FUS (TLS) and EWS (EWSR1) belong to the FET-protein family of RNA and DNA binding proteins. FUS and EWS are structurally and functionally related and participate in transcriptional regulation and RNA processing. FUS and EWS are identified in translocation generated cancer fusion proteins and involved in the human neurological diseases amyotrophic lateral sclerosis and fronto-temporal lobar degeneration. To determine the gene regulatory functions of FUS and EWS at the level of chromatin, we have performed chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq). Our results show that FUS and EWS bind to a subset of actively transcribed genes, that binding often is downstream the poly(A)-signal, and that binding overlaps with RNA polymerase II. Functional examinations of selected target genes identified that FUS and EWS can regulate gene expression at different levels. Gene Ontology analyses showed that FUS and EWS target genes preferentially encode proteins involved in regulatory processes at the RNA level. The presented results yield new insights into gene interactions of EWS and FUS and have identified a set of FUS and EWS target genes involved in pathways at the RNA regulatory level with potential to mediate normal and disease-associated functions of the FUS and EWS proteins.
Functional and evolutionary insights from the Ciona notochord transcriptome.
Reeves, Wendy M; Wu, Yuye; Harder, Matthew J; Veeman, Michael T
2017-09-15
The notochord of the ascidian Ciona consists of only 40 cells, and is a longstanding model for studying organogenesis in a small, simple embryo. Here, we perform RNAseq on flow-sorted notochord cells from multiple stages to define a comprehensive Ciona notochord transcriptome. We identify 1364 genes with enriched expression and extensively validate the results by in situ hybridization. These genes are highly enriched for Gene Ontology terms related to the extracellular matrix, cell adhesion and cytoskeleton. Orthologs of 112 of the Ciona notochord genes have known notochord expression in vertebrates, more than twice as many as predicted by chance alone. This set of putative effector genes with notochord expression conserved from tunicates to vertebrates will be invaluable for testing hypotheses about notochord evolution. The full set of Ciona notochord genes provides a foundation for systems-level studies of notochord gene regulation and morphogenesis. We find only modest overlap between this set of notochord-enriched transcripts and the genes upregulated by ectopic expression of the key notochord transcription factor Brachyury, indicating that Brachyury is not a notochord master regulator gene as strictly defined. © 2017. Published by The Company of Biologists Ltd.
Differential Sensitivity of Target Genes to Translational Repression by miR-17~92
Jin, Hyun Yong; Oda, Hiroyo; Chen, Pengda; Kang, Seung Goo; Valentine, Elizabeth; Liao, Lujian; Zhang, Yaoyang; Gonzalez-Martin, Alicia; Shepherd, Jovan; Head, Steven R.; Kim, Pyeung-Hyeun; Fu, Guo; Liu, Wen-Hsien; Han, Jiahuai
2017-01-01
MicroRNAs (miRNAs) are thought to exert their functions by modulating the expression of hundreds of target genes and each to a small degree, but it remains unclear how small changes in hundreds of target genes are translated into the specific function of a miRNA. Here, we conducted an integrated analysis of transcriptome and translatome of primary B cells from mutant mice expressing miR-17~92 at three different levels to address this issue. We found that target genes exhibit differential sensitivity to miRNA suppression and that only a small fraction of target genes are actually suppressed by a given concentration of miRNA under physiological conditions. Transgenic expression and deletion of the same miRNA gene regulate largely distinct sets of target genes. miR-17~92 controls target gene expression mainly through translational repression and 5’UTR plays an important role in regulating target gene sensitivity to miRNA suppression. These findings provide molecular insights into a model in which miRNAs exert their specific functions through a small number of key target genes. PMID:28241004
Trevisi, Paolo; Priori, Davide; Motta, Vincenzo; Luise, Diana; Jansman, Alfons J M; Koopmans, Sietse-Jan; Bosi, Paolo
2017-01-01
The stomach is an underestimated key interface between the ingesta and the digestive system, affecting the digestion and playing an important role in several endocrine functions. The quality of starter microbiota and the early life feeding of medium chain triglycerides may affect porcine gastric maturation. Two trials (T1, T2) were carried out on 12 and 24 cesarean-delivered piglets (birth, d0), divided over two microbiota treatments, but slaughtered and sampled at two or three weeks of age, respectively. All piglets were fed orally: sow serum (T1) or pasteurized sow colostrum (T2) on d0; simple starter microbiota ( Lactobacillus amylovorus , Clostridium glycolicum and Parabacteroides spp.) (d1-d3); complex microbiota inoculum (sow diluted feces, CA) or a placebo (simple association, SA) (d3-d4) and milk replacer ad libitum (d0-d4). The The T1 piglets and half of the T2 piglets were then fed a moist diet (CTRL); the remaining half of the T2 piglets were fed the CTRL diet fortified with medium chain triglycerides and 7% coconut oil (MCT). Total mRNA from the oxyntic mucosa was analyzed using Affymetrix©Porcine Gene array strips. Exploratory functional analysis of the resulting values was carried out using Gene Set Enrichment Analysis. Complex microbiota upregulated 11 gene sets in piglets of each age group vs. SA. Of these sets, 6 were upregulated at both ages, including the set of gene markers of oxyntic mucosa. In comparison with the piglets receiving SA, the CA enriched the genes in the sets related to interferon response when the CTRL diet was given while the same sets were impoverished by CA with the MCT diet. Early colonization with a complex starter microbiota promoted the functional maturation of the oxyntic mucosa in an age-dependent manner. The dietary fatty acid source may have affected the recruitment and the maturation of the immune cells, particularly when the piglets were early associated with a simplified starter microbiota.
Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu
2017-01-01
Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663
The promises and pitfalls of RNA-interference-based therapeutics
Castanotto, Daniela; Rossi, John J.
2009-01-01
The discovery that gene expression can be controlled by the Watson–Crick base-pairing of small RNAs with messenger RNAs containing complementary sequence — a process known as RNA interference — has markedly advanced our understanding of eukaryotic gene regulation and function. The ability of short RNA sequences to modulate gene expression has provided a powerful tool with which to study gene function and is set to revolutionize the treatment of disease. Remarkably, despite being just one decade from its discovery, the phenomenon is already being used therapeutically in human clinical trials, and biotechnology companies that focus on RNA-interference-based therapeutics are already publicly traded. PMID:19158789
Distinctive gene expression profiles characterize donor biopsies from HCV-positive kidney donors.
Mas, Valeria R; Archer, Kellie J; Suh, Lacey; Scian, Mariano; Posner, Marc P; Maluf, Daniel G
2010-12-15
Because of the shortage of organs for transplantation, procurement of kidneys from extended criteria donors is inevitable. Frequently, donors infected with hepatitis C virus (HCV) are used. To elucidate an initial compromise of molecular pathways in HCV graft, gene expression profiles were evaluated. Twenty-four donor allograft biopsies (n=12 HCV positive (+) and n=12 HCV negative (-)) were collected at preimplantation time and profiled using microarrays. Donors were age, race, gender, and cold and warm ischemia time matched between groups. Probe level data were read into the R programming environment using the affy Bioconductor package, and the robust multiarray average method was used to obtain probe set expression summaries. To identify probe sets exhibiting differential expression, a two sample t test was performed. Molecular and biologic functions were analyzed using Interaction Networks and Functional Analysis. Fifty-eight probe sets were differentially expressed between HCV (+) versus HCV (-) donors (P<0.001). The molecular functions associated with the two top scored networks from the analysis of the differentially expressed genes were connective tissue development and function and tissue morphology (score 34), cell death, cell signaling, cellular assembly, and organization (score 32). Among the differentially affected top canonical pathways, we found the role of RIG1-like receptors in antiviral innate immunity (P<0.001), natural killer cell signaling (P=0.007), interleukin-8 signaling (P=0.048), interferon signaling (P=0.0 11; INFA21, INFGR1, and MED14), ILK signaling (P=0.001), and apoptosis signaling. A unique gene expression pattern was identified in HCV (+) kidney grafts. Innate immune system and inflammatory pathways were the most affected.
Bonfiglio, F; Henström, M; Nag, A; Hadizadeh, F; Zheng, T; Cenit, M C; Tigchelaar, E; Williams, F; Reznichenko, A; Ek, W E; Rivera, N V; Homuth, G; Aghdassi, A A; Kacprowski, T; Männikkö, M; Karhunen, V; Bujanda, L; Rafter, J; Wijmenga, C; Ronkainen, J; Hysi, P; Zhernakova, A; D'Amato, M
2018-04-19
Irritable bowel syndrome (IBS) shows genetic predisposition, however, large-scale, powered gene mapping studies are lacking. We sought to exploit existing genetic (genotype) and epidemiological (questionnaire) data from a series of population-based cohorts for IBS genome-wide association studies (GWAS) and their meta-analysis. Based on questionnaire data compatible with Rome III Criteria, we identified a total of 1335 IBS cases and 9768 asymptomatic individuals from 5 independent European genotyped cohorts. Individual GWAS were carried out with sex-adjusted logistic regression under an additive model, followed by meta-analysis using the inverse variance method. Functional annotation of significant results was obtained via a computational pipeline exploiting ontology and interaction networks, and tissue-specific and gene set enrichment analyses. Suggestive GWAS signals (P ≤ 5.0 × 10 -6 ) were detected for 7 genomic regions, harboring 64 gene candidates to affect IBS risk via functional or expression changes. Functional annotation of this gene set convincingly (best FDR-corrected P = 3.1 × 10 -10 ) highlighted regulation of ion channel activity as the most plausible pathway affecting IBS risk. Our results confirm the feasibility of population-based studies for gene-discovery efforts in IBS, identify risk genes and loci to be prioritized in independent follow-ups, and pinpoint ion channels as important players and potential therapeutic targets warranting further investigation. © 2018 John Wiley & Sons Ltd.
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool
2013-01-01
Background System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Results Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Conclusions Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr. PMID:23586463
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.
Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi
2013-04-15
System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.
Gene expression profiles in whole blood and associations with metabolic dysregulation in obesity.
Cox, Amanda J; Zhang, Ping; Evans, Tiffany J; Scott, Rodney J; Cripps, Allan W; West, Nicholas P
Gene expression data provides one tool to gain further insight into the complex biological interactions linking obesity and metabolic disease. This study examined associations between blood gene expression profiles and metabolic disease in obesity. Whole blood gene expression profiles, performed using the Illumina HT-12v4 Human Expression Beadchip, were compared between (i) individuals with obesity (O) or lean (L) individuals (n=21 each), (ii) individuals with (M) or without (H) Metabolic Syndrome (n=11 each) matched on age and gender. Enrichment of differentially expressed genes (DEG) into biological pathways was assessed using Ingenuity Pathway Analysis. Association between sets of genes from biological pathways considered functionally relevant and Metabolic Syndrome were further assessed using an area under the curve (AUC) and cross-validated classification rate (CR). For OvL, only 50 genes were significantly differentially expressed based on the selected differential expression threshold (1.2-fold, p<0.05). For MvH, 582 genes were significantly differentially expressed (1.2-fold, p<0.05) and pathway analysis revealed enrichment of DEG into a diverse set of pathways including immune/inflammatory control, insulin signalling and mitochondrial function pathways. Gene sets from the mTOR signalling pathways demonstrated the strongest association with Metabolic Syndrome (p=8.1×10 -8 ; AUC: 0.909, CR: 72.7%). These results support the use of expression profiling in whole blood in the absence of more specific tissue types for investigations of metabolic disease. Using a pathway analysis approach it was possible to identify an enrichment of DEG into biological pathways that could be targeted for in vitro follow-up. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
Discovering semantic features in the literature: a foundation for building functional associations
Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto
2006-01-01
Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
Ping, Yanyan; Deng, Yulan; Wang, Li; Zhang, Hongyi; Zhang, Yong; Xu, Chaohan; Zhao, Hongying; Fan, Huihui; Yu, Fulong; Xiao, Yun; Li, Xia
2015-01-01
The driver genetic aberrations collectively regulate core cellular processes underlying cancer development. However, identifying the modules of driver genetic alterations and characterizing their functional mechanisms are still major challenges for cancer studies. Here, we developed an integrative multi-omics method CMDD to identify the driver modules and their affecting dysregulated genes through characterizing genetic alteration-induced dysregulated networks. Applied to glioblastoma (GBM), the CMDD identified a core gene module of 17 genes, including seven known GBM drivers, and their dysregulated genes. The module showed significant association with shorter survival of GBM. When classifying driver genes in the module into two gene sets according to their genetic alteration patterns, we found that one gene set directly participated in the glioma pathway, while the other indirectly regulated the glioma pathway, mostly, via their dysregulated genes. Both of the two gene sets were significant contributors to survival and helpful for classifying GBM subtypes, suggesting their critical roles in GBM pathogenesis. Also, by applying the CMDD to other six cancers, we identified some novel core modules associated with overall survival of patients. Together, these results demonstrate integrative multi-omics data can identify driver modules and uncover their dysregulated genes, which is useful for interpreting cancer genome. PMID:25653168
The Pathway Coexpression Network: Revealing pathway relationships
Tanzi, Rudolph E.
2018-01-01
A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer’s Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/. PMID:29554099
ADAGE signature analysis: differential expression analysis with data-defined gene sets.
Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S
2017-11-22
Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server ( http://adage.greenelab.com ) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes
Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong
2015-01-01
In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data. PMID:26201006
A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes.
Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong
2015-01-01
In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.
PGMapper: a web-based tool linking phenotype to genes.
Xiong, Qing; Qiu, Yuhui; Gu, Weikuan
2008-04-01
With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and needs to retrieve and integrate the information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.
Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach
Ruderfer, Douglas M; Charney, Alexander W; Readhead, Ben; Kidd, Brian A; Kähler, Anna K; Kenny, Paul J; Keiser, Michael J; Moran, Jennifer L; Hultman, Christina M; Scott, Stuart A; Sullivan, Patrick F; Purcell, Shaun M; Dudley, Joel T; Sklar, Pamela
2016-01-01
Summary Background Therapeutic treatments for schizophrenia do not alleviate symptoms for all patients and efficacy is limited by common, often severe, side-effects. Genetic studies of disease can identify novel drug targets, and drugs for which the mechanism has direct genetic support have increased likelihood of clinical success. Large-scale genetic studies of schizophrenia have increased the number of genes and gene sets associated with risk. We aimed to examine the overlap between schizophrenia risk loci and gene targets of a comprehensive set of medications to potentially inform and improve treatment of schizophrenia. Methods We defined schizophrenia risk loci as genomic regions reaching genome-wide significance in the latest Psychiatric Genomics Consortium schizophrenia genome-wide association study (GWAS) of 36 989 cases and 113 075 controls and loss of function variants observed only once among 5079 individuals in an exome-sequencing study of 2536 schizophrenia cases and 2543 controls (Swedish Schizophrenia Study). Using two large and orthogonally created databases, we collated drug targets into 167 gene sets targeted by pharmacologically similar drugs and examined enrichment of schizophrenia risk loci in these sets. We further linked the exome-sequenced data with a national drug registry (the Swedish Prescribed Drug Register) to assess the contribution of rare variants to treatment response, using clozapine prescription as a proxy for treatment resistance. Findings We combined results from testing rare and common variation and, after correction for multiple testing, two gene sets were associated with schizophrenia risk: agents against amoebiasis and other protozoal diseases (106 genes, p=0·00046, pcorrected =0·024) and antipsychotics (347 genes, p=0·00078, pcorrected=0·046). Further analysis pointed to antipsychotics as having independent enrichment after removing genes that overlapped these two target sets. We noted significant enrichment both in known targets of antipsychotics (70 genes, p=0·0078) and novel predicted targets (277 genes, p=0·019). Patients with treatment-resistant schizophrenia had an excess of rare disruptive variants in gene targets of antipsychotics (347 genes, p=0·0067) and in genes with evidence for a role in antipsychotic efficacy (91 genes, p=0·0029). Interpretation Our results support genetic overlap between schizophrenia pathogenesis and antipsychotic mechanism of action. This finding is consistent with treatment efficacy being polygenic and suggests that single-target therapeutics might be insufficient. We provide evidence of a role for rare functional variants in antipsychotic treatment response, pointing to a subset of patients where their genetic information could inform treatment. Finally, we present a novel framework for identifying treatments from genetic data and improving our understanding of therapeutic mechanism. PMID:26915512
Conjugative plasmids: vessels of the communal gene pool
Norman, Anders; Hansen, Lars H.; Sørensen, Søren J.
2009-01-01
Comparative whole-genome analyses have demonstrated that horizontal gene transfer (HGT) provides a significant contribution to prokaryotic genome innovation. The evolution of specific prokaryotes is therefore tightly linked to the environment in which they live and the communal pool of genes available within that environment. Here we use the term supergenome to describe the set of all genes that a prokaryotic ‘individual’ can draw on within a particular environmental setting. Conjugative plasmids can be considered particularly successful entities within the communal pool, which have enabled HGT over large taxonomic distances. These plasmids are collections of discrete regions of genes that function as ‘backbone modules’ to undertake different aspects of overall plasmid maintenance and propagation. Conjugative plasmids often carry suites of ‘accessory elements’ that contribute adaptive traits to the hosts and, potentially, other resident prokaryotes within specific environmental niches. Insight into the evolution of plasmid modules therefore contributes to our knowledge of gene dissemination and evolution within prokaryotic communities. This communal pool provides the prokaryotes with an important mechanistic framework for obtaining adaptability and functional diversity that alleviates the need for large genomes of specialized ‘private genes’. PMID:19571247
Yu, Yang; Li, Quan-Feng; Zhang, Jin-Ping; Zhang, Fan; Zhou, Yan-Fei; Feng, Yan-Zhao; Chen, Yue-Qin; Zhang, Yu-Chan
2017-01-01
Seed setting rate is one of the most important components of rice grain yield. To date, only several genes regulating setting rate have been identified in plant. In this study, we showed that laccase-13 ( OsLAC13 ), a member of laccase family genes which are known for their roles in modulating phenylpropanoid pathway and secondary lignification in cell wall, exerts a regulatory function in rice seed setting rate. OsLAC13 expressed in anthers and promotes hydrogen peroxide production both in vitro and in the filaments and anther connectives. Knock-out of OsLAC13 showed significantly increased seed setting rate, while overexpression of this gene exhibited induced mitochondrial damage and suppressed sugar transportation in anthers, which in turn affected seed setting rate. OsLAC13 also induced H 2 O 2 production and mitochondrial damage in the root tip cells which caused the lethal phenotype. We also showed that high abundant of OsmiR397, the suppressor of OsLAC13 mRNA, increased the seed setting rate of rice plants, and restrains H 2 O 2 accumulation in roots during oxidative stress. Our results suggested a novel regulatory role of OsLAC13 gene in regulating seed setting rate by affecting H 2 O 2 dynamics and mitochondrial integrity in rice.
Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn
2016-12-01
The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
Evolution of Prdm Genes in Animals: Insights from Comparative Genomics
Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre
2016-01-01
Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352
Improving the goat long-read assembly with optical mapping
USDA-ARS?s Scientific Manuscript database
Reference genome assemblies provide important context in genetics by standardizing the order of genes and providing a universal set of coordinates for individual nucleotides. Often due to the high complexity of genic regions and higher copy number of genes involved in immune function, immunity-relat...
Characterization of embryo-specific genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1989-01-01
The objective of the proposed research is to characterize the structure and function of a set of genes whose expression is regulated in embryo development, and that is not expressed in mature tissues -- the embryonic genes. In the last two years, using cDNA clones, we have isolated 22 cDNA clones, and characterized the expression pattern of their corresponding RNA. At least 4 cDNA clones detect RNAs of embryonic genes. These cDNA clones detect RNAs expressed in somatic as well as zygotic embryos of carrot. Using the cDNA clones, we screened the genomic library of carrot embryo DNA, and isolatedmore » genomic clones for three genes. The structure and function of two genes DC 8 and DC 59 have been characterized and are reported in this paper.« less
Gastroesophageal reflux activates the NF-κB pathway and impairs esophageal barrier function in mice
Fang, Yu; Chen, Hao; Hu, Yuhui; Djukic, Zorka; Tevebaugh, Whitney; Shaheen, Nicholas J.; Orlando, Roy C.; Hu, Jianguo
2013-01-01
The barrier function of the esophageal epithelium is a major defense against gastroesophageal reflux disease. Previous studies have shown that reflux damage is reflected in a decrease in transepithelial electrical resistance associated with tight junction alterations in the esophageal epithelium. To develop novel therapies, it is critical to understand the molecular mechanisms whereby contact with a refluxate impairs esophageal barrier function. In this study, surgical models of duodenal and mixed reflux were developed in mice. Mouse esophageal epithelium was analyzed by gene microarray. Gene set enrichment analysis showed upregulation of inflammation-related gene sets and the NF-κB pathway due to reflux. Significance analysis of microarrays revealed upregulation of NF-κB target genes. Overexpression of NF-κB subunits (p50 and p65) and NF-κB target genes (matrix metalloproteinases-3 and -9, IL-1β, IL-6, and IL-8) confirmed activation of the NF-κB pathway in the esophageal epithelium. In addition, real-time PCR, Western blotting, and immunohistochemical staining also showed downregulation and mislocalization of claudins-1 and -4. In a second animal experiment, treatment with an NF-κB inhibitor, BAY 11-7085 (20 mg·kg−1·day−1 ip for 10 days), counteracted the effects of duodenal and mixed reflux on epithelial resistance and NF-κB-regulated cytokines. We conclude that gastroesophageal reflux activates the NF-κB pathway and impairs esophageal barrier function in mice and that targeting the NF-κB pathway may strengthen esophageal barrier function against reflux. PMID:23639809
Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.
Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P
2017-11-23
The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass related carbon sources were analyzed of a wild-type strain N402 that was grown on a large range of carbon sources and of the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX that were grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes, which goes beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile puts questions on the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and a therefore even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight in the complex regulatory systems that drive the conversion of plant biomass by fungi. In addition, the data provides additional evidence in favor of and against the similarity-based functions assigned to uncharacterized genes.
Disease Model Discovery from 3,328 Gene Knockouts by The International Mouse Phenotyping Consortium
Meehan, Terrence F.; Conte, Nathalie; West, David B.; Jacobsen, Julius O.; Mason, Jeremy; Warren, Jonathan; Chen, Chao-Kung; Tudose, Ilinca; Relac, Mike; Matthews, Peter; Karp, Natasha; Santos, Luis; Fiegel, Tanja; Ring, Natalie; Westerberg, Henrik; Greenaway, Simon; Sneddon, Duncan; Morgan, Hugh; Codner, Gemma F; Stewart, Michelle E; Brown, James; Horner, Neil; Haendel, Melissa; Washington, Nicole; Mungall, Christopher J.; Reynolds, Corey L; Gallegos, Juan; Gailus-Durner, Valerie; Sorg, Tania; Pavlovic, Guillaume; Bower, Lynette R; Moore, Mark; Morse, Iva; Gao, Xiang; Tocchini-Valentini, Glauco P; Obata, Yuichi; Cho, Soo Young; Seong, Je Kyung; Seavitt, John; Beaudet, Arthur L.; Dickinson, Mary E.; Herault, Yann; Wurst, Wolfgang; de Angelis, Martin Hrabe; Lloyd, K.C. Kent; Flenniken, Ann M; Nutter, Lauryl MJ; Newbigging, Susan; McKerlie, Colin; Justice, Monica J.; Murray, Stephen A.; Svenson, Karen L.; Braun, Robert E.; White, Jacqueline K.; Bradley, Allan; Flicek, Paul; Wells, Sara; Skarnes, William C.; Adams, David J.; Parkinson, Helen; Mallon, Ann-Marie; Brown, Steve D.M.; Smedley, Damian
2017-01-01
Although next generation sequencing has revolutionised the ability to associate variants with human diseases, diagnostic rates and development of new therapies are still limited by our lack of knowledge of function and pathobiological mechanism for most genes. To address this challenge, the International Mouse Phenotyping Consortium (IMPC) is creating a genome- and phenome-wide catalogue of gene function by characterizing new knockout mouse strains across diverse biological systems through a broad set of standardised phenotyping tests, with all mice made readily available to the biomedical community. Analysing the first 3328 genes reveals models for 360 diseases including the first for type C Bernard-Soulier, Bardet-Biedl-5 and Gordon Holmes syndromes. 90% of our phenotype annotations are novel, providing the first functional evidence for 1092 genes and candidates in unsolved diseases such as Arrhythmogenic Right Ventricular Dysplasia 3. Finally, we describe our role in variant functional validation with the 100,000 Genomes and other projects. PMID:28650483
Kayano, Mitsunori; Matsui, Hidetoshi; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2016-04-01
High-throughput time course expression profiles have been available in the last decade due to developments in measurement techniques and devices. Functional data analysis, which treats smoothed curves instead of originally observed discrete data, is effective for the time course expression profiles in terms of dimension reduction, robustness, and applicability to data measured at small and irregularly spaced time points. However, the statistical method of differential analysis for time course expression profiles has not been well established. We propose a functional logistic model based on elastic net regularization (F-Logistic) in order to identify the genes with dynamic alterations in case/control study. We employ a mixed model as a smoothing method to obtain functional data; then F-Logistic is applied to time course profiles measured at small and irregularly spaced time points. We evaluate the performance of F-Logistic in comparison with another functional data approach, i.e. functional ANOVA test (F-ANOVA), by applying the methods to real and synthetic time course data sets. The real data sets consist of the time course gene expression profiles for long-term effects of recombinant interferon β on disease progression in multiple sclerosis. F-Logistic distinguishes dynamic alterations, which cannot be found by competitive approaches such as F-ANOVA, in case/control study based on time course expression profiles. F-Logistic is effective for time-dependent biomarker detection, diagnosis, and therapy. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Essentiality, conservation, evolutionary pressure and codon bias in bacterial genomes.
Dilucca, Maddalena; Cimini, Giulio; Giansanti, Andrea
2018-07-15
Essential genes constitute the core of genes which cannot be mutated too much nor lost along the evolutionary history of a species. Natural selection is expected to be stricter on essential genes and on conserved (highly shared) genes, than on genes that are either nonessential or peculiar to a single or a few species. In order to further assess this expectation, we study here how essentiality of a gene is connected with its degree of conservation among several unrelated bacterial species, each one characterised by its own codon usage bias. Confirming previous results on E. coli, we show the existence of a universal exponential relation between gene essentiality and conservation in bacteria. Moreover, we show that, within each bacterial genome, there are at least two groups of functionally distinct genes, characterised by different levels of conservation and codon bias: i) a core of essential genes, mainly related to cellular information processing; ii) a set of less conserved nonessential genes with prevalent functions related to metabolism. In particular, the genes in the first group are more retained among species, are subject to a stronger purifying conservative selection and display a more limited repertoire of synonymous codons. The core of essential genes is close to the minimal bacterial genome, which is in the focus of recent studies in synthetic biology, though we confirm that orthologs of genes that are essential in one species are not necessarily essential in other species. We also list a set of highly shared genes which, reasonably, could constitute a reservoir of targets for new anti-microbial drugs. Copyright © 2018 Elsevier B.V. All rights reserved.
Statistical assessment of crosstalk enrichment between gene groups in biological networks.
McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L
2013-01-01
Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
FUN-L: gene prioritization for RNAi screens.
Lees, Jonathan G; Hériché, Jean-Karim; Morilla, Ian; Fernández, José M; Adler, Priit; Krallinger, Martin; Vilo, Jaak; Valencia, Alfonso; Ellenberg, Jan; Ranea, Juan A; Orengo, Christine
2015-06-15
Most biological processes remain only partially characterized with many components still to be identified. Given that a whole genome can usually not be tested in a functional assay, identifying the genes most likely to be of interest is of critical importance to avoid wasting resources. Given a set of known functionally related genes and using a state-of-the-art approach to data integration and mining, our Functional Lists (FUN-L) method provides a ranked list of candidate genes for testing. Validation of predictions from FUN-L with independent RNAi screens confirms that FUN-L-produced lists are enriched in genes with the expected phenotypes. In this article, we describe a website front end to FUN-L. The website is freely available to use at http://funl.org © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get data most recently available. Data submitted by the user are stored in the database, where it can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users, to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with a rapid extraction of highly relevant gene annotation data for e.g. thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
Thirunavukkarasu, Nepolean; Hossain, Firoz; Arora, Kanika; Sharma, Rinku; Shiriga, Kaliyugam; Mittal, Swati; Mohan, Sweta; Namratha, Pottekatt Mohanlal; Dogga, Sreelatha; Rani, Tikka Shobha; Katragadda, Sumalini; Rathore, Abhishek; Shah, Trushar; Mohapatra, Trilochan; Gupta, Hari Shankar
2014-12-24
Earlier studies were focused on the genetics of temperate and tropical maize under drought. We identified genetic loci and their association with functional mechanisms in 240 accessions of subtropical maize using a high-density marker set under water stress. Out of 61 significant SNPs (11 were false-discovery-rate-corrected associations), identified across agronomic traits, models, and locations by subjecting the accessions to water stress at flowering stage, 48% were associated with drought-tolerant genes. Maize gene models revealed that SNPs mapped for agronomic traits were in fact associated with number of functional traits as follows: stomatal closure, 28; flowering, 15; root development, 5; detoxification, 4; and reduced water potential, 2. Interactions of these SNPS through the functional traits could lead to drought tolerance. The SNPs associated with ABA-dependent signalling pathways played a major role in the plant's response to stress by regulating a series of functions including flowering, root development, auxin metabolism, guard cell functions, and scavenging reactive oxygen species (ROS). ABA signalling genes regulate flowering through epigenetic changes in stress-responsive genes. ROS generated by ABA signalling are reduced by the interplay between ethylene, ABA, and detoxification signalling transductions. Integration of ABA-signalling genes with auxin-inducible genes regulates root development which in turn, maintains the water balance by regulating electrochemical gradient in plant. Several genes are directly or indirectly involved in the functioning of agronomic traits related to water stress. Genes involved in these crucial biological functions interacted significantly in order to maintain the primary as well as exclusive functions related to coping with water stress. SNPs associated with drought-tolerant genes involved in strategic biological functions will be useful to understand the mechanisms of drought tolerance in subtropical maize.
Gibson, Scott M; Ficklin, Stephen P; Isaacson, Sven; Luo, Feng; Feltus, Frank A; Smith, Melissa C
2013-01-01
The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.
Gene context conservation of a higher order than operons.
Lathe, W C; Snel, B; Bork, P
2000-10-01
Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.
Rollick, Natalie C; Lemmex, Devin B; Ono, Yohei; Reno, Carol R; Hart, David A; Lo, Ian KY; Thornton, Gail M
2018-01-01
Background When considering the “joint as an organ”, the tissues in a joint act as complementary components of an organ, and the “set point” is the cellular activity for homeostasis of the joint tissues. Even in the absence of injury, joint tissues have adaptive responses to processes, like aging and menopause, which result in changes to the set point. Purpose The purpose of this study in a preclinical model was to investigate age-related and menopause-related changes in knee-joint tissues with the hypothesis that tissues will change in unique ways that reflect their differing contributions to maintaining joint function (as measured by joint laxity) and the differing processes of aging and menopause. Methods Rabbit knee-joint tissues from three groups were evaluated: young adult (gene expression, n=8; joint laxity, n=7; water content, n=8), aging adult (gene expression, n=6; joint laxity, n=7; water content, n=5), and menopausal adult (gene expression, n=8; joint laxity, n=7; water content, n=8). Surgical menopause was induced with ovariohysterectomy surgery and gene expression was assessed using reverse-transcription quantitative polymerase chain reaction. Results Aging resulted in changes to 37 of the 150 gene–tissue combinations evaluated, and menopause resulted in changes to 39 of the 150. Despite the similar number of changes, only eleven changes were the same in both aging and menopause. No differences in joint laxity were detected comparing young adult rabbits with aging adult rabbits or with menopausal adult rabbits. Conclusion Aging and menopause affected the gene-expression patterns of the tissues of the knee joint differently, suggesting unique changes to the set point of the knee. Interestingly, aging and menopause did not affect knee-joint laxity, suggesting that joint function was maintained, despite changes in gene expression. Taken together, these findings support the theory of the joint as an organ where the tissues of the joint adapt to maintain joint function. PMID:29535510
Zhang, Cui; Li, Zhenkui; Cui, Huiting; Jiang, Yuanyuan; Yang, Zhenke; Wang, Xu; Gao, Han; Liu, Cong; Zhang, Shujia
2017-01-01
ABSTRACT Malaria parasites have a complex life cycle with multiple developmental stages in mosquito and vertebrate hosts, and different developmental stages express unique sets of genes. Unexpectedly, many transcription factors (TFs) commonly found in eukaryotic organisms are absent in malaria parasites; instead, a family of genes encoding proteins similar to the plant Apetala2 (ApiAP2) transcription factors is expanded in the parasites. Several malaria ApiAP2 genes have been shown to play a critical role in parasite development; however, the functions of the majority of the ApiAP2 genes remain to be elucidated. In particular, no study on the Plasmodium yoelii ApiAP2 (PyApiAP2) gene family has been reported so far. This study systematically investigated the functional roles of PyApiAP2 genes in parasite development. Twenty-four of the 26 PyApiAP2 genes were selected for disruption, and 12 were successfully knocked out using the clustered regularly interspaced short palindromic repeat–CRISPR-associated protein 9 (CRISPR-Cas9) method. The effects of gene knockout (KO) on parasite development in mouse and mosquito stages were evaluated. Ten of 12 successfully disrupted genes, including two genes that have not been functionally characterized in any Plasmodium species previously, were shown to be critical for P. yoelii development of sexual and mosquito stages. Additionally, seven of the genes were labeled for protein expression analysis, revealing important information supporting their functions. This study represents the first systematic functional characterization of the P. yoelii ApiAP2 gene family and discovers important insights on the roles of the ApiAP2 genes in parasite development. PMID:29233900
A complete collection of single-gene deletion mutants of Acinetobacter baylyi ADP1
de Berardinis, Véronique; Vallenet, David; Castelli, Vanina; Besnard, Marielle; Pinet, Agnès; Cruaud, Corinne; Samair, Sumitta; Lechaplais, Christophe; Gyapay, Gabor; Richez, Céline; Durot, Maxime; Kreimeyer, Annett; Le Fèvre, François; Schächter, Vincent; Pezo, Valérie; Döring, Volker; Scarpelli, Claude; Médigue, Claudine; Cohen, Georges N; Marlière, Philippe; Salanoubat, Marcel; Weissenbach, Jean
2008-01-01
We have constructed a collection of single-gene deletion mutants for all dispensable genes of the soil bacterium Acinetobacter baylyi ADP1. A total of 2594 deletion mutants were obtained, whereas 499 (16%) were not, and are therefore candidate essential genes for life on minimal medium. This essentiality data set is 88% consistent with the Escherichia coli data set inferred from the Keio mutant collection profiled for growth on minimal medium, while 80% of the orthologous genes described as essential in Pseudomonas aeruginosa are also essential in ADP1. Several strategies were undertaken to investigate ADP1 metabolism by (1) searching for discrepancies between our essentiality data and current metabolic knowledge, (2) comparing this essentiality data set to those from other organisms, (3) systematic phenotyping of the mutant collection on a variety of carbon sources (quinate, 2-3 butanediol, glucose, etc.). This collection provides a new resource for the study of gene function by forward and reverse genetic approaches and constitutes a robust experimental data source for systems biology approaches. PMID:18319726
Primer sets for cloning the human repertoire of T cell Receptor Variable regions
Boria, Ilenia; Cotella, Diego; Dianzani, Irma; Santoro, Claudio; Sblattero, Daniele
2008-01-01
Background Amplification and cloning of naïve T cell Receptor (TR) repertoires or antigen-specific TR is crucial to shape immune response and to develop immuno-based therapies. TR variable (V) regions are encoded by several genes that recombine during T cell development. The cloning of expressed genes as large diverse libraries from natural sources relies upon the availability of primers able to amplify as many V genes as possible. Results Here, we present a list of primers computationally designed on all functional TR V and J genes listed in the IMGT®, the ImMunoGeneTics information system®. The list consists of unambiguous or degenerate primers suitable to theoretically amplify and clone the entire TR repertoire. We show that it is possible to selectively amplify and clone expressed TR V genes in one single RT-PCR step and from as little as 1000 cells. Conclusion This new primer set will facilitate the creation of more diverse TR libraries than has been possible using currently available primer sets. PMID:18759974
Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott
2010-04-01
An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemesh/.
Berchtold, Nicole C.; Coleman, Paul D.; Cribbs, David H.; Rogers, Joseph; Gillen, Daniel L.; Cotman, Carl W.
2014-01-01
Synapses are essential for transmitting, processing, and storing information, all of which decline in aging and Alzheimer’s disease (AD). Because synapse loss only partially accounts for the cognitive declines seen in aging and AD, we hypothesized that existing synapses might undergo molecular changes that reduce their functional capacity. Microarrays were used to evaluate expression profiles of 340 synaptic genes in aging (20–99 years) and AD across 4 brain regions from 81 cases. The analysis revealed an unexpectedly large number of significant expression changes in synapse-related genes in aging, with many undergoing progressive downregulation across aging and AD. Functional classification of the genes showing altered expression revealed that multiple aspects of synaptic function are affected, notably synaptic vesicle trafficking and release, neurotransmitter receptors and receptor trafficking, postsynaptic density scaffolding, cell adhesion regulating synaptic stability, and neuromodulatory systems. The widespread declines in synaptic gene expression in normal aging suggests that function of existing synapses might be impaired, and that a common set of synaptic genes are vulnerable to change in aging and AD. PMID:23273601
Maeso, Ignacio; Dunwell, Thomas L; Wyatt, Chris D R; Marlétaz, Ferdinand; Vető, Borbála; Bernal, Juan A; Quah, Shan; Irimia, Manuel; Holland, Peter W H
2016-06-13
A central goal of evolutionary biology is to link genomic change to phenotypic evolution. The origin of new transcription factors is a special case of genomic evolution since it brings opportunities for novel regulatory interactions and potentially the emergence of new biological properties. We demonstrate that a group of four homeobox gene families (Argfx, Leutx, Dprx, Tprx), plus a gene newly described here (Pargfx), arose by tandem gene duplication from the retinal-expressed Crx gene, followed by asymmetric sequence evolution. We show these genes arose as part of repeated gene gain and loss events on a dynamic chromosomal region in the stem lineage of placental mammals, on the forerunner of human chromosome 19. The human orthologues of these genes are expressed specifically in early embryo totipotent cells, peaking from 8-cell to morula, prior to cell fate restrictions; cow orthologues have similar expression. To examine biological roles, we used ectopic gene expression in cultured human cells followed by high-throughput RNA-seq and uncovered extensive transcriptional remodelling driven by three of the genes. Comparison to transcriptional profiles of early human embryos suggest roles in activating and repressing a set of developmentally-important genes that spike at 8-cell to morula, rather than a general role in genome activation. We conclude that a dynamic chromosome region spawned a set of evolutionarily new homeobox genes, the ETCHbox genes, specifically in eutherian mammals. After these genes diverged from the parental Crx gene, we argue they were recruited for roles in the preimplantation embryo including activation of genes at the 8-cell stage and repression after morula. We propose these new homeobox gene roles permitted fine-tuning of cell fate decisions necessary for specification and function of embryonic and extra-embryonic tissues utilised in mammalian development and pregnancy.
Chen, Rui; Davis, Lea K; Guter, Stephen; Wei, Qiang; Jacob, Suma; Potter, Melissa H; Cox, Nancy J; Cook, Edwin H; Sutcliffe, James S; Li, Bingshan
2017-01-01
Autism spectrum disorder (ASD) is one of the most highly heritable neuropsychiatric disorders, but underlying molecular mechanisms are still unresolved due to extreme locus heterogeneity. Leveraging meaningful endophenotypes or biomarkers may be an effective strategy to reduce heterogeneity to identify novel ASD genes. Numerous lines of evidence suggest a link between hyperserotonemia, i.e., elevated serotonin (5-hydroxytryptamine or 5-HT) in whole blood, and ASD. However, the genetic determinants of blood 5-HT level and their relationship to ASD are largely unknown. In this study, pursuing the hypothesis that de novo variants (DNVs) and rare risk alleles acting in a recessive mode may play an important role in predisposition of hyperserotonemia in people with ASD, we carried out whole exome sequencing (WES) in 116 ASD parent-proband trios with most (107) probands having 5-HT measurements. Combined with published ASD DNVs, we identified USP15 as having recurrent de novo loss of function mutations and discovered evidence supporting two other known genes with recurrent DNVs ( FOXP1 and KDM5B ). Genes harboring functional DNVs significantly overlap with functional/disease gene sets known to be involved in ASD etiology, including FMRP targets and synaptic formation and transcriptional regulation genes. We grouped the probands into High-5HT and Normal-5HT groups based on normalized serotonin levels, and used network-based gene set enrichment analysis (NGSEA) to identify novel hyperserotonemia-related ASD genes based on LoF and missense DNVs. We found enrichment in the High-5HT group for a gene network module (DAWN-1) previously implicated in ASD, and this points to the TGF-β pathway and cell junction processes. Through analysis of rare recessively acting variants (RAVs), we also found that rare compound heterozygotes (CHs) in the High-5HT group were enriched for loci in an ASD-associated gene set. Finally, we carried out rare variant group-wise transmission disequilibrium tests (gTDT) and observed significant association of rare variants in genes encoding a subset of the serotonin pathway with ASD. Our study identified USP15 as a novel gene implicated in ASD based on recurrent DNVs. It also demonstrates the potential value of 5-HT as an effective endophenotype for gene discovery in ASD, and the effectiveness of this strategy needs to be further explored in studies of larger sample sizes.
Jones, Melissa K; Lu, Bin; Saghizadeh, Mehrnoosh; Wang, Shaomei
2016-01-01
Retinal degenerative diseases (RDDs) affect millions of people and are the leading cause of vision loss. Although treatment options for RDDs are limited, stem and progenitor cell-based therapies have great potential to halt or slow the progression of vision loss. Our previous studies have shown that a single subretinal injection of human forebrain derived neural progenitor cells (hNPCs) into the Royal College of Surgeons (RCS) retinal degenerate rat offers long-term preservation of photoreceptors and visual function. Furthermore, neural progenitor cells are currently in clinical trials for treating age-related macular degeneration; however, the molecular mechanisms of stem cell-based therapies are largely unknown. This is the first study to analyze gene expression changes in the retina of RCS rats following subretinal injection of hNPCs using high-throughput sequencing. RNA-seq data of retinas from RCS rats injected with hNPCs (RCS(hNPCs)) were compared to sham surgery in RCS (RCS(sham)) and wild-type Long Evans (LE(sham)) rats. Differential gene expression patterns were determined with in silico analysis and confirmed with qRT-PCR. Function, biologic, cellular component, and pathway analyses were performed on differentially expressed genes and investigated with immunofluorescent staining experiments. Analysis of the gene expression data sets identified 1,215 genes that were differentially expressed between RCS(sham) and LE(sham) samples. Additionally, 283 genes were differentially expressed between the RCS(hNPCs) and RCS(sham) samples. Comparison of these two gene sets identified 68 genes with inverse expression (termed rescue genes), including Pdc, Rp1, and Cdc42ep5. Functional, biologic, and cellular component analyses indicate that the immune response is enhanced in RCS(sham). Pathway analysis of the differential expression gene sets identified three affected pathways in RCS(hNPCs), which all play roles in phagocytosis signaling. Immunofluorescent staining detected the increased presence of macrophages and microglia in RCS(sham) retinas, which decreased in RCS(hNPCs) retinas similar to the patterns detected in LE(sham). The results from this study provide evidence of the gene expression changes that occur following treatment with hNPCs in the degenerating retina. This information can be used in future studies to potentially enhance or predict responses to hNPC and other stem cell therapies for retinal degenerative diseases.
Xi, Jianing; Wang, Minghui; Li, Ao
2017-09-26
The accumulating availability of next-generation sequencing data offers an opportunity to pinpoint driver genes that are causally implicated in oncogenesis through computational models. Despite previous efforts made regarding this challenging problem, there is still room for improvement in the driver gene identification accuracy. In this paper, we propose a novel integrated approach called IntDriver for prioritizing driver genes. Based on a matrix factorization framework, IntDriver can effectively incorporate functional information from both the interaction network and Gene Ontology similarity, and detect driver genes mutated in different sets of patients at the same time. When evaluated through known benchmarking driver genes, the top ranked genes of our result show highly significant enrichment for the known genes. Meanwhile, IntDriver also detects some known driver genes that are not found by the other competing approaches. When measured by precision, recall and F1 score, the performances of our approach are comparable or increased in comparison to the competing approaches.
By Stuart G. Baker The program requires Mathematica 7.01.0 The key function is Classify [datalist,options] where datalist={data, genename, dataname} data ={matrix for class 0, matrix for class 1}, matrix is gene expression by specimen genename a list of names of genes, dataname ={name of data set, name of class0, name of class1} |
Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.
2015-01-01
Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency is further improved with kernel principle component analysis. Asymptotic results for model estimation and gene set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Genome-wide essential gene identification in Streptococcus sanguinis
Xu, Ping; Ge, Xiuchun; Chen, Lei; Wang, Xiaojing; Dou, Yuetan; Xu, Jerry Z.; Patel, Jenishkumar R.; Stone, Victoria; Trinh, My; Evans, Karra; Kitten, Todd; Bonchev, Danail; Buck, Gregory A.
2011-01-01
A clear perception of gene essentiality in bacterial pathogens is pivotal for identifying drug targets to combat emergence of new pathogens and antibiotic-resistant bacteria, for synthetic biology, and for understanding the origins of life. We have constructed a comprehensive set of deletion mutants and systematically identified a clearly defined set of essential genes for Streptococcus sanguinis. Our results were confirmed by growing S. sanguinis in minimal medium and by double-knockout of paralogous or isozyme genes. Careful examination revealed that these essential genes were associated with only three basic categories of biological functions: maintenance of the cell envelope, energy production, and processing of genetic information. Our finding was subsequently validated in two other pathogenic streptococcal species, Streptococcus pneumoniae and Streptococcus mutans and in two other gram-positive pathogens, Bacillus subtilis and Staphylococcus aureus. Our analysis has thus led to a simplified model that permits reliable prediction of gene essentiality. PMID:22355642
Gene function prediction based on the Gene Ontology hierarchical structure.
Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao
2014-01-01
The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
Guan, Yunyan; He, Maoxian; Wu, Houbo
2017-06-01
To explore the molecular mechanism of triploidy effect in the pearl oyster Pinctada fucata, two RNA-seq libraries were constructed from the mantle tissue of diploids and triploids by Roche-454 massive parallel pyrosequencing. The identification of differential expressed genes (DEGs) between diploid and triploid may reveal the molecular mechanism of triploidy effect. In this study, 230 down-regulated and 259 up-regulated DEGs were obtained by comparison between diploid and triploid libraries. The gene ontology and KEGG pathway analysis revealed more functional activation in triploids and it may due to the duplicated gene expression in transcriptional level during whole genome duplication (WGD). To confirm the sequencing data, a set of 11 up-regulated genes related to growth and development control and regulation were analyzed by RT-qPCR in independent experiment. According to the validation and annotation of these genes, it is hypothesized that the set of up-regulated expressed genes had the correlated expression pattern involved in shell building or other interactive probable functions during triploidization. The up- regulation of growth-related genes may support the classic hypotheses of 'energy redistribution' from early research. The results provide valuable resources to understand the molecular mechanism of triploidy effect in both shell building and producing high-quality seawater pearls. Copyright © 2017 Elsevier B.V. All rights reserved.
Empuku, Shinichiro; Nakajima, Kentaro; Akagi, Tomonori; Kaneko, Kunihiko; Hijiya, Naoki; Etoh, Tsuyoshi; Shiraishi, Norio; Moriyama, Masatsugu; Inomata, Masafumi
2016-05-01
Preoperative chemoradiotherapy (CRT) for locally advanced rectal cancer not only improves the postoperative local control rate, but also induces downstaging. However, it has not been established how to individually select patients who receive effective preoperative CRT. The aim of this study was to identify a predictor of response to preoperative CRT for locally advanced rectal cancer. This study is additional to our multicenter phase II study evaluating the safety and efficacy of preoperative CRT using oral fluorouracil (UMIN ID: 03396). From April, 2009 to August, 2011, 26 biopsy specimens obtained prior to CRT were analyzed by cyclopedic microarray analysis. Response to CRT was evaluated according to a histological grading system using surgically resected specimens. To decide on the number of genes for dividing into responder and non-responder groups, we statistically analyzed the data using a dimension reduction method, a principle component analysis. Of the 26 cases, 11 were responders and 15 non-responders. No significant difference was found in clinical background data between the two groups. We determined that the optimal number of genes for the prediction of response was 80 of 40,000 and the functions of these genes were analyzed. When comparing non-responders with responders, genes expressed at a high level functioned in alternative splicing, whereas those expressed at a low level functioned in the septin complex. Thus, an 80-gene expression set that predicts response to preoperative CRT for locally advanced rectal cancer was identified using a novel statistical method.
Martin, Guiomar; Soy, Judit; Monte, Elena
2016-01-01
Members of the PIF quartet (PIFq; PIF1, PIF3, PIF4, and PIF5) collectively contribute to induce growth in Arabidopsis seedlings under short day (SD) conditions, specifically promoting elongation at dawn. Their action involves the direct regulation of growth-related and hormone-associated genes. However, a comprehensive definition of the PIFq-regulated transcriptome under SD is still lacking. We have recently shown that SD and free-running (LL) conditions correspond to "growth" and "no growth" conditions, respectively, correlating with greater abundance of PIF protein in SD. Here, we present a genomic analysis whereby we first define SD-regulated genes at dawn compared to LL in the wild type, followed by identification of those SD-regulated genes whose expression depends on the presence of PIFq. By using this sequential strategy, we have identified 349 PIF/SD-regulated genes, approximately 55% induced and 42% repressed by both SD and PIFq. Comparison with available databases indicates that PIF/SD-induced and PIF/SD-repressed sets are differently phased at dawn and mid-morning, respectively. In addition, we found that whereas rhythmicity of the PIF/SD-induced gene set is lost in LL, most PIF/SD-repressed genes keep their rhythmicity in LL, suggesting differential regulation of both gene sets by the circadian clock. Moreover, we also uncovered distinct overrepresented functions in the induced and repressed gene sets, in accord with previous studies in other examined PIF-regulated processes. Interestingly, promoter analyses showed that, whereas PIF/SD-induced genes are enriched in direct PIF targets, PIF/SD-repressed genes are mostly indirectly regulated by the PIFs and might be more enriched in ABA-regulated genes.
Probing the Xenopus laevis inner ear transcriptome for biological function
2012-01-01
Background The senses of hearing and balance depend upon mechanoreception, a process that originates in the inner ear and shares features across species. Amphibians have been widely used for physiological studies of mechanotransduction by sensory hair cells. In contrast, much less is known of the genetic basis of auditory and vestibular function in this class of animals. Among amphibians, the genus Xenopus is a well-characterized genetic and developmental model that offers unique opportunities for inner ear research because of the amphibian capacity for tissue and organ regeneration. For these reasons, we implemented a functional genomics approach as a means to undertake a large-scale analysis of the Xenopus laevis inner ear transcriptome through microarray analysis. Results Microarray analysis uncovered genes within the X. laevis inner ear transcriptome associated with inner ear function and impairment in other organisms, thereby supporting the inclusion of Xenopus in cross-species genetic studies of the inner ear. The use of gene categories (inner ear tissue; deafness; ion channels; ion transporters; transcription factors) facilitated the assignment of functional significance to probe set identifiers. We enhanced the biological relevance of our microarray data by using a variety of curation approaches to increase the annotation of the Affymetrix GeneChip® Xenopus laevis Genome array. In addition, annotation analysis revealed the prevalence of inner ear transcripts represented by probe set identifiers that lack functional characterization. Conclusions We identified an abundance of targets for genetic analysis of auditory and vestibular function. The orthologues to human genes with known inner ear function and the highly expressed transcripts that lack annotation are particularly interesting candidates for future analyses. We used informatics approaches to impart biologically relevant information to the Xenopus inner ear transcriptome, thereby addressing the impediment imposed by insufficient gene annotation. These findings heighten the relevance of Xenopus as a model organism for genetic investigations of inner ear organogenesis, morphogenesis, and regeneration. PMID:22676585
Graubner, Felix R.; Gram, Aykut; Kautz, Ewa; Bauersachs, Stefan; Aslan, Selim; Agaoglu, Ali R.; Boos, Alois
2017-01-01
Abstract In the dog, there is no luteolysis in the absence of pregnancy. Thus, this species lacks any anti-luteolytic endocrine signal as found in other species that modulate uterine function during the critical period of pregnancy establishment. Nevertheless, in the dog an embryo-maternal communication must occur in order to prevent rejection of embryos. Based on this hypothesis, we performed microarray analysis of canine uterine samples collected during pre-attachment phase (days 10-12) and in corresponding non-pregnant controls, in order to elucidate the embryo attachment signal. An additional goal was to identify differences in uterine responses to pre-attachment embryos between dogs and other mammalian species exhibiting different reproductive patterns with regard to luteolysis, implantation, and preparation for placentation. Therefore, the canine microarray data were compared with gene sets from pigs, cattle, horses, and humans. We found 412 genes differentially regulated between the two experimental groups. The functional terms most strongly enriched in response to pre-attachment embryos related to extracellular matrix function and remodeling, and to immune and inflammatory responses. Several candidate genes were validated by semi-quantitative PCR. When compared with other species, best matches were found with human and equine counterparts. Especially for the pig, the majority of overlapping genes showed opposite expression patterns. Interestingly, 1926 genes did not pair with any of the other gene sets. Using a microarray approach, we report the uterine changes in the dog driven by the presence of embryos and compare these results with datasets from other mammalian species, finding common-, contrary-, and exclusively canine-regulated genes. PMID:28651344
Molecular and functional definition of the developing human striatum.
Onorati, Marco; Castiglioni, Valentina; Biasci, Daniele; Cesana, Elisabetta; Menon, Ramesh; Vuono, Romina; Talpo, Francesca; Laguna Goya, Rocio; Lyons, Paul A; Bulfamante, Gaetano P; Muzio, Luca; Martino, Gianvito; Toselli, Mauro; Farina, Cinthia; Barker, Roger A; Biella, Gerardo; Cattaneo, Elena
2014-12-01
The complexity of the human brain derives from the intricate interplay of molecular instructions during development. Here we systematically investigated gene expression changes in the prenatal human striatum and cerebral cortex during development from post-conception weeks 2 to 20. We identified tissue-specific gene coexpression networks, differentially expressed genes and a minimal set of bimodal genes, including those encoding transcription factors, that distinguished striatal from neocortical identities. Unexpected differences from mouse striatal development were discovered. We monitored 36 determinants at the protein level, revealing regional domains of expression and their refinement, during striatal development. We electrophysiologically profiled human striatal neurons differentiated in vitro and determined their refined molecular and functional properties. These results provide a resource and opportunity to gain global understanding of how transcriptional and functional processes converge to specify human striatal and neocortical neurons during development.
Molecular definition of the identity and activation of natural killer cells.
Bezman, Natalie A; Kim, Charles C; Sun, Joseph C; Min-Oo, Gundula; Hendricks, Deborah W; Kamimura, Yosuke; Best, J Adam; Goldrath, Ananda W; Lanier, Lewis L
2012-10-01
Using whole-genome microarray data sets of the Immunological Genome Project, we demonstrate a closer transcriptional relationship between NK cells and T cells than between any other leukocytes, distinguished by their shared expression of genes encoding molecules with similar signaling functions. Whereas resting NK cells are known to share expression of a few genes with cytotoxic CD8(+) T cells, our transcriptome-wide analysis demonstrates that the commonalities extend to hundreds of genes, many encoding molecules with unknown functions. Resting NK cells demonstrate a 'preprimed' state compared with naive T cells, which allows NK cells to respond more rapidly to viral infection. Collectively, our data provide a global context for known and previously unknown molecular aspects of NK cell identity and function by delineating the genome-wide repertoire of gene expression of NK cells in various states.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2016-02-16
The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well asmore » the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.« less
Polypeptide having swollenin activity and uses thereof
Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius
2015-11-04
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius
2015-09-01
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-09-15
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius
2015-08-18
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Reilly, Matthew T; Harris, R Adron; Noronha, Antonio
2012-01-01
Over the last 50 years, researchers have made substantial progress in identifying genetic variations that underlie the complex phenotype of alcoholism. Not much is known, however, about how this genetic variation translates into altered biological function. Genetic animal models recapitulating specific characteristics of the human condition have helped elucidate gene function and the genetic basis of disease. In particular, major advances have come from the ability to manipulate genes through a variety of genetic technologies that provide an unprecedented capacity to determine gene function in the living organism and in alcohol-related behaviors. Even newer genetic-engineering technologies have given researchers the ability to control when and where a specific gene or mutation is activated or deleted, allowing investigators to narrow the role of the gene's function to circumscribed neural pathways and across development. These technologies are important for all areas of neuroscience, and several public and private initiatives are making a new generation of genetic-engineering tools available to the scientific community at large. Finally, high-throughput "next-generation sequencing" technologies are set to rapidly increase knowledge of the genome, epigenome, and transcriptome, which, combined with genetically engineered mouse mutants, will enhance insight into biological function. All of these resources will provide deeper insight into the genetic basis of alcoholism.
Functional Analysis With a Barcoder Yeast Gene Overexpression System
Douglas, Alison C.; Smith, Andrew M.; Sharifpoor, Sara; Yan, Zhun; Durbic, Tanja; Heisler, Lawrence E.; Lee, Anna Y.; Ryan, Owen; Göttert, Hendrikje; Surendra, Anu; van Dyk, Dewald; Giaever, Guri; Boone, Charles; Nislow, Corey; Andrews, Brenda J.
2012-01-01
Systematic analysis of gene overexpression phenotypes provides an insight into gene function, enzyme targets, and biological pathways. Here, we describe a novel functional genomics platform that enables a highly parallel and systematic assessment of overexpression phenotypes in pooled cultures. First, we constructed a genome-level collection of ~5100 yeast barcoder strains, each of which carries a unique barcode, enabling pooled fitness assays with a barcode microarray or sequencing readout. Second, we constructed a yeast open reading frame (ORF) galactose-induced overexpression array by generating a genome-wide set of yeast transformants, each of which carries an individual plasmid-born and sequence-verified ORF derived from the Saccharomyces cerevisiae full-length EXpression-ready (FLEX) collection. We combined these collections genetically using synthetic genetic array methodology, generating ~5100 strains, each of which is barcoded and overexpresses a specific ORF, a set we termed “barFLEX.” Additional synthetic genetic array allows the barFLEX collection to be moved into different genetic backgrounds. As a proof-of-principle, we describe the properties of the barFLEX overexpression collection and its application in synthetic dosage lethality studies under different environmental conditions. PMID:23050238
Integrative and conjugative elements and their hosts: composition, distribution and organization
Touchon, Marie; Rocha, Eduardo P. C.
2017-01-01
Abstract Conjugation of single-stranded DNA drives horizontal gene transfer between bacteria and was widely studied in conjugative plasmids. The organization and function of integrative and conjugative elements (ICE), even if they are more abundant, was only studied in a few model systems. Comparative genomics of ICE has been precluded by the difficulty in finding and delimiting these elements. Here, we present the results of a method that circumvents these problems by requiring only the identification of the conjugation genes and the species’ pan-genome. We delimited 200 ICEs and this allowed the first large-scale characterization of these elements. We quantified the presence in ICEs of a wide set of functions associated with the biology of mobile genetic elements, including some that are typically associated with plasmids, such as partition and replication. Protein sequence similarity networks and phylogenetic analyses revealed that ICEs are structured in functional modules. Integrases and conjugation systems have different evolutionary histories, even if the gene repertoires of ICEs can be grouped in function of conjugation types. Our characterization of the composition and organization of ICEs paves the way for future functional and evolutionary analyses of their cargo genes, composed of a majority of unknown function genes. PMID:28911112
Strakova, Eva; Zikova, Alice; Vohradsky, Jiri
2014-01-01
A computational model of gene expression was applied to a novel test set of microarray time series measurements to reveal regulatory interactions between transcriptional regulators represented by 45 sigma factors and the genes expressed during germination of a prokaryote Streptomyces coelicolor. Using microarrays, the first 5.5 h of the process was recorded in 13 time points, which provided a database of gene expression time series on genome-wide scale. The computational modeling of the kinetic relations between the sigma factors, individual genes and genes clustered according to the similarity of their expression kinetics identified kinetically plausible sigma factor-controlled networks. Using genome sequence annotations, functional groups of genes that were predominantly controlled by specific sigma factors were identified. Using external binding data complementing the modeling approach, specific genes involved in the control of the studied process were identified and their function suggested.
Malki, K; Pain, O; Tosto, M G; Du Rietz, E; Carboni, L; Schalkwyk, L C
2015-01-01
Despite moderate heritability estimates, progress in uncovering the molecular substrate underpinning major depressive disorder (MDD) has been slow. In this study, we used prefrontal cortex (PFC) gene expression from a genetic rat model of MDD to inform probe set prioritization in PFC in a human post-mortem study to uncover genes and gene pathways associated with MDD. Gene expression differences between Flinders sensitive (FSL) and Flinders resistant (FRL) rat lines were statistically evaluated using the RankProd, non-parametric algorithm. Top ranking probe sets in the rat study were subsequently used to prioritize orthologous selection in a human PFC in a case–control post-mortem study on MDD from the Stanley Brain Consortium. Candidate genes in the human post-mortem study were then tested against a matched control sample using the RankProd method. A total of 1767 probe sets were differentially expressed in the PFC between FSL and FRL rat lines at (q⩽0.001). A total of 898 orthologous probe sets was found on Affymetrix's HG-U95A chip used in the human study. Correcting for the number of multiple, non-independent tests, 20 probe sets were found to be significantly dysregulated between human cases and controls at q⩽0.05. These probe sets tagged the expression profile of 18 human genes (11 upregulated and seven downregulated). Using an integrative rat–human study, a number of convergent genes that may have a role in pathogenesis of MDD were uncovered. Eighty percent of these genes were functionally associated with a key stress response signalling cascade, involving NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells), AP-1 (activator protein 1) and ERK/MAPK, which has been systematically associated with MDD, neuroplasticity and neurogenesis. PMID:25734512
Schwarz, Jodi A; Brokstein, Peter B; Voolstra, Christian; Terry, Astrid Y; Miller, David J; Szmant, Alina M; Coffroth, Mary Alice; Medina, Mónica
2008-01-01
Background Scleractinian corals are the foundation of reef ecosystems in tropical marine environments. Their great success is due to interactions with endosymbiotic dinoflagellates (Symbiodinium spp.), with which they are obligately symbiotic. To develop a foundation for studying coral biology and coral symbiosis, we have constructed a set of cDNA libraries and generated and annotated ESTs from two species of corals, Acropora palmata and Montastraea faveolata. Results We generated 14,588 (Ap) and 3,854 (Mf) high quality ESTs from five life history/symbiosis stages (spawned eggs, early-stage planula larvae, late-stage planula larvae either infected with symbionts or uninfected, and adult coral). The ESTs assembled into a set of primarily stage-specific clusters, producing 4,980 (Ap), and 1,732 (Mf) unigenes. The egg stage library, relative to the other developmental stages, was enriched in genes functioning in cell division and proliferation, transcription, signal transduction, and regulation of protein function. Fifteen unigenes were identified as candidate symbiosis-related genes as they were expressed in all libraries constructed from the symbiotic stages and were absent from all of the non symbiotic stages. These include several DNA interacting proteins, and one highly expressed unigene (containing 17 cDNAs) with no significant protein-coding region. A significant number of unigenes (25) encode potential pattern recognition receptors (lectins, scavenger receptors, and others), as well as genes that may function in signaling pathways involved in innate immune responses (toll-like signaling, NFkB p105, and MAP kinases). Comparison between the A. palmata and an A. millepora EST dataset identified ferritin as a highly expressed gene in both datasets that appears to be undergoing adaptive evolution. Five unigenes appear to be restricted to the Scleractinia, as they had no homology to any sequences in the nr databases nor to the non-scleractinian cnidarians Nematostella vectensis and Hydra magnipapillata. Conclusion Partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies. PMID:18298846
Schwarz, Jodi A.; Brokstein, Peter B.; Voolstra, Christian R.; ...
2008-02-25
Scleractinian corals are the foundation of reef ecosystems in tropical marine environments. Their great success is due to interactions with endosymbiotic dinoflagellates (Symbiodinium spp.), with which they are obligately symbiotic. To develop a foundation for studying coral biology and coral symbiosis, we have constructed a set of cDNA libraries and generated and annotated ESTs from two species of corals, Acropora palmata and Montastraea faveolata. Here we generated 14,588 (Ap) and 3,854 (Mf) high quality ESTs from five life history/symbiosis stages (spawned eggs, early-stage planula larvae, late-stage planula larvae either infected with symbionts or uninfected, and adult coral). The ESTs assembledmore » into a set of primarily stage-specific clusters, producing 4,980 (Ap), and 1,732 (Mf) unigenes. The egg stage library, relative to the other developmental stages, was enriched in genes functioning in cell division and proliferation, transcription, signal transduction, and regulation of protein function. Fifteen unigenes were identified as candidate symbiosis-related genes as they were expressed in all libraries constructed from the symbiotic stages and were absent from all of the non symbiotic stages. These include several DNA interacting proteins, and one highly expressed unigene (containing 17 cDNAs) with no significant protein-coding region. A significant number of unigenes (25) encode potential pattern recognition receptors (lectins, scavenger receptors, and others), as well as genes that may function in signaling pathways involved in innate immune responses (toll-like signaling, NFkB p105, and MAP kinases). Comparison between the A. palmata and an A. millepora EST dataset identified ferritin as a highly expressed gene in both datasets that appears to be undergoing adaptive evolution. Five unigenes appear to be restricted to the Scleractinia, as they had no homology to any sequences in the nr databases nor to the non-scleractinian cnidarians Nematostella vectensis and Hydra magnipapillata. In conclusion, partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schwarz, Jodi A.; Brokstein, Peter B.; Voolstra, Christian R.
Scleractinian corals are the foundation of reef ecosystems in tropical marine environments. Their great success is due to interactions with endosymbiotic dinoflagellates (Symbiodinium spp.), with which they are obligately symbiotic. To develop a foundation for studying coral biology and coral symbiosis, we have constructed a set of cDNA libraries and generated and annotated ESTs from two species of corals, Acropora palmata and Montastraea faveolata. Here we generated 14,588 (Ap) and 3,854 (Mf) high quality ESTs from five life history/symbiosis stages (spawned eggs, early-stage planula larvae, late-stage planula larvae either infected with symbionts or uninfected, and adult coral). The ESTs assembledmore » into a set of primarily stage-specific clusters, producing 4,980 (Ap), and 1,732 (Mf) unigenes. The egg stage library, relative to the other developmental stages, was enriched in genes functioning in cell division and proliferation, transcription, signal transduction, and regulation of protein function. Fifteen unigenes were identified as candidate symbiosis-related genes as they were expressed in all libraries constructed from the symbiotic stages and were absent from all of the non symbiotic stages. These include several DNA interacting proteins, and one highly expressed unigene (containing 17 cDNAs) with no significant protein-coding region. A significant number of unigenes (25) encode potential pattern recognition receptors (lectins, scavenger receptors, and others), as well as genes that may function in signaling pathways involved in innate immune responses (toll-like signaling, NFkB p105, and MAP kinases). Comparison between the A. palmata and an A. millepora EST dataset identified ferritin as a highly expressed gene in both datasets that appears to be undergoing adaptive evolution. Five unigenes appear to be restricted to the Scleractinia, as they had no homology to any sequences in the nr databases nor to the non-scleractinian cnidarians Nematostella vectensis and Hydra magnipapillata. In conclusion, partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies.« less
Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O
2015-08-25
Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.
Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun
2017-09-21
High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of the high-throughput experimental results, such as differential expression gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in the functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make users difficult to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform the functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved a superior performance than the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when accessed to the curated literatures. NetGen has been implemented in the R package CopTea publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
A graph-theoretic approach for inparalog detection.
Tremblay-Savard, Olivier; Swenson, Krister M
2012-01-01
Understanding the history of a gene family that evolves through duplication, speciation, and loss is a fundamental problem in comparative genomics. Features such as function, position, and structural similarity between genes are intimately connected to this history; relationships between genes such as orthology (genes related through a speciation event) or paralogy (genes related through a duplication event) are usually correlated with these features. For example, recent work has shown that in human and mouse there is a strong connection between function and inparalogs, the paralogs that were created since the speciation event separating the human and mouse lineages. Methods exist for detecting inparalogs that either use information from only two species, or consider a set of species but rely on clustering methods. In this paper we present a graph-theoretic approach for finding lower bounds on the number of inparalogs for a given set of species; we pose an edge covering problem on the similarity graph and give an efficient 2/3-approximation as well as a faster heuristic. Since the physical position of inparalogs corresponding to recent speciations is not likely to have changed since the duplication, we also use our predictions to estimate the types of duplications that have occurred in some vertebrates and drosophila.
Ma, Jian; Stiller, Jiri; Zhao, Qiang; Feng, Qi; Cavanagh, Colin; Wang, Penghao; Gardiner, Donald; Choulet, Frédéric; Feuillet, Catherine; Zheng, You-Liang; Wei, Yuming; Yan, Guijun; Han, Bin; Manners, John M.; Liu, Chunji
2014-01-01
Fusarium pathogens cause two major diseases in cereals, Fusarium crown rot (FCR) and head blight (FHB). A large-effect locus conferring resistance to FCR disease was previously located to chromosome arm 3BL (designated as Qcrs-3B) and several independent sets of near isogenic lines (NILs) have been developed for this locus. In this study, five sets of the NILs were used to examine transcriptional changes associated with the Qcrs-3B locus and to identify genes linked to the resistance locus as a step towards the isolation of the causative gene(s). Of the differentially expressed genes (DEGs) detected between the NILs, 12.7% was located on the single chromosome 3B. Of the expressed genes containing SNP (SNP-EGs) detected, 23.5% was mapped to this chromosome. Several of the DEGs and SNP-EGs are known to be involved in host-pathogen interactions, and a large number of the DEGs were among those detected for FHB in previous studies. Of the DEGs detected, 22 were mapped in the Qcrs-3B interval and they included eight which were detected in the resistant isolines only. The enrichment of DEG, and not necessarily those containing SNPs between the resistant and susceptible isolines, around the Qcrs-3B locus is suggestive of local regulation of this region by the resistance allele. Functions for 13 of these DEGs are known. Of the SNP-EGs, 28 were mapped in the Qcrs-3B interval and biological functions for 16 of them are known. These results provide insights into responses regulated by the 3BL locus and identify a tractable number of target genes for fine mapping and functional testing to identify the causative gene(s) at this QTL. PMID:25405461
Wu, Ai-Min; Hörnblad, Emma; Voxeur, Aline; Gerber, Lorenz; Rihouey, Christophe; Lerouge, Patrice; Marchant, Alan
2010-06-01
The hemicellulose glucuronoxylan (GX) is a major component of plant secondary cell walls. However, our understanding of GX synthesis remains limited. Here, we identify and analyze two new genes from Arabidopsis (Arabidopsis thaliana), IRREGULAR XYLEM9-LIKE (IRX9-L) and IRX14-LIKE (IRX14-L) that encode glycosyltransferase family 43 members proposed to function during xylan backbone elongation. We place IRX9-L and IRX14-L in a genetic framework with six previously described glycosyltransferase genes (IRX9, IRX10, IRX10-L, IRX14, FRAGILE FIBER8 [FRA8], and FRA8 HOMOLOG [F8H]) and investigate their function in GX synthesis. Double-mutant analysis identifies IRX9-L and IRX14-L as functional homologs of IRX9 and IRX14, respectively. Characterization of irx9 irx10 irx14 fra8 and irx9-L irx10-L irx14-L f8h quadruple mutants allows definition of a set of genes comprising IRX9, IRX10, IRX14, and FRA8 that perform the main role in GX synthesis during vegetative development. The IRX9-L, IRX10-L, IRX14-L, and F8H genes are able to partially substitute for their respective homologs and normally perform a minor function. The irx14 irx14-L double mutant virtually lacks xylan, whereas irx9 irx9-L and fra8 f8h double mutants form lowered amounts of GX displaying a greatly reduced degree of backbone polymerization. Our findings reveal two distinct sets of four genes each differentially contributing to GX biosynthesis.
Wu, Ai-Min; Hörnblad, Emma; Voxeur, Aline; Gerber, Lorenz; Rihouey, Christophe; Lerouge, Patrice; Marchant, Alan
2010-01-01
The hemicellulose glucuronoxylan (GX) is a major component of plant secondary cell walls. However, our understanding of GX synthesis remains limited. Here, we identify and analyze two new genes from Arabidopsis (Arabidopsis thaliana), IRREGULAR XYLEM9-LIKE (IRX9-L) and IRX14-LIKE (IRX14-L) that encode glycosyltransferase family 43 members proposed to function during xylan backbone elongation. We place IRX9-L and IRX14-L in a genetic framework with six previously described glycosyltransferase genes (IRX9, IRX10, IRX10-L, IRX14, FRAGILE FIBER8 [FRA8], and FRA8 HOMOLOG [F8H]) and investigate their function in GX synthesis. Double-mutant analysis identifies IRX9-L and IRX14-L as functional homologs of IRX9 and IRX14, respectively. Characterization of irx9 irx10 irx14 fra8 and irx9-L irx10-L irx14-L f8h quadruple mutants allows definition of a set of genes comprising IRX9, IRX10, IRX14, and FRA8 that perform the main role in GX synthesis during vegetative development. The IRX9-L, IRX10-L, IRX14-L, and F8H genes are able to partially substitute for their respective homologs and normally perform a minor function. The irx14 irx14-L double mutant virtually lacks xylan, whereas irx9 irx9-L and fra8 f8h double mutants form lowered amounts of GX displaying a greatly reduced degree of backbone polymerization. Our findings reveal two distinct sets of four genes each differentially contributing to GX biosynthesis. PMID:20424005
Treatment-Induced Autophagy Associated with Tumor Dormancy and Relapse
2017-07-01
disease function by Ingenuity Pathway Analysis (IPA). The 239 genes involved in dormancy showed a z-score increase in disease states related to acute ...genes shared by both week 6 groups, one relapsing and the other dormant, showed predicted activation of both chronic and acute disease states. In...genes among 239 shared probe sets involved in maintenance of dormancy shows predicted activation of disease states related to acute inflammation, 682
A transversal approach to predict gene product networks from ontology-based similarity
Chabalier, Julie; Mosser, Jean; Burgun, Anita
2007-01-01
Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and data expression. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity, and data expression. PMID:17605807
Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso
2013-09-26
Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
Phylogenetically informed logic relationships improve detection of biological network organization
2011-01-01
Background A "phylogenetic profile" refers to the presence or absence of a gene across a set of organisms, and it has been proven valuable for understanding gene functional relationships and network organization. Despite this success, few studies have attempted to search beyond just pairwise relationships among genes. Here we search for logic relationships involving three genes, and explore its potential application in gene network analyses. Results Taking advantage of a phylogenetic matrix constructed from the large orthologs database Roundup, we invented a method to create balanced profiles for individual triplets of genes that guarantee equal weight on the different phylogenetic scenarios of coevolution between genes. When we applied this idea to LAPP, the method to search for logic triplets of genes, the balanced profiles resulted in significant performance improvement and the discovery of hundreds of thousands more putative triplets than unadjusted profiles. We found that logic triplets detected biological network organization and identified key proteins and their functions, ranging from neighbouring proteins in local pathways, to well separated proteins in the whole pathway, and to the interactions among different pathways at the system level. Finally, our case study suggested that the directionality in a logic relationship and the profile of a triplet could disclose the connectivity between the triplet and surrounding networks. Conclusion Balanced profiles are superior to the raw profiles employed by traditional methods of phylogenetic profiling in searching for high order gene sets. Gene triplets can provide valuable information in detection of biological network organization and identification of key genes at different levels of cellular interaction. PMID:22172058
Castillo, Andres; Wang, Lu; Koriyama, Chihaya; Eizuru, Yoshito; Jordan, King; Akiba, Suminori
2014-10-01
Previous studies have reported the detection of a truncated E1 mRNA generated from HPV-18 in HeLa cells. Although it is unclear whether a truncated E1 protein could function as a replicative helicase for viral replication, it would still retain binding sites for potential interactions with different host cell proteins. Furthermore, in this study, we found evidence in support of expression of full-length HPV-18 E1 mRNA in HeLa cells. To determine whether interactions between E1 and cellular proteins play an important role in cellular processes other than viral replication, genome-wide expression profiles of HPV-18 positive HeLa cells were compared before and after the siRNA knockdown of E1 expression. Differential expression and gene set enrichment analysis uncovered four functionally related sets of genes implicated in host defence mechanisms against viral infection. These included the toll-like receptor, interferon and apoptosis pathways, along with the antiviral interferon-stimulated gene set. In addition, we found that the transcriptional coactivator E1A-binding protein p300 (EP300) was downregulated, which is interesting given that EP300 is thought to be required for the transcription of HPV-18 genes in HeLa cells. The observed changes in gene expression produced via the silencing of HPV-18 E1 expression in HeLa cells indicate that in addition to its well-known role in viral replication, the E1 protein may also play an important role in mitigating the host's ability to defend against viral infection.
Identification of a set of genes showing regionally enriched expression in the mouse brain
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM
2008-01-01
Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066
Identification of a set of genes showing regionally enriched expression in the mouse brain.
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L C; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J M
2008-07-14
The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.
Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans
Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo
2012-01-01
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122
Pharmaco-miR: linking microRNAs and drug effects
Rukov, Jakob Lewin; Wilentzik, Roni; Jaffe, Ishai; Vinther, Jeppe; Shomron, Noam
2014-01-01
MicroRNAs (miRNAs) are short regulatory RNAs that down-regulate gene expression. They are essential for cell homeostasis and active in many disease states. A major discovery is the ability of miRNAs to determine the efficacy of drugs, which has given rise to the field of ‘miRNA pharmacogenomics’ through ‘Pharmaco-miRs’. miRNAs play a significant role in pharmacogenomics by down-regulating genes that are important for drug function. These interactions can be described as triplet sets consisting of a miRNA, a target gene and a drug associated with the gene. We have developed a web server which links miRNA expression and drug function by combining data on miRNA targeting and protein–drug interactions. miRNA targeting information derive from both experimental data and computational predictions, and protein–drug interactions are annotated by the Pharmacogenomics Knowledge base (PharmGKB). Pharmaco-miR’s input consists of miRNAs, genes and/or drug names and the output consists of miRNA pharmacogenomic sets or a list of unique associated miRNAs, genes and drugs. We have furthermore built a database, named Pharmaco-miR Verified Sets (VerSe), which contains miRNA pharmacogenomic data manually curated from the literature, can be searched and downloaded via Pharmaco-miR and informs on trends and generalities published in the field. Overall, we present examples of how Pharmaco-miR provides possible explanations for previously published observations, including how the cisplatin and 5-fluorouracil resistance induced by miR-148a may be caused by miR-148a targeting of the gene KIT. The information is available at www.Pharmaco-miR.org. PMID:23376192
Microgravity and immunity: Changes in lymphocyte gene expression.
NASA Astrophysics Data System (ADS)
Risin, D.; Ward, N. E.; Risin, S. A.; Pellis, N. R.
Earlier studies had shown that modeled and true microgravity MG cause multiple direct effects on human lymphocytes MG inhibits lymphocyte locomotion suppresses polyclonal and antigen-specific activation affects signal transduction mechanisms as well as activation-induced apoptosis In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes MGSG in general and specifically those genes that might be responsible for the functional and structural changes observed earlier Two sets of experiments targeting different goals were conducted In the first set T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g static and modeled MG MMG conditions Rotating Wall Vessel bioreactor for 24 hours This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity In the second set - activated T-cells after culturing for 24 hours in 1g and MMG were exposed three hours before harvesting to a secondary activation stimulus PHA thus triggering the apoptotic pathway Total RNA was extracted using the RNeasy isolation kit Qiagen Valencia CA Affymetrix Gene Chips U133A allowing testing for 18 400 human genes were used for microarray analysis The experiments were performed in triplicates with T-cells obtained from different blood donors to minimize the possible input of biological variation in gene expression and discriminate changes that are associated with the
Extensive Gains and Losses of Olfactory Receptor Genes in Mammalian Evolution
Niimura, Yoshihito; Nei, Masatoshi
2007-01-01
Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR genes in platypuses, opossums, cows, dogs, rats, and macaques and studied the evolutionary change of the genes together with those of humans and mice. We found that platypuses and primates have <400 functional OR genes while the other species have 800–1,200 functional OR genes. We then estimated the numbers of gains and losses of OR genes for each branch of the phylogenetic tree of mammals. This analysis showed that (i) gene expansion occurred in the placental lineage each time after it diverged from monotremes and from marsupials and (ii) hundreds of gains and losses of OR genes have occurred in an order-specific manner, making the gene repertoires highly variable among different orders. It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes. This fluctuation seems to have been aided by the stochastic nature of OR gene expression. PMID:17684554
Isaacson, Sven; Luo, Feng; Feltus, Frank A.; Smith, Melissa C.
2013-01-01
The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust. PMID:23409071
Gesing, Stefan; Schindler, Daniel; Nowrousian, Minou
2013-09-01
Ascomycetes differentiate four major morphological types of fruiting bodies (apothecia, perithecia, pseudothecia and cleistothecia) that are derived from an ancestral fruiting body. Thus, fruiting body differentiation is most likely controlled by a set of common core genes. One way to identify such genes is to search for genes with evolutionary conserved expression patterns. Using suppression subtractive hybridization (SSH), we selected differentially expressed transcripts in Pyronema confluens (Pezizales) by comparing two cDNA libraries specific for sexual and for vegetative development, respectively. The expression patterns of selected genes from both libraries were verified by quantitative real time PCR. Expression of several corresponding homologous genes was found to be conserved in two members of the Sordariales (Sordaria macrospora and Neurospora crassa), a derived group of ascomycetes that is only distantly related to the Pezizales. Knockout studies with N. crassa orthologues of differentially regulated genes revealed a functional role during fruiting body development for the gene NCU05079, encoding a putative MFS peptide transporter. These data indicate conserved gene expression patterns and a functional role of the corresponding genes during fruiting body development; such genes are candidates of choice for further functional analysis. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Chen, Xiaoping; Zhu, Wei; Azam, Sarwar; Li, Heying; Zhu, Fanghe; Li, Haifen; Hong, Yanbin; Liu, Haiyan; Zhang, Erhua; Wu, Hong; Yu, Shanlin; Zhou, Guiyuan; Li, Shaoxiong; Zhong, Ni; Wen, Shijie; Li, Xingyu; Knapp, Steve J; Ozias-Akins, Peggy; Varshney, Rajeev K; Liang, Xuanqiang
2013-01-01
The failure of peg penetration into the soil leads to seed abortion in peanut. Knowledge of genes involved in these processes is comparatively deficient. Here, we used RNA-seq to gain insights into transcriptomes of aerial and subterranean pods. More than 2 million transcript reads with an average length of 396 bp were generated from one aerial (AP) and two subterranean (SP1 and SP2) pod libraries using pyrosequencing technology. After assembly, sets of 49 632, 49 952 and 50 494 from a total of 74 974 transcript assembly contigs (TACs) were identified in AP, SP1 and SP2, respectively. A clear linear relationship in the gene expression level was observed between these data sets. In brief, 2194 differentially expressed TACs with a 99.0% true-positive rate were identified, among which 859 and 1068 TACs were up-regulated in aerial and subterranean pods, respectively. Functional analysis showed that putative function based on similarity with proteins catalogued in UniProt and gene ontology term classification could be determined for 59 342 (79.2%) and 42 955 (57.3%) TACs, respectively. A total of 2968 TACs were mapped to 174 KEGG pathways, of which 168 were shared by aerial and subterranean transcriptomes. TACs involved in photosynthesis were significantly up-regulated and enriched in the aerial pod. In addition, two senescence-associated genes were identified as significantly up-regulated in the aerial pod, which potentially contribute to embryo abortion in aerial pods, and in turn, to cessation of swelling. The data set generated in this study provides evidence for some functional genes as robust candidates underlying aerial and subterranean pod development and contributes to an elucidation of the evolutionary implications resulting from fruit development under light and dark conditions. © 2012 The Authors Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.
Verhagen, Lilly M; Zomer, Aldert; Maes, Mailis; Villalba, Julian A; Del Nogal, Berenice; Eleveld, Marc; van Hijum, Sacha Aft; de Waard, Jacobus H; Hermans, Peter Wm
2013-02-01
Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts.
2013-01-01
Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). Conclusions Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts. PMID:23375113
2014-01-01
Background A thorough investigation of the neurobiology of HIV-induced neuronal dysfunction and its evolving phenotype in the setting of viral suppression has been limited by the lack of validated small animal models to probe the effects of concomitant low level expression of multiple HIV-1 products in disease-relevant cells in the CNS. Results We report the results of gene expression profiling of the hippocampus of HIV-1 Tg rats, a rodent model of HIV infection in which multiple HIV-1 proteins are expressed under the control of the viral LTR promoter in disease-relevant cells including microglia and astrocytes. The Gene Set Enrichment Analysis (GSEA) algorithm was used for pathway analysis. Gene expression changes observed are consistent with astrogliosis and microgliosis and include evidence of inflammation and cell proliferation. Among the genes with increased expression in HIV-1 Tg rats was the interferon stimulated gene 15 (ISG-15), which was previously shown to be increased in the cerebrospinal fluid (CSF) of HIV patients and to correlate with neuropsychological impairment and neuropathology, and prostaglandin D2 (PGD2) synthase (Ptgds), which has been associated with immune activation and the induction of astrogliosis and microgliosis. GSEA-based pathway analysis highlighted a broad dysregulation of genes involved in neuronal trophism and neurodegenerative disorders. Among the latter are genesets associated with Huntington’s disease, Parkinson’s disease, mitochondrial, peroxisome function, and synaptic trophism and plasticity, such as IGF, ErbB and netrin signaling and the PI3K signal transduction pathway, a mediator of neural plasticity and of a vast array of trophic signals. Additionally, gene expression analyses also show altered lipid metabolism and peroxisomes dysfunction. Supporting the functional significance of these gene expression alterations, HIV-1 Tg rats showed working memory impairments in spontaneous alternation behavior in the T-Maze, a paradigm sensitive to prefrontal cortex and hippocampal function. Conclusions Altogether, differentially regulated genes and pathway analysis identify specific pathways that can be targeted therapeutically to increase trophic support, e.g. IGF, ErbB and netrin signaling, and reduce neuroinflammation, e.g. PGD2 synthesis, which may be beneficial in the treatment of chronic forms of HIV-associated neurocognitive disorders in the setting of viral suppression. PMID:24980976
Drug2Gene: an exhaustive resource to explore effectively the drug-target relation network.
Roider, Helge G; Pavlova, Nadia; Kirov, Ivaylo; Slavov, Stoyan; Slavov, Todor; Uzunov, Zlatyo; Weiss, Bertram
2014-03-11
Information about drug-target relations is at the heart of drug discovery. There are now dozens of databases providing drug-target interaction data with varying scope, and focus. Therefore, and due to the large chemical space, the overlap of the different data sets is surprisingly small. As searching through these sources manually is cumbersome, time-consuming and error-prone, integrating all the data is highly desirable. Despite a few attempts, integration has been hampered by the diversity of descriptions of compounds, and by the fact that the reported activity values, coming from different data sets, are not always directly comparable due to usage of different metrics or data formats. We have built Drug2Gene, a knowledge base, which combines the compound/drug-gene/protein information from 19 publicly available databases. A key feature is our rigorous unification and standardization process which makes the data truly comparable on a large scale, allowing for the first time effective data mining in such a large knowledge corpus. As of version 3.2, Drug2Gene contains 4,372,290 unified relations between compounds and their targets most of which include reported bioactivity data. We extend this set with putative (i.e. homology-inferred) relations where sufficient sequence homology between proteins suggests they may bind to similar compounds. Drug2Gene provides powerful search functionalities, very flexible export procedures, and a user-friendly web interface. Drug2Gene v3.2 has become a mature and comprehensive knowledge base providing unified, standardized drug-target related information gathered from publicly available data sources. It can be used to integrate proprietary data sets with publicly available data sets. Its main goal is to be a 'one-stop shop' to identify tool compounds targeting a given gene product or for finding all known targets of a drug. Drug2Gene with its integrated data set of public compound-target relations is freely accessible without restrictions at http://www.drug2gene.com.
Niu, Sheng-Yong; Yang, Jinyu; McDermaid, Adam; Zhao, Jing; Kang, Yu; Ma, Qin
2017-05-08
Metagenomic and metatranscriptomic sequencing approaches are more frequently being used to link microbiota to important diseases and ecological changes. Many analyses have been used to compare the taxonomic and functional profiles of microbiota across habitats or individuals. While a large portion of metagenomic analyses focus on species-level profiling, some studies use strain-level metagenomic analyses to investigate the relationship between specific strains and certain circumstances. Metatranscriptomic analysis provides another important insight into activities of genes by examining gene expression levels of microbiota. Hence, combining metagenomic and metatranscriptomic analyses will help understand the activity or enrichment of a given gene set, such as drug-resistant genes among microbiome samples. Here, we summarize existing bioinformatics tools of metagenomic and metatranscriptomic data analysis, the purpose of which is to assist researchers in deciding the appropriate tools for their microbiome studies. Additionally, we propose an Integrated Meta-Function mapping pipeline to incorporate various reference databases and accelerate functional gene mapping procedures for both metagenomic and metatranscriptomic analyses. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Annotation of gene function in citrus using gene expression information and co-expression networks
2014-01-01
Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
The prevalence of terraced treescapes in analyses of phylogenetic data sets.
Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J
2018-04-04
The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
Zeng, Jia; Hannenhalli, Sridhar
2013-01-01
Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also represents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provide an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
Mandelker, Diana; Schmidt, Ryan J; Ankala, Arunkanth; McDonald Gibson, Kristin; Bowser, Mark; Sharma, Himanshu; Duffy, Elizabeth; Hegde, Madhuri; Santani, Avni; Lebo, Matthew; Funke, Birgit
2016-12-01
Next-generation sequencing (NGS) is now routinely used to interrogate large sets of genes in a diagnostic setting. Regions of high sequence homology continue to be a major challenge for short-read technologies and can lead to false-positive and false-negative diagnostic errors. At the scale of whole-exome sequencing (WES), laboratories may be limited in their knowledge of genes and regions that pose technical hurdles due to high homology. We have created an exome-wide resource that catalogs highly homologous regions that is tailored toward diagnostic applications. This resource was developed using a mappability-based approach tailored to current Sanger and NGS protocols. Gene-level and exon-level lists delineate regions that are difficult or impossible to analyze via standard NGS. These regions are ranked by degree of affectedness, annotated for medical relevance, and classified by the type of homology (within-gene, different functional gene, known pseudogene, uncharacterized noncoding region). Additionally, we provide a list of exons that cannot be analyzed by short-amplicon Sanger sequencing. This resource can help guide clinical test design, supplemental assay implementation, and results interpretation in the context of high homology.Genet Med 18 12, 1282-1289.
Hwang, Sun-Goo; Kim, Dong Sub; Hwang, Jung Eun; Han, A-Reum; Jang, Cheol Seong
2014-05-15
In order to better understand the biological systems that are affected in response to cosmic ray (CR), we conducted weighted gene co-expression network analysis using the module detection method. By using the Pearson's correlation coefficient (PCC) value, we evaluated complex gene-gene functional interactions between 680 CR-responsive probes from integrated microarray data sets, which included large-scale transcriptional profiling of 1000 microarray samples. These probes were divided into 6 distinct modules that contained 20 enriched gene ontology (GO) functions, such as oxidoreductase activity, hydrolase activity, and response to stimulus and stress. In particular, modules 1 and 2 commonly showed enriched annotation categories such as oxidoreductase activity, including enriched cis-regulatory elements known as ROS-specific regulators. These results suggest that the ROS-mediated irradiation response pathway is affected by CR in modules 1 and 2. We found 243 ionizing radiation (IR)-responsive probes that exhibited similarities in expression patterns in various irradiation microarray data sets. The expression patterns of 6 randomly selected IR-responsive genes were evaluated by quantitative reverse transcription polymerase chain reaction following treatment with CR, gamma rays (GR), and ion beam (IB); similar patterns were observed among these genes under these 3 treatments. Moreover, we constructed subnetworks of IR-responsive genes and evaluated the expression levels of their neighboring genes following GR treatment; similar patterns were observed among them. These results of network-based analyses might provide a clue to understanding the complex biological system related to the CR response in plants. Copyright © 2014 Elsevier B.V. All rights reserved.
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Xu, Aishi; Li, Guang; Yang, Dong; Wu, Songfeng; Ouyang, Hongsheng; Xu, Ping; He, Fuchu
2015-12-04
Although the "missing protein" is a temporary concept in C-HPP, the biological information for their "missing" could be an important clue in evolutionary studies. Here we classified missing-protein-encoding genes into two groups, the genes encoding PE2 proteins (with transcript evidence) and the genes encoding PE3/4 proteins (with no transcript evidence). These missing-protein-encoding genes distribute unevenly among different chromosomes, chromosomal regions, or gene clusters. In the view of evolutionary features, PE3/4 genes tend to be young, spreading at the nonhomology chromosomal regions and evolving at higher rates. Interestingly, there is a higher proportion of singletons in PE3/4 genes than the proportion of singletons in all genes (background) and OTCSGs (organ, tissue, cell type-specific genes). More importantly, most of the paralogous PE3/4 genes belong to the newly duplicated members of the paralogous gene groups, which mainly contribute to special biological functions, such as "smell perception". These functions are heavily restricted into specific type of cells, tissues, or specific developmental stages, acting as the new functional requirements that facilitated the emergence of the missing-protein-encoding genes during evolution. In addition, the criteria for the extremely special physical-chemical proteins were first set up based on the properties of PE2 proteins, and the evolutionary characteristics of those proteins were explored. Overall, the evolutionary analyses of missing-protein-encoding genes are expected to be highly instructive for proteomics and functional studies in the future.
DiRE: identifying distant regulatory elements of co-expressed genes
Gotea, Valer; Ovcharenko, Ivan
2008-01-01
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org. PMID:18487623
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes
Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J
2008-01-01
Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
Transcription Profiling of the mgrA Regulon in Staphylococcus aureus
Luong, Thanh T.; Dunman, Paul M.; Murphy, Ellen; Projan, Steven J.; Lee, Chia Y.
2006-01-01
MgrA has been shown to affect multiple Staphylococcus aureus genes involved in virulence and antibiotic resistance. To comprehensively identify the target genes regulated by mgrA, we employed a microarray method to analyze the transcription profiles of S. aureus Newman, its isogeneic mgrA mutant, and an MgrA-overproducing derivative. We compared genes that were differentially expressed at exponential or early stationary growth phases. Our results showed that MgrA affected an impressive number of genes, 175 of which were positively regulated and 180 of which were negatively regulated in an mgrA-specific manner. The target genes included all functional categories. The microarray results were validated by real-time reverse transcription-PCR quantitation of a set of selected genes from different functional categories. Our data also indicate that mgrA regulates virulence factors in a fashion analogous to that of the accessory gene regulatory locus (agr). Accordingly, exoproteins are upregulated and surface proteins are downregulated by the regulator, suggesting that mgrA may function in concert with agr. The fact that a large number of genes are regulated by mgrA implies that MgrA is a major global regulator in S. aureus. PMID:16484201
Identification and function analysis of contrary genes in Dupuytren's contracture.
Ji, Xianglu; Tian, Feng; Tian, Lijie
2015-07-01
The present study aimed to analyze the expression of genes involved in Dupuytren's contracture (DC), using bioinformatic methods. The profile of GSE21221 was downloaded from the gene expression ominibus, which included six samples, derived from fibroblasts and six healthy control samples, derived from carpal-tunnel fibroblasts. A Distributed Intrusion Detection System was used in order to identify differentially expressed genes. The term contrary genes is proposed. Contrary genes were the genes that exhibited opposite expression patterns in the positive and negative groups, and likely exhibited opposite functions. These were identified using Coexpress software. Gene ontology (GO) function analysis was conducted for the contrary genes. A network of GO terms was constructed using the reduce and visualize gene ontology database. Significantly expressed genes (801) and contrary genes (98) were screened. A significant association was observed between Chitinase-3-like protein 1 and ten genes in the positive gene set. Positive regulation of transcription and the activation of nuclear factor-κB (NF-κB)-inducing kinase activity exhibited the highest degree values in the network of GO terms. In the present study, the expression of genes involved in the development of DC was analyzed, and the concept of contrary genes proposed. The genes identified in the present study are involved in the positive regulation of transcription and activation of NF-κB-inducing kinase activity. The contrary genes and GO terms identified in the present study may potentially be used for DC diagnosis and treatment.
Prediction of gene expression in embryonic structures of Drosophila melanogaster.
Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis
2007-07-01
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster
Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis
2007-01-01
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945
Pace, Tomasino; Olivieri, Anna; Sanchez, Massimo; Albanesi, Veronica; Picci, Leonardo; Siden Kiamos, Inga; Janse, Chris J; Waters, Andrew P; Pizzi, Elisabetta; Ponzi, Marta
2006-05-01
Transmission of the malaria parasite depends on specialized gamete precursors (gametocytes) that develop in the bloodstream of a vertebrate host. Gametocyte/gamete differentiation requires controlled patterns of gene expression and regulation not only of stage and gender-specific genes but also of genes associated with DNA replication and mitosis. Once taken up by mosquito, male gametocytes undergo three mitotic cycles within few minutes to produce eight motile gametes. Here we analysed, in two Plasmodium species, the expression of SET, a conserved nuclear protein involved in chromatin dynamics. SET is expressed in both asexual and sexual blood stages but strongly accumulates in male gametocytes. We demonstrated functionally the presence of two distinct promoters upstream of the set open reading frame, the one active in all blood stage parasites while the other active only in gametocytes and in a fraction of schizonts possibly committed to sexual differentiation. In ookinetes both promoters exhibit a basal activity, while in the oocysts the gametocyte-specific promoter is silent and the reporter gene is only transcribed from the constitutive promoter. This transcriptional control, described for the first time in Plasmodium, provides a mechanism by which single-copy genes can be differently modulated during parasite development. In male gametocytes an overexpression of SET might contribute to a prompt entry and execution of S/M phases within mosquito vector.
Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point view of systems biology, iBVS enables user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilitic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055
Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point view of systems biology, iBVS enables user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilitic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.
Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.
2009-01-01
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint. PMID:19506001
Protective pathways against colitis mediated by appendicitis and appendectomy
Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C
2011-01-01
Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample microarray analysed followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription–polymerase chain reaction (RT–PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0·001). Time–course RT–PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. PMID:21707591
Hess, Jonathan L.; Tylee, Daniel S.; Barve, Rahul; de Jong, Simone; Ophoff, Roel A.; Kumarasinghe, Nishantha; Tooney, Paul; Schall, Ulrich; Gardiner, Erin; Beveridge, Natalie Jane; Scott, Rodney J.; Yasawardene, Surangi; Perera, Antionette; Mendis, Jayan; Carr, Vaughan; Kelly, Brian; Cairns, Murray; Tsuang, Ming T.; Glatt, Stephen J.
2016-01-01
The application of microarray technology in schizophrenia research was heralded as paradigm-shifting, as it allowed for high-throughput assessment of cell and tissue function. This technology was widely adopted, initially in studies of postmortem brain tissue, and later in studies of peripheral blood. The collective body of schizophrenia microarray literature contains apparent inconsistencies between studies, with failures to replicate top hits, in part due to small sample sizes, cohort-specific effects, differences in array types, and other confounders. In an attempt to summarize existing studies of schizophrenia cases and non-related comparison subjects, we performed two mega-analyses of a combined set of microarray data from postmortem prefrontal cortices (n = 315) and from ex-vivo blood tissues (n = 578). We adjusted regression models per gene to remove non-significant covariates, providing best-estimates of transcripts dysregulated in schizophrenia. We also examined dysregulation of functionally related gene sets and gene co-expression modules, and assessed enrichment of cell types and genetic risk factors. The identities of the most significantly dysregulated genes were largely distinct for each tissue, but the findings indicated common emergent biological functions (e.g. immunity) and regulatory factors (e.g., predicted targets of transcription factors and miRNA species across tissues). Our network-based analyses converged upon similar patterns of heightened innate immune gene expression in both brain and blood in schizophrenia. We also constructed generalizable machine-learning classifiers using the blood-based microarray data. Our study provides an informative atlas for future pathophysiologic and biomarker studies of schizophrenia. PMID:27450777
Molecular characterization of the apical organ of the anthozoan Nematostella vectensis
Sinigaglia, Chiara; Busengdal, Henriette; Lerner, Avi; Oliveri, Paola; Rentzsch, Fabian
2015-01-01
Apical organs are sensory structures present in many marine invertebrate larvae where they are considered to be involved in their settlement, metamorphosis and locomotion. In bilaterians they are characterised by a tuft of long cilia and receptor cells and they are associated with groups of neurons, but their relatively low morphological complexity and dispersed phylogenetic distribution have left their evolutionary relationship unresolved. Moreover, since apical organs are not present in the standard model organisms, their development and function are not well understood. To provide a foundation for a better understanding of this structure we have characterised the molecular composition of the apical organ of the sea anemone Nematostella vectensis. In a microarray-based comparison of the gene expression profiles of planulae with either a wildtype or an experimentally expanded apical organ, we identified 78 evolutionarily conserved genes, which are predominantly or specifically expressed in the apical organ of Nematostella. This gene set comprises signalling molecules, transcription factors, structural and metabolic genes. The majority of these genes, including several conserved, but previously uncharacterized ones, are potentially involved in different aspects of the development or function of the long cilia of the apical organ. To demonstrate the utility of this gene set for comparative analyses, we further analysed the expression of a subset of previously uncharacterized putative orthologs in sea urchin larvae and detected expression for twelve out of eighteen of them in the apical domain. Our study provides a molecular characterization of the apical organ of Nematostella and represents an informative tool for future studies addressing the development, function and evolutionary history of apical organ cells. PMID:25478911
Hess, Jonathan L; Tylee, Daniel S; Barve, Rahul; de Jong, Simone; Ophoff, Roel A; Kumarasinghe, Nishantha; Tooney, Paul; Schall, Ulrich; Gardiner, Erin; Beveridge, Natalie Jane; Scott, Rodney J; Yasawardene, Surangi; Perera, Antionette; Mendis, Jayan; Carr, Vaughan; Kelly, Brian; Cairns, Murray; Tsuang, Ming T; Glatt, Stephen J
2016-10-01
The application of microarray technology in schizophrenia research was heralded as paradigm-shifting, as it allowed for high-throughput assessment of cell and tissue function. This technology was widely adopted, initially in studies of postmortem brain tissue, and later in studies of peripheral blood. The collective body of schizophrenia microarray literature contains apparent inconsistencies between studies, with failures to replicate top hits, in part due to small sample sizes, cohort-specific effects, differences in array types, and other confounders. In an attempt to summarize existing studies of schizophrenia cases and non-related comparison subjects, we performed two mega-analyses of a combined set of microarray data from postmortem prefrontal cortices (n=315) and from ex-vivo blood tissues (n=578). We adjusted regression models per gene to remove non-significant covariates, providing best-estimates of transcripts dysregulated in schizophrenia. We also examined dysregulation of functionally related gene sets and gene co-expression modules, and assessed enrichment of cell types and genetic risk factors. The identities of the most significantly dysregulated genes were largely distinct for each tissue, but the findings indicated common emergent biological functions (e.g. immunity) and regulatory factors (e.g., predicted targets of transcription factors and miRNA species across tissues). Our network-based analyses converged upon similar patterns of heightened innate immune gene expression in both brain and blood in schizophrenia. We also constructed generalizable machine-learning classifiers using the blood-based microarray data. Our study provides an informative atlas for future pathophysiologic and biomarker studies of schizophrenia. Published by Elsevier B.V.
Graubner, Felix R; Gram, Aykut; Kautz, Ewa; Bauersachs, Stefan; Aslan, Selim; Agaoglu, Ali R; Boos, Alois; Kowalewski, Mariusz P
2017-08-01
In the dog, there is no luteolysis in the absence of pregnancy. Thus, this species lacks any anti-luteolytic endocrine signal as found in other species that modulate uterine function during the critical period of pregnancy establishment. Nevertheless, in the dog an embryo-maternal communication must occur in order to prevent rejection of embryos. Based on this hypothesis, we performed microarray analysis of canine uterine samples collected during pre-attachment phase (days 10-12) and in corresponding non-pregnant controls, in order to elucidate the embryo attachment signal. An additional goal was to identify differences in uterine responses to pre-attachment embryos between dogs and other mammalian species exhibiting different reproductive patterns with regard to luteolysis, implantation, and preparation for placentation. Therefore, the canine microarray data were compared with gene sets from pigs, cattle, horses, and humans. We found 412 genes differentially regulated between the two experimental groups. The functional terms most strongly enriched in response to pre-attachment embryos related to extracellular matrix function and remodeling, and to immune and inflammatory responses. Several candidate genes were validated by semi-quantitative PCR. When compared with other species, best matches were found with human and equine counterparts. Especially for the pig, the majority of overlapping genes showed opposite expression patterns. Interestingly, 1926 genes did not pair with any of the other gene sets. Using a microarray approach, we report the uterine changes in the dog driven by the presence of embryos and compare these results with datasets from other mammalian species, finding common-, contrary-, and exclusively canine-regulated genes. © The Authors 2017. Published by Oxford University Press on behalf of Society for the Study of Reproduction.
Abernathy, Jason; Brezas, Andreas; Snekvik, Kevin R; Hardy, Ronald W; Overturf, Ken
2017-01-01
Finding suitable alternative protein sources for diets of carnivorous fish species remains a major concern for sustainable aquaculture. Through genetic selection, we created a strain of rainbow trout that outperforms parental lines in utilizing an all-plant protein diet and does not develop enteritis in the distal intestine, as is typical with salmonids on long-term plant protein-based feeds. By incorporating this strain into functional analyses, we set out to determine which genes are critical to plant protein utilization in the absence of gut inflammation. After a 12-week feeding trial with our selected strain and a control trout strain fed either a fishmeal-based diet or an all-plant protein diet, high-throughput RNA sequencing was completed on both liver and muscle tissues. Differential gene expression analyses, weighted correlation network analyses and further functional characterization were performed. A strain-by-diet design revealed differential expression ranging from a few dozen to over one thousand genes among the various comparisons and tissues. Major gene ontology groups identified between comparisons included those encompassing central, intermediary and foreign molecule metabolism, associated biosynthetic pathways as well as immunity. A systems approach indicated that genes involved in purine metabolism were highly perturbed. Systems analysis among the tissues tested further suggests the interplay between selection for growth, dietary utilization and protein tolerance may also have implications for nonspecific immunity. By combining data from differential gene expression and co-expression networks using selected trout, along with ontology and pathway analyses, a set of 63 candidate genes for plant diet tolerance was found. Risk loci in human inflammatory bowel diseases were also found in our datasets, indicating rainbow trout selected for plant-diet tolerance may have added utility as a potential biomedical model.
Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study
Weißenborn, Sandra; Walther, Dirk
2017-01-01
Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
Huang, Ruili; Wallqvist, Anders; Covell, David G
2006-03-01
We have analyzed the level of gene coregulation, using gene expression patterns measured across the National Cancer Institute's 60 tumor cell panels (NCI(60)), in the context of predefined pathways or functional categories annotated by KEGG (Kyoto Encyclopedia of Genes and Genomes), BioCarta, and GO (Gene Ontology). Statistical methods were used to evaluate the level of gene expression coherence (coordinated expression) by comparing intra- and interpathway gene-gene correlations. Our results show that gene expression in pathways, or groups of functionally related genes, has a significantly higher level of coherence than that of a randomly selected set of genes. Transcriptional-level gene regulation appears to be on a "need to be" basis, such that pathways comprising genes encoding closely interacting proteins and pathways responsible for vital cellular processes or processes that are related to growth or proliferation, specifically in cancer cells, such as those engaged in genetic information processing, cell cycle, energy metabolism, and nucleotide metabolism, tend to be more modular (lower degree of gene sharing) and to have genes significantly more coherently expressed than most signaling and regular metabolic pathways. Hierarchical clustering of pathways based on their differential gene expression in the NCI(60) further revealed interesting interpathway communications or interactions indicative of a higher level of pathway regulation. The knowledge of the nature of gene expression regulation and biological pathways can be applied to understanding the mechanism by which small drug molecules interfere with biological systems.
Natural variation in PTB1 regulates rice seed setting rate by controlling pollen tube growth.
Li, Shuangcheng; Li, Wenbo; Huang, Bin; Cao, Xuemei; Zhou, Xingyu; Ye, Shumei; Li, Chengbo; Gao, Fengyan; Zou, Ting; Xie, Kailong; Ren, Yun; Ai, Peng; Tang, Yangfan; Li, Xuemei; Deng, Qiming; Wang, Shiquan; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Li, Ping
2013-01-01
Grain number, panicle seed setting rate, panicle number and grain weight are the most important components of rice grain yield. To date, several genes related to grain weight, grain number and panicle number have been described in rice. However, no genes regulating the panicle seed setting rate have been functionally characterized. Here we show that the domestication-related POLLEN TUBE BLOCKED 1 (PTB1), a RING-type E3 ubiquitin ligase, positively regulates the rice panicle seed setting rate by promoting pollen tube growth. The natural variation in expression of PTB1 which is affected by the promoter haplotype and the environmental temperature, correlates with the rice panicle seed setting rate. Our results support the hypothesis that PTB1 is an important maternal sporophytic factor of pollen tube growth and a key modulator of the rice panicle seed setting rate. This finding has implications for the improvement of rice yield.
Baym, Michael; Shaket, Lev; Anzai, Isao A; Adesina, Oluwakemi; Barstow, Buz
2016-11-10
Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction.
Wang, Yan; Morkin, Melina I.; Fernandez, Stephanie G.; Mlacker, Gregory M.; Shechter, Jesse M.; Liu, Xiongfei; Patel, Karan H.; Lapins, Allison; Yang, Steven; Dombrowski, Susan M.
2014-01-01
The failure of the CNS neurons to regenerate axons after injury or stroke is a major clinical problem. Transcriptional regulators like Set-β are well positioned to regulate intrinsic axon regeneration capacity, which declines developmentally in maturing CNS neurons. Set-β also functions at cellular membranes and its subcellular localization is disrupted in Alzheimer's disease, but many of its biological mechanisms have not been explored in neurons. We found that Set-β was upregulated postnatally in CNS neurons, and was primarily localized to the nucleus but was also detected in the cytoplasm and adjacent to the plasma membrane. Remarkably, nuclear Set-β suppressed, whereas Set-β localized to cytoplasmic membranes promoted neurite growth in rodent retinal ganglion cells and hippocampal neurons. Mimicking serine 9 phosphorylation, as found in Alzheimer's disease brains, delayed nuclear import and furthermore blocked the ability of nuclear Set-β to suppress neurite growth. We also present data on gene regulation and protein binding partner recruitment by Set-β in primary neurons, raising the hypothesis that nuclear Set-β may preferentially regulate gene expression whereas Set-β at cytoplasmic membranes may regulate unique cofactors, including PP2A, which we show also regulates axon growth in vitro. Finally, increasing recruitment of Set-β to cellular membranes promoted adult rat optic nerve axon regeneration after injury in vivo. Thus, Set-β differentially regulates axon growth and regeneration depending on subcellular localization and phosphorylation. PMID:24849368
SorghumFDB: sorghum functional genomics database with multidimensional network analysis.
Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen
2016-01-01
Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants.Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.
Sun, Haimeng; Yang, Zhongchen; Wei, Caijie; Wu, Weizhong
2018-04-26
An up-flow vertical flow constructed wetland (AC-VFCW) filled with ceramsite and 5% external carbon source poly(3-hydroxybutyrate-hydroxyvalerate) (PHBV) as substrate was set for nitrogen removal with micro aeration. Simultaneous nitrification and denitrification process was observed with 90.4% NH 4 + -N and 92.1% TN removal efficiencies. Nitrification and denitrification genes were both preferentially enriched on the surface of PHBV. Nitrogen transformation along the flow direction showed that NH 4 + -N was oxidized to NO 3 - -N at the lowermost 10 cm of the substrate and NO 3 - -N gradually degraded over the depth. AmoA gene was more enriched at -10 and -50 cm layers. NirS gene was the dominant functional gene at the bottom layer with the abundance of 2.05 × 10 7 copies g -1 substrate while nosZ gene was predominantly abundant with 7.51 × 10 6 and 2.64 × 10 6 copies g -1 substrate at the middle and top layer, respectively, indicating that functional division of dominant nitrogen functional genes forms along the flow direction in AC-VFCW. Copyright © 2018. Published by Elsevier Ltd.
QSAR Study for Carcinogenic Potency of Aromatic Amines Based on GEP and MLPs
Song, Fucheng; Zhang, Anling; Liang, Hui; Cui, Lianhua; Li, Wenlian; Si, Hongzong; Duan, Yunbo; Zhai, Honglin
2016-01-01
A new analysis strategy was used to classify the carcinogenicity of aromatic amines. The physical-chemical parameters are closely related to the carcinogenicity of compounds. Quantitative structure activity relationship (QSAR) is a method of predicting the carcinogenicity of aromatic amine, which can reveal the relationship between carcinogenicity and physical-chemical parameters. This study accessed gene expression programming by APS software, the multilayer perceptrons by Weka software to predict the carcinogenicity of aromatic amines, respectively. All these methods relied on molecular descriptors calculated by CODESSA software and eight molecular descriptors were selected to build function equations. As a remarkable result, the accuracy of gene expression programming in training and test sets are 0.92 and 0.82, the accuracy of multilayer perceptrons in training and test sets are 0.84 and 0.74 respectively. The precision of the gene expression programming is obviously superior to multilayer perceptrons both in training set and test set. The QSAR application in the identification of carcinogenic compounds is a high efficiency method. PMID:27854309
Ji, S C; Pan, Y T; Lu, Q Y; Sun, Z Y; Liu, Y Z
2014-03-17
The purpose of this study was to identify critical genes associated with septic multiple trauma by comparing peripheral whole blood samples from multiple trauma patients with and without sepsis. A microarray data set was downloaded from the Gene Expression Omnibus (GEO) database. This data set included 70 samples, 36 from multiple trauma patients with sepsis and 34 from multiple trauma patients without sepsis (as a control set). The data were preprocessed, and differentially expressed genes (DEGs) were then screened for using packages of the R language. Functional analysis of DEGs was performed with DAVID. Interaction networks were then established for the most up- and down-regulated genes using HitPredict. Pathway-enrichment analysis was conducted for genes in the networks using WebGestalt. Fifty-eight DEGs were identified. The expression levels of PLAU (down-regulated) and MMP8 (up-regulated) presented the largest fold-changes, and interaction networks were established for these genes. Further analysis revealed that PLAT (plasminogen activator, tissue) and SERPINF2 (serpin peptidase inhibitor, clade F, member 2), which interact with PLAU, play important roles in the pathway of the component and coagulation cascade. We hypothesize that PLAU is a major regulator of the component and coagulation cascade, and down-regulation of PLAU results in dysfunction of the pathway, causing sepsis.
Identification of a B cell signature associated with renal transplant tolerance in humans
Newell, Kenneth A.; Asare, Adam; Kirk, Allan D.; Gisler, Trang D.; Bourcier, Kasia; Suthanthiran, Manikkam; Burlingham, William J.; Marks, William H.; Sanz, Ignacio; Lechler, Robert I.; Hernandez-Fuentes, Maria P.; Turka, Laurence A.; Seyfert-Margolis, Vicki L.
2010-01-01
Establishing long-term allograft acceptance without the requirement for continuous immunosuppression, a condition known as allograft tolerance, is a highly desirable therapeutic goal in solid organ transplantation. Determining which recipients would benefit from withdrawal or minimization of immunosuppression would be greatly facilitated by biomarkers predictive of tolerance. In this study, we identified the largest reported cohort to our knowledge of tolerant renal transplant recipients, as defined by stable graft function and receiving no immunosuppression for more than 1 year, and compared their gene expression profiles and peripheral blood lymphocyte subsets with those of subjects with stable graft function who are receiving immunosuppressive drugs as well as healthy controls. In addition to being associated with clinical and phenotypic parameters, renal allograft tolerance was strongly associated with a B cell signature using several assays. Tolerant subjects showed increased expression of multiple B cell differentiation genes, and a set of just 3 of these genes distinguished tolerant from nontolerant recipients in a unique test set of samples. This B cell signature was associated with upregulation of CD20 mRNA in urine sediment cells and elevated numbers of peripheral blood naive and transitional B cells in tolerant participants compared with those receiving immunosuppression. These results point to a critical role for B cells in regulating alloimmunity and provide a candidate set of genes for wider-scale screening of renal transplant recipients. PMID:20501946
GeneRIF indexing: sentence selection based on machine learning.
Jimeno-Yepes, Antonio J; Sticco, J Caitlin; Mork, James G; Aronson, Alan R
2013-05-31
A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE®; citations and the sentences describing a novel function. We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.
GOMA: functional enrichment analysis tool based on GO modules
Huang, Qiang; Wu, Ling-Yun; Wang, Yong; Zhang, Xiang-Sun
2013-01-01
Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results. PMID:23237213
Duan, Fenghai; Xu, Ye
2017-01-01
To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 expressed lower after turning points and 713 expressed higher after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data and successfully identified genes with expressions varying before and after breast cancer diagnosis.
Environmental drivers of the distribution of nitrogen functional genes at a watershed scale.
Tsiknia, Myrto; Paranychianakis, Nikolaos V; Varouchakis, Emmanouil A; Nikolaidis, Nikolaos P
2015-06-01
To date only few studies have dealt with the biogeography of microbial communities at large spatial scales, despite the importance of such information to understand and simulate ecosystem functioning. Herein, we describe the biogeographic patterns of microorganisms involved in nitrogen (N)-cycling (diazotrophs, ammonia oxidizers, denitrifiers) as well as the environmental factors shaping these patterns across the Koiliaris Critical Zone Observatory, a typical Mediterranean watershed. Our findings revealed that a proportion of variance ranging from 40 to 80% of functional genes abundance could be explained by the environmental variables monitored, with pH, soil texture, total organic carbon and potential nitrification rate being identified as the most important drivers. The spatial autocorrelation of N-functional genes ranged from 0.2 to 6.2 km and prediction maps, generated by cokriging, revealed distinct patterns of functional genes. The inclusion of functional genes in statistical modeling substantially improved the proportion of variance explained by the models, a result possibly due to the strong relationships that were identified among microbial groups. Significant relationships were set between functional groups, which were further mediated by land use (natural versus agricultural lands). These relationships, in combination with the environmental variables, allow us to provide insights regarding the ecological preferences of N-functional groups and among them the recently identified clade II of nitrous oxide reducers. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Chang, Chia-Ming; Yang, Yi-Ping; Chuang, Jen-Hua; Chuang, Chi-Mu; Lin, Tzu-Wei; Wang, Peng-Hui; Yu, Mu-Hsien
2017-01-01
The clinical characteristics of clear cell carcinoma (CCC) and endometrioid carcinoma EC) are concomitant with endometriosis (ES), which leads to the postulation of malignant transformation of ES to endometriosis-associated ovarian carcinoma (EAOC). Different deregulated functional areas were proposed accounting for the pathogenesis of EAOC transformation, and there is still a lack of a data-driven analysis with the accumulated experimental data in publicly-available databases to incorporate the deregulated functions involved in the malignant transformation of EOAC. We used the microarray gene expression datasets of ES, CCC and EC downloaded from the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) database. Then, we investigated the pathogenesis of EAOC by a data-driven, function-based analytic model with the quantified molecular functions defined by 1454 Gene Ontology (GO) term gene sets. This model converts the gene expression profiles to the functionome consisting of 1454 quantified GO functions, and then, the key functions involving the malignant transformation of EOAC can be extracted by a series of filters. Our results demonstrate that the deregulated oxidoreductase activity, metabolism, hormone activity, inflammatory response, innate immune response and cell-cell signaling play the key roles in the malignant transformation of EAOC. These results provide the evidence supporting the specific molecular pathways involved in the malignant transformation of EAOC. PMID:29113136
RAMONA: a Web application for gene set analysis on multilevel omics data.
Sass, Steffen; Buettner, Florian; Mueller, Nikola S; Theis, Fabian J
2015-01-01
Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data, which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds. RAMONA is implemented as ASP.NET Web application and publicly available at http://icb.helmholtz-muenchen.de/ramona. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
GeneSigDB: a manually curated database and resource for analysis of gene expression signatures
Culhane, Aedín C.; Schröder, Markus S.; Sultana, Razvan; Picard, Shaita C.; Martinelli, Enzo N.; Kelly, Caroline; Haibe-Kains, Benjamin; Kapushesky, Misha; St Pierre, Anne-Alyssa; Flahive, William; Picard, Kermshlise C.; Gusenleitner, Daniel; Papenhausen, Gerald; O'Connor, Niall; Correll, Mick; Quackenbush, John
2012-01-01
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease to the community so they can compare the predictive power of gene signatures or use these in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, facetted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap and results can be viewed on a dynamic editable heatmap that can be downloaded as a publication quality image. All data in GeneSigDB can be downloaded in numerous formats including .gmt file format for gene set enrichment analysis or as a R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org. PMID:22110038
Marriott, Andrew S.; Vasieva, Olga; Fang, Yongxiang; Copeland, Nikki A.; McLennan, Alexander G.; Jones, Nigel J.
2016-01-01
Regulation of gene expression is one of several roles proposed for the stress-induced nucleotide diadenosine tetraphosphate (Ap4A). We have examined this directly by a comparative RNA-Seq analysis of KBM-7 chronic myelogenous leukemia cells and KBM-7 cells in which the NUDT2 Ap4A hydrolase gene had been disrupted (NuKO cells), causing a 175-fold increase in intracellular Ap4A. 6,288 differentially expressed genes were identified with P < 0.05. Of these, 980 were up-regulated and 705 down-regulated in NuKO cells with a fold-change ≥ 2. Ingenuity® Pathway Analysis (IPA®) was used to assign these genes to known canonical pathways and functional networks. Pathways associated with interferon responses, pattern recognition receptors and inflammation scored highly in the down-regulated set of genes while functions associated with MHC class II antigens were prominent among the up-regulated genes, which otherwise showed little organization into major functional gene sets. Tryptophan catabolism was also strongly down-regulated as were numerous genes known to be involved in tumor promotion in other systems, with roles in the epithelial-mesenchymal transition, proliferation, invasion and metastasis. Conversely, some pro-apoptotic genes were up-regulated. Major upstream factors predicted by IPA® for gene down-regulation included NFκB, STAT1/2, IRF3/4 and SP1 but no major factors controlling gene up-regulation were identified. Potential mechanisms for gene regulation mediated by Ap4A and/or NUDT2 disruption include binding of Ap4A to the HINT1 co-repressor, autocrine activation of purinoceptors by Ap4A, chromatin remodeling, effects of NUDT2 loss on transcript stability, and inhibition of ATP-dependent regulatory factors such as protein kinases by Ap4A. Existing evidence favors the last of these as the most probable mechanism. Regardless, our results suggest that the NUDT2 protein could be a novel cancer chemotherapeutic target, with its inhibition potentially exerting strong anti-tumor effects via multiple pathways involving metastasis, invasion, immunosuppression and apoptosis. PMID:27144453
Marriott, Andrew S; Vasieva, Olga; Fang, Yongxiang; Copeland, Nikki A; McLennan, Alexander G; Jones, Nigel J
2016-01-01
Regulation of gene expression is one of several roles proposed for the stress-induced nucleotide diadenosine tetraphosphate (Ap4A). We have examined this directly by a comparative RNA-Seq analysis of KBM-7 chronic myelogenous leukemia cells and KBM-7 cells in which the NUDT2 Ap4A hydrolase gene had been disrupted (NuKO cells), causing a 175-fold increase in intracellular Ap4A. 6,288 differentially expressed genes were identified with P < 0.05. Of these, 980 were up-regulated and 705 down-regulated in NuKO cells with a fold-change ≥ 2. Ingenuity® Pathway Analysis (IPA®) was used to assign these genes to known canonical pathways and functional networks. Pathways associated with interferon responses, pattern recognition receptors and inflammation scored highly in the down-regulated set of genes while functions associated with MHC class II antigens were prominent among the up-regulated genes, which otherwise showed little organization into major functional gene sets. Tryptophan catabolism was also strongly down-regulated as were numerous genes known to be involved in tumor promotion in other systems, with roles in the epithelial-mesenchymal transition, proliferation, invasion and metastasis. Conversely, some pro-apoptotic genes were up-regulated. Major upstream factors predicted by IPA® for gene down-regulation included NFκB, STAT1/2, IRF3/4 and SP1 but no major factors controlling gene up-regulation were identified. Potential mechanisms for gene regulation mediated by Ap4A and/or NUDT2 disruption include binding of Ap4A to the HINT1 co-repressor, autocrine activation of purinoceptors by Ap4A, chromatin remodeling, effects of NUDT2 loss on transcript stability, and inhibition of ATP-dependent regulatory factors such as protein kinases by Ap4A. Existing evidence favors the last of these as the most probable mechanism. Regardless, our results suggest that the NUDT2 protein could be a novel cancer chemotherapeutic target, with its inhibition potentially exerting strong anti-tumor effects via multiple pathways involving metastasis, invasion, immunosuppression and apoptosis.
Integrative and conjugative elements and their hosts: composition, distribution and organization.
Cury, Jean; Touchon, Marie; Rocha, Eduardo P C
2017-09-06
Conjugation of single-stranded DNA drives horizontal gene transfer between bacteria and was widely studied in conjugative plasmids. The organization and function of integrative and conjugative elements (ICE), even if they are more abundant, was only studied in a few model systems. Comparative genomics of ICE has been precluded by the difficulty in finding and delimiting these elements. Here, we present the results of a method that circumvents these problems by requiring only the identification of the conjugation genes and the species' pan-genome. We delimited 200 ICEs and this allowed the first large-scale characterization of these elements. We quantified the presence in ICEs of a wide set of functions associated with the biology of mobile genetic elements, including some that are typically associated with plasmids, such as partition and replication. Protein sequence similarity networks and phylogenetic analyses revealed that ICEs are structured in functional modules. Integrases and conjugation systems have different evolutionary histories, even if the gene repertoires of ICEs can be grouped in function of conjugation types. Our characterization of the composition and organization of ICEs paves the way for future functional and evolutionary analyses of their cargo genes, composed of a majority of unknown function genes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Reilly, Matthew T.; Harris, R. Adron; Noronha, Antonio
2012-01-01
Over the last 50 years, researchers have made substantial progress in identifying genetic variations that underlie the complex phenotype of alcoholism. Not much is known, however, about how this genetic variation translates into altered biological function. Genetic animal models recapitulating specific characteristics of the human condition have helped elucidate gene function and the genetic basis of disease. In particular, major advances have come from the ability to manipulate genes through a variety of genetic technologies that provide an unprecedented capacity to determine gene function in the living organism and in alcohol-related behaviors. Even newer genetic-engineering technologies have given researchers the ability to control when and where a specific gene or mutation is activated or deleted, allowing investigators to narrow the role of the gene’s function to circumscribed neural pathways and across development. These technologies are important for all areas of neuroscience, and several public and private initiatives are making a new generation of genetic-engineering tools available to the scientific community at large. Finally, high-throughput “next-generation sequencing” technologies are set to rapidly increase knowledge of the genome, epigenome, and transcriptome, which, combined with genetically engineered mouse mutants, will enhance insight into biological function. All of these resources will provide deeper insight into the genetic basis of alcoholism. PMID:23134044
Microarray analysis reveals key genes and pathways in Tetralogy of Fallot
He, Yue-E; Qiu, Hui-Xian; Jiang, Jian-Bing; Wu, Rong-Zhou; Xiang, Ru-Lian; Zhang, Yuan-Hai
2017-01-01
The aim of the present study was to identify key genes that may be involved in the pathogenesis of Tetralogy of Fallot (TOF) using bioinformatics methods. The GSE26125 microarray dataset, which includes cardiovascular tissue samples derived from 16 children with TOF and five healthy age-matched control infants, was downloaded from the Gene Expression Omnibus database. Differential expression analysis was performed between TOF and control samples to identify differentially expressed genes (DEGs) using Student's t-test, and the R/limma package, with a log2 fold-change of >2 and a false discovery rate of <0.01 set as thresholds. The biological functions of DEGs were analyzed using the ToppGene database. The ReactomeFIViz application was used to construct functional interaction (FI) networks, and the genes in each module were subjected to pathway enrichment analysis. The iRegulon plugin was used to identify transcription factors predicted to regulate the DEGs in the FI network, and the gene-transcription factor pairs were then visualized using Cytoscape software. A total of 878 DEGs were identified, including 848 upregulated genes and 30 downregulated genes. The gene FI network contained seven function modules, which were all comprised of upregulated genes. Genes enriched in Module 1 were enriched in the following three neurological disorder-associated signaling pathways: Parkinson's disease, Alzheimer's disease and Huntington's disease. Genes in Modules 0, 3 and 5 were dominantly enriched in pathways associated with ribosomes and protein translation. The Xbox binding protein 1 transcription factor was demonstrated to be involved in the regulation of genes encoding the subunits of cytoplasmic and mitochondrial ribosomes, as well as genes involved in neurodegenerative disorders. Therefore, dysfunction of genes involved in signaling pathways associated with neurodegenerative disorders, ribosome function and protein translation may contribute to the pathogenesis of TOF. PMID:28713939
Fully moderated T-statistic for small sample size gene expression arrays.
Yu, Lianbo; Gulati, Parul; Fernandez, Soledad; Pennell, Michael; Kirschner, Lawrence; Jarjoura, David
2011-09-15
Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, identified a greater number of true positives in spike-in data, fit simulated data under varying assumptions very well, and in a real data set better identified higher expressing genes that were consistent with functional pathways associated with the experiments.
Wang, W; Huang, S; Hou, W; Liu, Y; Fan, Q; He, A; Wen, Y; Hao, J; Guo, X; Zhang, F
2017-10-01
Several genome-wide association studies (GWAS) of bone mineral density (BMD) have successfully identified multiple susceptibility genes, yet isolated susceptibility genes are often difficult to interpret biologically. The aim of this study was to unravel the genetic background of BMD at pathway level, by integrating BMD GWAS data with genome-wide expression quantitative trait loci (eQTLs) and methylation quantitative trait loci (meQTLs) data METHOD: We employed the GWAS datasets of BMD from the Genetic Factors for Osteoporosis Consortium (GEFOS), analysing patients' BMD. The areas studied included 32 735 femoral necks, 28 498 lumbar spines, and 8143 forearms. Genome-wide eQTLs (containing 923 021 eQTLs) and meQTLs (containing 683 152 unique methylation sites with local meQTLs) data sets were collected from recently published studies. Gene scores were first calculated by summary data-based Mendelian randomisation (SMR) software and meQTL-aligned GWAS results. Gene set enrichment analysis (GSEA) was then applied to identify BMD-associated gene sets with a predefined significance level of 0.05. We identified multiple gene sets associated with BMD in one or more regions, including relevant known biological gene sets such as the Reactome Circadian Clock (GSEA p-value = 1.0 × 10 -4 for LS and 2.7 × 10 -2 for femoral necks BMD in eQTLs-based GSEA) and insulin-like growth factor receptor binding (GSEA p-value = 5.0 × 10 -4 for femoral necks and 2.6 × 10 -2 for lumbar spines BMD in meQTLs-based GSEA). Our results provided novel clues for subsequent functional analysis of bone metabolism, and illustrated the benefit of integrating eQTLs and meQTLs data into pathway association analysis for genetic studies of complex human diseases. Cite this article : W. Wang, S. Huang, W. Hou, Y. Liu, Q. Fan, A. He, Y. Wen, J. Hao, X. Guo, F. Zhang. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density. Bone Joint Res 2017;6:572-576. © 2017 Wang et al.
Patterns of population differentiation of candidate genes for cardiovascular disease.
Kullo, Iftikhar J; Ding, Keyue
2007-07-12
The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. We calculated FST for 15,559 common SNPs (minor allele frequency > or = 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for nonfunctional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. These results suggest a possible basis for varying susceptibility to CVD among ethnic groups.
Meslin, Camille; Plakke, Melissa S.; Deutsch, Aaron B.; Small, Brandon S.; Morehouse, Nathan I.; Clark, Nathan L.
2015-01-01
Persistent adaptive challenges are often met with the evolution of novel physiological traits. Although there are specific examples of single genes providing new physiological functions, studies on the origin of complex organ functions are lacking. One such derived set of complex functions is found in the Lepidopteran bursa copulatrix, an organ within the female reproductive tract that digests nutrients from the male ejaculate or spermatophore. Here, we characterized bursa physiology and the evolutionary mechanisms by which it was equipped with digestive and absorptive functionality. By studying the transcriptome of the bursa and eight other tissues, we revealed a suite of highly expressed and secreted gene products providing the bursa with a combination of stomach-like traits for mechanical and enzymatic digestion of the male spermatophore. By subsequently placing these bursa genes in an evolutionary framework, we found that the vast majority of their novel digestive functions were co-opted by borrowing genes that continue to be expressed in nonreproductive tissues. However, a number of bursa-specific genes have also arisen, some of which represent unique gene families restricted to Lepidoptera and may provide novel bursa-specific functions. This pattern of promiscuous gene borrowing and relatively infrequent evolution of tissue-specific duplicates stands in contrast to studies of the evolution of novelty via single gene co-option. Our results suggest that the evolution of complex organ-level phenotypes may often be enabled (and subsequently constrained) by changes in tissue specificity that allow expression of existing genes in novel contexts, such as reproduction. The extent to which the selective pressures encountered in these novel roles require resolution via duplication and sub/neofunctionalization is likely to be determined by the need for specialized reproductive functionality. Thus, complex physiological phenotypes such as that found in the bursa offer important opportunities for understanding the relative role of pleiotropy and specialization in adaptive evolution. PMID:25725432
Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai
2018-06-20
The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence for patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between early recurrence (ER) group and non-early recurrence (non-ER) group of HCC. Least absolute shrinkage and selection operator (LASSO) for logistic regression models were used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validated the predictive value of this classifier. Futhermore, a co-expression network based on these lncRNAs and its highly related genes was constructed and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of genes in the network were performed. We identified 10 differentially expressed lncRNAs, including 3 that were upregulated and 7 that were downregulated in ER group. The lncRNA-based classifier was constructed based on 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084), and its accuracy was 0.83 in training set, 0.87 in test set and 0.84 in total set. And ROC curve analysis showed the AUROC was 0.741 in training set, 0.824 in the test set and 0.765 in total set. A functional enrichment analysis suggested that the genes of which is highly related to 4 lncRNAs were involved in immune system. This 7-lncRNA expression profile can effectively predict the early recurrence after surgical resection for HCC. This article is protected by copyright. All rights reserved.
Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George
2013-01-01
Analyzing time-course expression data captured in microarray datasets is a complex undertaking as the vast and complex data space is represented by a relatively low number of samples as compared to thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853
A Comprehensive Analysis of Nuclear-Encoded Mitochondrial Genes in Schizophrenia.
Gonçalves, Vanessa F; Cappi, Carolina; Hagen, Christian M; Sequeira, Adolfo; Vawter, Marquis P; Derkach, Andriy; Zai, Clement C; Hedley, Paula L; Bybjerg-Grauholm, Jonas; Pouget, Jennie G; Cuperfain, Ari B; Sullivan, Patrick F; Christiansen, Michael; Kennedy, James L; Sun, Lei
2018-05-01
The genetic risk factors of schizophrenia (SCZ), a severe psychiatric disorder, are not yet fully understood. Multiple lines of evidence suggest that mitochondrial dysfunction may play a role in SCZ, but comprehensive association studies are lacking. We hypothesized that variants in nuclear-encoded mitochondrial genes influence susceptibility to SCZ. We conducted gene-based and gene-set analyses using summary association results from the Psychiatric Genomics Consortium Schizophrenia Phase 2 (PGC-SCZ2) genome-wide association study comprising 35,476 cases and 46,839 control subjects. We applied the MAGMA method to three sets of nuclear-encoded mitochondrial genes: oxidative phosphorylation genes, other nuclear-encoded mitochondrial genes, and genes involved in nucleus-mitochondria crosstalk. Furthermore, we conducted a replication study using the iPSYCH SCZ sample of 2290 cases and 21,621 control subjects. In the PGC-SCZ2 sample, 1186 mitochondrial genes were analyzed, among which 159 had p values < .05 and 19 remained significant after multiple testing correction. A meta-analysis of 818 genes combining the PGC-SCZ2 and iPSYCH samples resulted in 104 nominally significant and nine significant genes, suggesting a polygenic model for the nuclear-encoded mitochondrial genes. Gene-set analysis, however, did not show significant results. In an in silico protein-protein interaction network analysis, 14 mitochondrial genes interacted directly with 158 SCZ risk genes identified in PGC-SCZ2 (permutation p = .02), and aldosterone signaling in epithelial cells and mitochondrial dysfunction pathways appeared to be overrepresented in this network of mitochondrial and SCZ risk genes. This study provides evidence that specific aspects of mitochondrial function may play a role in SCZ, but we did not observe its broad involvement even using a large sample. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
2016-03-01
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ ( geneanalytics.genecards.org ), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards-the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
Bockamp, Ernesto; Sprengel, Rolf; Eshkind, Leonid; Lehmann, Thomas; Braun, Jan M; Emmrich, Frank; Hengstler, Jan G
2008-03-01
Many mouse models are currently available, providing avenues to elucidate gene function and to recapitulate specific pathological conditions. To a large extent, successful translation of clinical evidence or analytical data into appropriate mouse models is possible through progress in transgenic or gene-targeting technology. Beginning with a review of standard mouse transgenics and conventional gene targeting, this article will move on to discussing the basics of conditional gene expression: the tetracycline (tet)-off and tet-on systems based on the transactivators tet-controlled transactivator (Tta) and reverse tet-on transactivator (rtTA) that allow downregulation or induction of gene expression; Cre or Flp recombinase-mediated modifications, including excision, inversion, insertion and interchromosomal translocation; combination of the tet and Cre systems, permitting inducible knockout, reporter gene activation or activation of point mutations; the avian retroviral system based on delivery of rtTA specifically into cells expressing the avian retroviral receptor, which enables cell type-specific, inducible gene expression; the tamoxifen system, one of the most frequently applied steroid receptor-based systems, allows rapid activation of a fusion protein between the gene of interest and a mutant domain of the estrogen receptor, whereby activation does not depend on transcription; and techniques for cell type-specific ablation. The diphtheria toxin receptor system offers the advantage that it can be combined with the 'zoo' of Cre recombinase driver mice. Having described the basics we move on to the cutting edge: generation of genome-wide sets of conditional knockout mice. To this end, large ongoing projects apply two strategies: gene trapping based on random integration of trapping vectors into introns leading to truncation of the transcript, and gene targeting, representing the directed approach using homologous recombination. It can be expected that in the near future genome-wide sets of such mice will be available. Finally, the possibilities of conditional expression systems for investigating gene function in tissue regeneration will be illustrated by examples for neurodegenerative disease, liver regeneration and wound healing of the skin.
Guo, Nancy L; Wan, Ying-Wooi; Denvir, James; Porter, Dale W; Pacurari, Maricica; Wolfarth, Michael G; Castranova, Vincent; Qian, Yong
2012-01-01
Concerns over the potential for multi-walled carbon nanotubes (MWCNT) to induce lung carcinogenesis have emerged. This study sought to (1) identify gene expression signatures in the mouse lungs following pharyngeal aspiration of well-dispersed MWCNT and (2) determine if these genes were associated with human lung cancer risk and progression. Genome-wide mRNA expression profiles were analyzed in mouse lungs (n=160) exposed to 0, 10, 20, 40, or 80 µg of MWCNT by pharyngeal aspiration at 1, 7, 28, and 56 days post-exposure. By using pairwise-Statistical Analysis of Microarray (SAM) and linear modeling, 24 genes were selected, which have significant changes in at least two time points, have a more than 1.5 fold change at all doses, and are significant in the linear model for the dose or the interaction of time and dose. Additionally, a 38-gene set was identified as related to cancer from 330 genes differentially expressed at day 56 post-exposure in functional pathway analysis. Using the expression profiles of the cancer-related gene set in 8 mice at day 56 post-exposure to 10 µg of MWCNT, a nearest centroid classification accurately predicts human lung cancer survival with a significant hazard ratio in training set (n=256) and test set (n=186). Furthermore, both gene signatures were associated with human lung cancer risk (n=164) with significant odds ratios. These results may lead to development of a surveillance approach for early detection of lung cancer and prognosis associated with MWCNT in the workplace. PMID:22891886
A set of GFP organelle marker lines for intracellular localization studies in Medicago truncatula
USDA-ARS?s Scientific Manuscript database
Genomics advances in the model legume Medicago truncatula have led to an increase in the number of identified genes encoding proteins with unknown biological function. Determining the intracellular location of uncharacterized proteins often aids in the elucidation of biological function. To expedite...
RAS oncogene-mediated deregulation of the transcriptome: from molecular signature to function.
Schäfer, Reinhold; Sers, Christine
2011-01-01
Transcriptome analysis of cancer cells has developed into a standard procedure to elucidate multiple features of the malignant process and to link gene expression to clinical properties. Gene expression profiling based on microarrays provides essentially correlative information and needs to be transferred to the functional level in order to understand the activity and contribution of individual genes or sets of genes as elements of the gene signature. To date, there exist significant gaps in the functional understanding of gene expression profiles. Moreover, the processes that drive the profound transcriptional alterations that characterize cancer cells remain mainly elusive. We have used pathway-restricted gene expression profiles derived from RAS oncogene-transformed cells and from RAS-expressing cancer cells to identify regulators downstream of the MAPK pathway.We describe the role of epigenetic regulation exemplified by the control of several immune genes in generic cell lines and colorectal cancer cells, particularly the functional interaction between signaling and DNA methylation. Moreover, we assess the role of the architectural transcription factor high mobility AT-hook 2 (HMGA2) as a regulator of the RAS-responsive transcriptome in ovarian epithelial cells. Finally, we describe an integrated approach combining pathway interference in colorectal cancer cells, gene expression profiling and computational analysis of regulatory elements of deregulated target genes. This strategy resulted in the identification of Y-box binding protein 1 (YBX1) as a regulator of MAPK-dependent proliferation and gene expression. The implications for a therapeutic application of HMGA2 gene silencing and the role of YBX1 as a prognostic factor are discussed.
Interactions in the microbiome: communities of organisms and communities of genes
Boon, Eva; Meehan, Conor J; Whidden, Chris; Wong, Dennis H-J; Langille, Morgan GI; Beiko, Robert G
2014-01-01
A central challenge in microbial community ecology is the delineation of appropriate units of biodiversity, which can be taxonomic, phylogenetic, or functional in nature. The term ‘community’ is applied ambiguously; in some cases, the term refers simply to a set of observed entities, while in other cases, it requires that these entities interact with one another. Microorganisms can rapidly gain and lose genes, potentially decoupling community roles from taxonomic and phylogenetic groupings. Trait-based approaches offer a useful alternative, but many traits can be defined based on gene functions, metabolic modules, and genomic properties, and the optimal set of traits to choose is often not obvious. An analysis that considers taxon assignment and traits in concert may be ideal, with the strengths of each approach offsetting the weaknesses of the other. Individual genes also merit consideration as entities in an ecological analysis, with characteristics such as diversity, turnover, and interactions modeled using genes rather than organisms as entities. We identify some promising avenues of research that are likely to yield a deeper understanding of microbial communities that shift from observation-based questions of ‘Who is there?’ and ‘What are they doing?’ to the mechanistically driven question of ‘How will they respond?’ PMID:23909933
Strotbek, Christoph; Krinninger, Stefan; Frank, Wolfgang
2013-01-01
To comprehensively understand the major processes in plant biology, it is necessary to study a diverse set of species that represent the complexity of plants. This research will help to comprehend common conserved mechanisms and principles, as well as to elucidate those mechanisms that are specific to a particular plant clade. Thereby, we will gain knowledge about the invention and loss of mechanisms and their biological impact causing the distinct specifications throughout the plant kingdom. Since the establishment of transgenic plants, these studies concentrate on the elucidation of gene functions applying an increasing repertoire of molecular techniques. In the last two decades, the moss Physcomitrella patens joined the established set of plant models based on its evolutionary position bridging unicellular algae and vascular plants and a number of specific features alleviating gene function analysis. Here, we want to provide an overview of the specific features of P. patens making it an interesting model for many research fields in plant biology, to present the major achievements in P. patens genetic engineering, and to introduce common techniques to scientists who intend to use P. patens as a model in their research activities.
Functional genomics of the evolution of increased resistance to parasitism in Drosophila.
Wertheim, Bregje; Kraaijeveld, Alex R; Hopkins, Meirion G; Walther Boer, Mark; Godfray, H Charles J
2011-03-01
Individual hosts normally respond to parasite attack by launching an acute immune response (a phenotypic plastic response), while host populations can respond in the longer term by evolving higher level of defence against parasites. Little is known about the genetics of the evolved response: the identity and number of genes involved and whether it involves a pre-activation of the regulatory systems governing the plastic response. We explored these questions by surveying transcriptional changes in a Drosophila melanogaster strain artificially selected for resistance against the hymenopteran endoparasitoid Asobara tabida. Using micro-arrays, we profiled gene expression at seven time points during development (from the egg to the second instar larva) and found a large number of genes (almost 900) with altered expression levels. Bioinformatic analysis showed that some were involved in immunity or defence-associated functions but many were not. Previously, we had defined a set of genes whose level of expression changed after parasitoid attack and a comparison with the present set showed a significant though comparatively small overlap. This suggests that the evolutionary response to parasitism is not a simple pre-activation of the plastic, acute response. We also found overlap in the genes involved in the evolutionary response to parasitism and to other biotic and abiotic stressors, perhaps suggesting a 'module' of genes involved in a generalized stress response as has been found in other organisms. © 2010 Blackwell Publishing Ltd.
Li, Meng; Ford, Tim; Li, Xiaoyan; Gu, Ji-Dong
2011-04-15
A newly designed primer set (AnnirS), together with a previously published primer set (ScnirS), was used to detect anammox bacterial nirS genes from sediments collected from three marine environments. Phylogenetic analysis demonstrated that all retrieved sequences were clearly different from typical denitrifiers' nirS, but do group together with the known anammox bacterial nirS. Sequences targeted by ScnirS are closely related to Scalindua nirS genes recovered from the Peruvian oxygen minimum zone (OMZ), whereas sequences targeted by AnnirS are more closely affiliated with the nirS of Candidatus 'Kuenenia stuttgartiensis' and even form a new phylogenetic nirS clade, which might be related to other genera of the anammox bacteria. Analysis demonstrated that retrieved sequences had higher sequence identities (>60%) with known anammox bacterial nirS genes than with denitrifiers' nirS, on both nucleotide and amino acid levels. Compared to the 16S rRNA and hydrazine oxidoreductase (hzo) genes, the anammox bacterial nirS not only showed consistent phylogenetic relationships but also demonstrated more reliable quantification of anammox bacteria because of the single copy of the nirS gene in the anammox bacterial genome and the specificity of PCR primers for different genera of anammox bacteria, thus providing a suitable functional biomarker for investigation of anammox bacteria.
Genome-wide analysis of YY2 versus YY1 target genes
Chen, Li; Shioda, Toshi; Coser, Kathryn R.; Lynch, Mary C.; Yang, Chuanwei; Schmidt, Emmett V.
2010-01-01
Yin Yang 1 (YY1) is a critical transcription factor controlling cell proliferation, development and DNA damage responses. Retrotranspositions have independently generated additional YY family members in multiple species. Although Drosophila YY1 [pleiohomeotic (Pho)] and its homolog [pleiohomeotic-like (Phol)] redundantly control homeotic gene expression, the regulatory contributions of YY1-homologs have not yet been examined in other species. Indeed, targets for the mammalian YY1 homolog YY2 are completely unknown. Using gene set enrichment analysis, we found that lentiviral constructs containing short hairpin loop inhibitory RNAs for human YY1 (shYY1) and its homolog YY2 (shYY2) caused significant changes in both shared and distinguishable gene sets in human cells. Ribosomal protein genes were the most significant gene set upregulated by both shYY1 and shYY2, although combined shYY1/2 knock downs were not additive. In contrast, shYY2 reversed the anti-proliferative effects of shYY1, and shYY2 particularly altered UV damage response, platelet-specific and mitochondrial function genes. We found that decreases in YY1 or YY2 caused inverse changes in UV sensitivity, and that their combined loss reversed their respective individual effects. Our studies show that human YY2 is not redundant to YY1, and YY2 is a significant regulator of genes previously identified as uniquely responding to YY1. PMID:20215434
A Systematic Analysis of Candidate Genes Associated with Nicotine Addiction
Liu, Meng; Li, Xia; Fan, Rui; Liu, Xinhua; Wang, Ju
2015-01-01
Nicotine, as the major psychoactive component of tobacco, has broad physiological effects within the central nervous system, but our understanding of the molecular mechanism underlying its neuronal effects remains incomplete. In this study, we performed a systematic analysis on a set of nicotine addiction-related genes to explore their characteristics at network levels. We found that NAGenes tended to have a more moderate degree and weaker clustering coefficient and to be less central in the network compared to alcohol addiction-related genes or cancer genes. Further, clustering of these genes resulted in six clusters with themes in synaptic transmission, signal transduction, metabolic process, and apoptosis, which provided an intuitional view on the major molecular functions of the genes. Moreover, functional enrichment analysis revealed that neurodevelopment, neurotransmission activity, and metabolism related biological processes were involved in nicotine addiction. In summary, by analyzing the overall characteristics of the nicotine addiction related genes, this study provided valuable information for understanding the molecular mechanisms underlying nicotine addiction. PMID:26097843
Tian, Honglai; Guan, Donghui; Li, Jianmin
2018-06-01
Osteosarcoma (OS), the most common malignant bone tumor, accounts for the heavy healthy threat in the period of children and adolescents. OS occurrence usually correlates with early metastasis and high death rate. This study aimed to better understand the mechanism of OS metastasis.Based on Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis, and selected differential expressed genes. Weighted gene co-expression network analysis (WGCNA) approach allowed us to investigate the most OS metastasis-correlated module. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to give annotation of selected OS metastasis-associated genes.We select 897 differential expressed genes from OS metastasis and OS non-metastasis groups. Based on these selected genes, WGCNA further explored 142 genes included in the most OS metastasis-correlated module. Gene Ontology functional and KEGG pathway enrichment analyses showed that significantly OS metastasis-associated genes were involved in pathway correlated with insulin-like growth factor binding.Our research figured out several potential molecules participating in metastasis process and factors acting as biomarker. With this study, we could better explore the mechanism of OS metastasis and further discover more therapy targets.
He, Peng; Zhang, Yun-Fei; Hong, Duan-Yang; Wang, Jun; Wang, Xing-Liang; Zuo, Ling-Hua; Tang, Xian-Fu; Xu, Wei-Ming; He, Ming
2017-03-01
Female moths synthesize species-specific sex pheromone components and release them to attract male moths, which depend on precise sex pheromone chemosensory system to locate females. Two types of genes involved in the sex pheromone biosynthesis and degradation pathways play essential roles in this important moth behavior. To understand the function of genes in the sex pheromone pathway, this study investigated the genome-wide and digital gene expression of sex pheromone biosynthesis and degradation genes in various adult tissues in the diamondback moth (DBM), Plutella xylostella, which is a notorious vegetable pest worldwide. A massive transcriptome data (at least 39.04 Gb) was generated by sequencing 6 adult tissues including male antennae, female antennae, heads, legs, abdomen and female pheromone glands from DBM by using Illumina 4000 next-generation sequencing and mapping to a published DBM genome. Bioinformatics analysis yielded a total of 89,332 unigenes among which 87 transcripts were putatively related to seven gene families in the sex pheromone biosynthesis pathway. Among these, seven [two desaturases (DES), three fatty acyl-CoA reductases (FAR) one acetyltransferase (ACT) and one alcohol dehydrogenase (AD)] were mainly expressed in the pheromone glands with likely function in the three essential sex pheromone biosynthesis steps: desaturation, reduction, and esterification. We also identified 210 odorant-degradation related genes (including sex pheromone-degradation related genes) from seven major enzyme groups. Among these genes, 100 genes are new identified and two aldehyde oxidases (AOXs), one aldehyde dehydrogenase (ALDH), five carboxyl/cholinesterases (CCEs), five UDP-glycosyltransferases (UGTs), eight cytochrome P450 (CYP) and three glutathione S-transferases (GSTs) displayed more robust expression in the antennae, and thus are proposed to participate in the degradation of sex pheromone components and plant volatiles. To date, this is the most comprehensive gene data set of sex pheromone biosynthesis and degradation enzyme related genes in DBM created by genome- and transcriptome-wide identification, characterization and expression profiling. Our findings provide a basis to better understand the function of genes with tissue enriched expression. The results also provide information on the genes involved in sex pheromone biosynthesis and degradation, and may be useful to identify potential gene targets for pest control strategies by disrupting the insect-insect communication using pheromone-based behavioral antagonists.
Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander
2015-01-01
High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it overall requires lower number of primers and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305
A Mitocentric View of Parkinson’s Disease
Haelterman, Nele A.; Yoon, Wan Hee; Sandoval, Hector; Jaiswal, Manish; Shulman, Joshua M.; Bellen, Hugo J.
2015-01-01
Parkinson’s disease (PD) is a common neurodegenerative disease, yet the underlying causative molecular mechanisms are ill defined. Numerous observations based on drug studies and mutations in genes that cause PD point to a complex set of rather subtle mitochondrial defects that may be causative. Indeed, intensive investigation of these genes in model organisms has revealed roles in the electron transport chain, mitochondrial protein homeostasis, mitophagy, and the fusion and fission of mitochondria. Here, we attempt to synthesize results from experimental studies in diverse systems to define the precise function of these PD genes, as well as their interplay with other genes that affect mitochondrial function. We propose that subtle mitochondrial defects in combination with other insults trigger the onset and progression of disease, in both familial and idiopathic PD. PMID:24821430
Genetic identification of brain cell types underlying schizophrenia.
Skene, Nathan G; Bryois, Julien; Bakken, Trygve E; Breen, Gerome; Crowley, James J; Gaspar, Héléna A; Giusti-Rodriguez, Paola; Hodge, Rebecca D; Miller, Jeremy A; Muñoz-Manchado, Ana B; O'Donovan, Michael C; Owen, Michael J; Pardiñas, Antonio F; Ryge, Jesper; Walters, James T R; Linnarsson, Sten; Lein, Ed S; Sullivan, Patrick F; Hjerling-Leffler, Jens
2018-06-01
With few exceptions, the marked advances in knowledge about the genetic basis of schizophrenia have not converged on findings that can be confidently used for precise experimental modeling. By applying knowledge of the cellular taxonomy of the brain from single-cell RNA sequencing, we evaluated whether the genomic loci implicated in schizophrenia map onto specific brain cell types. We found that the common-variant genomic results consistently mapped to pyramidal cells, medium spiny neurons (MSNs) and certain interneurons, but far less consistently to embryonic, progenitor or glial cells. These enrichments were due to sets of genes that were specifically expressed in each of these cell types. We also found that many of the diverse gene sets previously associated with schizophrenia (genes involved in synaptic function, those encoding mRNAs that interact with FMRP, antipsychotic targets, etc.) generally implicated the same brain cell types. Our results suggest a parsimonious explanation: the common-variant genetic results for schizophrenia point at a limited set of neurons, and the gene sets point to the same cells. The genetic risk associated with MSNs did not overlap with that of glutamatergic pyramidal cells and interneurons, suggesting that different cell types have biologically distinct roles in schizophrenia.
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of efforts have been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Calenda, Giulia; Tong, Yuehong; Tar, Moses; Lowe, Daniel; Siragusa, Joseph; Melman, Arnold; Davies, Kelvin P.
2010-01-01
Purpose We identified molecular markers of erectile function, particularly those responding to erectile dysfunction treatment. Materials and Methods Sprague-Dawley retired breeder rats were intracorporeally injected with pVAX-hSlo, pSMAA-hSlo or the control plasmid pVAX. One week later the intracorporeal pressure-to-blood pressure ratio and gene expression were determined by microarray analysis and quantitative reverse transcriptase-polymerase chain reaction. Rat corporeal cells were transfected in vitro with pVAX-hSlo, pSMAA-hSlo or pVAX and the change in gene expression was determined. We also determined whether Vcsa1 expression was changed after pharmacotherapy using tadalafil. Results Animals treated with vectors expressing hSlo had significantly improved erectile function compared to that in controls, accompanied by changed expression of a subset of genes. Vcsa1 was one of the genes that was most changed in expression (the third of approximately 31,000 with greater than 10-fold up-regulation). Changes in gene expression were different than those observed in corporeal cells transfected in vitro, distinguishing gene expression changes that were a direct effect of hSlo over expression. When tadalafil was administered in retired breeder rats, the Vcsa1 transcript increased 4-fold in corporeal tissue compared to that in untreated controls. Conclusions Our study identifies a set of genes that are changed in response to improved erectile function, rather than as a direct effect of treatment. We noted Vcsa1 may act as marker of the restoration of erectile function after gene transfer and pharmacotherapy. PMID:19375734
Calenda, Giulia; Tong, Yuehong; Tar, Moses; Lowe, Daniel; Siragusa, Joseph; Melman, Arnold; Davies, Kelvin P
2009-06-01
We identified molecular markers of erectile function, particularly those responding to erectile dysfunction treatment. Sprague-Dawley retired breeder rats were intracorporeally injected with pVAX-hSlo, pSMAA-hSlo or the control plasmid pVAX. One week later the intracorporeal pressure-to-blood pressure ratio and gene expression were determined by microarray analysis and quantitative reverse transcriptase-polymerase chain reaction. Rat corporeal cells were transfected in vitro with pVAX-hSlo, pSMAA-hSlo or pVAX and the change in gene expression was determined. We also determined whether Vcsa1 expression was changed after pharmacotherapy using tadalafil. Animals treated with vectors expressing hSlo had significantly improved erectile function compared to that in controls, accompanied by changed expression of a subset of genes. Vcsa1 was one of the genes that was most changed in expression (the third of approximately 31,000 with greater than 10-fold up-regulation). Changes in gene expression were different than those observed in corporeal cells transfected in vitro, distinguishing gene expression changes that were a direct effect of hSlo over expression. When tadalafil was administered in retired breeder rats, the Vcsa1 transcript increased 4-fold in corporeal tissue compared to that in untreated controls. Our study identifies a set of genes that are changed in response to improved erectile function, rather than as a direct effect of treatment. We noted Vcsa1 may act as marker of the restoration of erectile function after gene transfer and pharmacotherapy.
Computational annotation of genes differentially expressed along olive fruit development
Galla, Giulio; Barcaccia, Gianni; Ramina, Angelo; Collani, Silvio; Alagna, Fiammetta; Baldoni, Luciana; Cultrera, Nicolò GM; Martinelli, Federico; Sebastiani, Luca; Tonutti, Pietro
2009-01-01
Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a worldwide economical high impact. Differently from other fruit tree species, little is known about the physiological and molecular basis of the olive fruit development and a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening. PMID:19852839
Veturi, Yogasudha; Ritchie, Marylyn D
2018-01-01
Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary level data suffices for TWAS and offers improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS - TWAS-MP or (b) using summary-based Mendelian randomization - TWAS-SMR. Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger's MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression /trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression or (b) performed eQTL analysis. We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP (b) TWAS-SMR (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.
Carbohydrate degrading polypeptide and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
HisB as novel selection marker for gene targeting approaches in Aspergillus niger.
Fiedler, Markus R M; Gensheimer, Tarek; Kubisch, Christin; Meyer, Vera
2017-03-08
For Aspergillus niger, a broad set of auxotrophic and dominant resistance markers is available. However, only few offer targeted modification of a gene of interest into or at a genomic locus of choice, which hampers functional genomics studies. We thus aimed to extend the available set by generating a histidine auxotrophic strain with a characterized hisB locus for targeted gene integration and deletion in A. niger. A histidine-auxotrophic strain was established via disruption of the A. niger hisB gene by using the counterselectable pyrG marker. After curing, a hisB - , pyrG - strain was obtained, which served as recipient strain for further studies. We show here that both hisB orthologs from A. nidulans and A. niger can be used to reestablish histidine prototrophy in this recipient strain. Whereas the hisB gene from A. nidulans was suitable for efficient gene targeting at different loci in A. niger, the hisB gene from A. niger allowed efficient integration of a Tet-on driven luciferase reporter construct at the endogenous non-functional hisB locus. Subsequent analysis of the luciferase activity revealed that the hisB locus is tight under non-inducing conditions and allows even higher luciferase expression levels compared to the pyrG integration locus. Taken together, we provide here an alternative selection marker for A. niger, hisB, which allows efficient homologous integration rates as well as high expression levels which compare favorably to the well-established pyrG selection marker.
2011-01-01
Background Stem cells and their niches are studied in many systems, but mammalian germ stem cells (GSC) and their niches are still poorly understood. In rat testis, spermatogonia and undifferentiated Sertoli cells proliferate before puberty, but at puberty most spermatogonia enter spermatogenesis, and Sertoli cells differentiate to support this program. Thus, pre-pubertal spermatogonia might possess GSC potential and pre-pubertal Sertoli cells niche functions. We hypothesized that the different stem cell pools at pre-puberty and maturity provide a model for the identification of stem cell and niche-specific genes. We compared the transcript profiles of spermatogonia and Sertoli cells from pre-pubertal and pubertal rats and examined how these related to genes expressed in testicular cancers, which might originate from inappropriate communication between GSCs and Sertoli cells. Results The pre-pubertal spermatogonia-specific gene set comprised known stem cell and spermatogonial stem cell (SSC) markers. Similarly, the pre-pubertal Sertoli cell-specific gene set comprised known niche gene transcripts. A large fraction of these specifically enriched transcripts encoded trans-membrane, extra-cellular, and secreted proteins highlighting stem cell to niche communication. Comparing selective gene sets established in this study with published gene expression data of testicular cancers and their stroma, we identified sets expressed genes shared between testicular tumors and pre-pubertal spermatogonia, and tumor stroma and pre-pubertal Sertoli cells with statistic significance. Conclusions Our data suggest that SSC and their niche specifically express complementary factors for cell communication and that the same factors might be implicated in the communication between tumor cells and their micro-enviroment in testicular cancer. PMID:21232125
Labbé, Roselyne M.; Irimia, Manuel; Currie, Ko W.; Lin, Alexander; Zhu, Shu Jun; Brown, David D.R.; Ross, Eric J.; Voisin, Veronique; Bader, Gary D.; Blencowe, Benjamin J.; Pearson, Bret J.
2014-01-01
Many long-lived species of animals require the function of adult stem cells throughout their lives. However, the transcriptomes of stem cells in invertebrates and vertebrates have not been compared, and consequently, ancestral regulatory circuits that control stem cell populations remain poorly defined. In this study, we have used data from high-throughput RNA sequencing to compare the transcriptomes of pluripotent adult stem cells from planarians with the transcriptomes of human and mouse pluripotent embryonic stem cells. From a stringently defined set of 4,432 orthologs shared between planarians, mice and humans, we identified 123 conserved genes that are ≥5-fold differentially expressed in stem cells from all three species. Guided by this gene set, we used RNAi screening in adult planarians to discover novel stem cell regulators, which we found to affect the stem cell-associated functions of tissue homeostasis, regeneration, and stem cell maintenance. Examples of genes that disrupted these processes included the orthologs of TBL3, PSD12, TTC27, and RACK1. From these analyses, we concluded that by comparing stem cell transcriptomes from diverse species, it is possible to uncover conserved factors that function in stem cell biology. These results provide insights into which genes comprised the ancestral circuitry underlying the control of stem cell self-renewal and pluripotency. PMID:22696458
Platt, James L.; Rogers, Benjamin J.; Rogers, Kelley C.; Harwood, Adrian J.; Kimmel, Alan R.
2013-01-01
Control of chromatin structure is crucial for multicellular development and regulation of cell differentiation. The CHD (chromodomain-helicase-DNA binding) protein family is one of the major ATP-dependent, chromatin remodeling factors that regulate nucleosome positioning and access of transcription factors and RNA polymerase to the eukaryotic genome. There are three mammalian CHD subfamilies and their impaired functions are associated with several human diseases. Here, we identify three CHD orthologs (ChdA, ChdB and ChdC) in Dictyostelium discoideum. These CHDs are expressed throughout development, but with unique patterns. Null mutants lacking each CHD have distinct phenotypes that reflect their expression patterns and suggest functional specificity. Accordingly, using genome-wide (RNA-seq) transcriptome profiling for each null strain, we show that the different CHDs regulate distinct gene sets during both growth and development. ChdC is an apparent ortholog of the mammalian Class III CHD group that is associated with the human CHARGE syndrome, and GO analyses of aberrant gene expression in chdC nulls suggest defects in both cell-autonomous and non-autonomous signaling, which have been confirmed through analyses of chdC nulls developed in pure populations or with low levels of wild-type cells. This study provides novel insight into the broad function of CHDs in the regulation development and disease, through chromatin-mediated changes in directed gene expression. PMID:24301467
Groten, Karin; Pahari, Nabin T; Xu, Shuqing; Miloradovic van Doorn, Maja; Baldwin, Ian T
2015-01-01
Most land plants live in a symbiotic association with arbuscular mycorrhizal fungi (AMF) that belong to the phylum Glomeromycota. Although a number of plant genes involved in the plant-AMF interactions have been identified by analyzing mutants, the ability to rapidly manipulate gene expression to study the potential functions of new candidate genes remains unrealized. We analyzed changes in gene expression of wild tobacco roots (Nicotiana attenuata) after infection with mycorrhizal fungi (Rhizophagus irregularis) by serial analysis of gene expression (SuperSAGE) combined with next generation sequencing, and established a virus-induced gene-silencing protocol to study the function of candidate genes in the interaction. From 92,434 SuperSAGE Tag sequences, 32,808 (35%) matched with our in-house Nicotiana attenuata transcriptome database and 3,698 (4%) matched to Rhizophagus genes. In total, 11,194 Tags showed a significant change in expression (p<0.05, >2-fold change) after infection. When comparing the functions of highly up-regulated annotated Tags in this study with those of two previous large-scale gene expression studies, 18 gene functions were found to be up-regulated in all three studies mainly playing roles related to phytohormone metabolism, catabolism and defense. To validate the function of identified candidate genes, we used the technique of virus-induced gene silencing (VIGS) to silence the expression of three putative N. attenuata genes: germin-like protein, indole-3-acetic acid-amido synthetase GH3.9 and, as a proof-of-principle, calcium and calmodulin-dependent protein kinase (CCaMK). The silencing of the three plant genes in roots was successful, but only CCaMK silencing had a significant effect on the interaction with R. irregularis. Interestingly, when a highly activated inoculum was used for plant inoculation, the effect of CCaMK silencing on fungal colonization was masked, probably due to trans-complementation. This study demonstrates that large-scale gene expression studies across different species induce of a core set of genes of similar functions. However, additional factors seem to influence the overall pattern of gene expression, resulting in high variability among independent studies with different hosts. We conclude that VIGS is a powerful tool with which to investigate the function of genes involved in plant-AMF interactions but that inoculum strength can strongly influence the outcome of the interaction.
Zhu, Bo; Zhang, Wenli; Jiang, Jiming
2015-01-01
Enhancers are important regulators of gene expression in eukaryotes. Enhancers function independently of their distance and orientation to the promoters of target genes. Thus, enhancers have been difficult to identify. Only a few enhancers, especially distant intergenic enhancers, have been identified in plants. We developed an enhancer prediction system based exclusively on the DNase I hypersensitive sites (DHSs) in the Arabidopsis thaliana genome. A set of 10,044 DHSs located in intergenic regions, which are away from any gene promoters, were predicted to be putative enhancers. We examined the functions of 14 predicted enhancers using the β-glucuronidase gene reporter. Ten of the 14 (71%) candidates were validated by the reporter assay. We also designed 10 constructs using intergenic sequences that are not associated with DHSs, and none of these constructs showed enhancer activities in reporter assays. In addition, the tissue specificity of the putative enhancers can be precisely predicted based on DNase I hypersensitivity data sets developed from different plant tissues. These results suggest that the open chromatin signature-based enhancer prediction system developed in Arabidopsis may serve as a universal system for enhancer identification in plants. PMID:26373455
NCBI GEO: archive for functional genomics data sets--10 years on.
Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra
2011-01-01
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Wolf, Yuri I; Makarova, Kira S; Yutin, Natalya; Koonin, Eugene V
2012-12-14
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer. The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time. This article was reviewed by (for complete reviews see the Reviewers' Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
Huang, Lei; Zhao, Shuangping; Frasor, Jonna M.; Dai, Yang
2011-01-01
Approximately half of estrogen receptor (ER) positive breast tumors will fail to respond to endocrine therapy. Here we used an integrative bioinformatics approach to analyze three gene expression profiling data sets from breast tumors in an attempt to uncover underlying mechanisms contributing to the development of resistance and potential therapeutic strategies to counteract these mechanisms. Genes that are differentially expressed in tamoxifen resistant vs. sensitive breast tumors were identified from three different publically available microarray datasets. These differentially expressed (DE) genes were analyzed using gene function and gene set enrichment and examined in intrinsic subtypes of breast tumors. The Connectivity Map analysis was utilized to link gene expression profiles of tamoxifen resistant tumors to small molecules and validation studies were carried out in a tamoxifen resistant cell line. Despite little overlap in genes that are differentially expressed in tamoxifen resistant vs. sensitive tumors, a high degree of functional similarity was observed among the three datasets. Tamoxifen resistant tumors displayed enriched expression of genes related to cell cycle and proliferation, as well as elevated activity of E2F transcription factors, and were highly correlated with a Luminal intrinsic subtype. A number of small molecules, including phenothiazines, were found that induced a gene signature in breast cancer cell lines opposite to that found in tamoxifen resistant vs. sensitive tumors and the ability of phenothiazines to down-regulate cyclin E2 and inhibit proliferation of tamoxifen resistant breast cancer cells was validated. Our findings demonstrate that an integrated bioinformatics approach to analyze gene expression profiles from multiple breast tumor datasets can identify important biological pathways and potentially novel therapeutic options for tamoxifen-resistant breast cancers. PMID:21789246
A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.
Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo
2015-07-01
In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.
2013-01-01
Background 3,4-methylenedioxymethamphetamine (MDMA, "ecstasy") is a widely used recreational drug known to impair cognitive functions on the long-run. Both hippocampal and frontal cortical regions have well established roles in behavior, memory formation and other cognitive tasks and damage of these regions is associated with altered behavior and cognitive functions, impairments frequently described in heavy MDMA users. The aim of this study was to examine the hippocampus, frontal cortex and dorsal raphe of Dark Agouti rats with gene expression arrays (Illumina RatRef bead arrays) looking for possible mechanisms and new candidates contributing to the effects of a single dose of MDMA (15 mg/kg) 3 weeks earlier. Results The number of differentially expressed genes in the hippocampus, frontal cortex and the dorsal raphe were 481, 155, and 15, respectively. Gene set enrichment analysis of the microarray data revealed reduced expression of 'memory’ and 'cognition’, 'dendrite development’ and 'regulation of synaptic plasticity’ gene sets in the hippocampus, parallel to the upregulation of the CB1 cannabinoid- and Epha4, Epha5, Epha6 ephrin receptors. Downregulated gene sets in the frontal cortex were related to protein synthesis, chromatin organization, transmembrane transport processes, while 'dendrite development’, 'regulation of synaptic plasticity’ and 'positive regulation of synapse assembly’ gene sets were upregulated. Changes in the dorsal raphe region were mild and in most cases not significant. Conclusion The present data raise the possibility of new synapse formation/synaptic reorganization in the frontal cortex three weeks after a single neurotoxic dose of MDMA. In contrast, a prolonged depression of new neurite formation in the hippocampus is suggested by the data, which underlines the particular vulnerability of this brain region after the drug treatment. Finally, our results also suggest the substantial contribution of CB1 receptor and endocannabinoid mediated pathways in the hippocampal impairments. Taken together the present study provides evidence for the participation of new molecular candidates in the long-term effects of MDMA. PMID:24378229
Vazquez, Miguel; Nogales-Cadenas, Ruben; Arroyo, Javier; Botías, Pedro; García, Raul; Carazo, Jose M; Tirado, Francisco; Pascual-Montano, Alberto; Carmona-Saez, Pedro
2010-07-01
The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.
Diversification of Root Hair Development Genes in Vascular Plants.
Huang, Ling; Shi, Xinhui; Wang, Wenjia; Ryu, Kook Hui; Schiefelbein, John
2017-07-01
The molecular genetic program for root hair development has been studied intensively in Arabidopsis ( Arabidopsis thaliana ). To understand the extent to which this program might operate in other plants, we conducted a large-scale comparative analysis of root hair development genes from diverse vascular plants, including eudicots, monocots, and a lycophyte. Combining phylogenetics and transcriptomics, we discovered conservation of a core set of root hair genes across all vascular plants, which may derive from an ancient program for unidirectional cell growth coopted for root hair development during vascular plant evolution. Interestingly, we also discovered preferential diversification in the structure and expression of root hair development genes, relative to other root hair- and root-expressed genes, among these species. These differences enabled the definition of sets of genes and gene functions that were acquired or lost in specific lineages during vascular plant evolution. In particular, we found substantial divergence in the structure and expression of genes used for root hair patterning, suggesting that the Arabidopsis transcriptional regulatory mechanism is not shared by other species. To our knowledge, this study provides the first comprehensive view of gene expression in a single plant cell type across multiple species. © 2017 American Society of Plant Biologists. All Rights Reserved.
Diversification of Root Hair Development Genes in Vascular Plants1[OPEN
Shi, Xinhui; Wang, Wenjia; Ryu, Kook Hui
2017-01-01
The molecular genetic program for root hair development has been studied intensively in Arabidopsis (Arabidopsis thaliana). To understand the extent to which this program might operate in other plants, we conducted a large-scale comparative analysis of root hair development genes from diverse vascular plants, including eudicots, monocots, and a lycophyte. Combining phylogenetics and transcriptomics, we discovered conservation of a core set of root hair genes across all vascular plants, which may derive from an ancient program for unidirectional cell growth coopted for root hair development during vascular plant evolution. Interestingly, we also discovered preferential diversification in the structure and expression of root hair development genes, relative to other root hair- and root-expressed genes, among these species. These differences enabled the definition of sets of genes and gene functions that were acquired or lost in specific lineages during vascular plant evolution. In particular, we found substantial divergence in the structure and expression of genes used for root hair patterning, suggesting that the Arabidopsis transcriptional regulatory mechanism is not shared by other species. To our knowledge, this study provides the first comprehensive view of gene expression in a single plant cell type across multiple species. PMID:28487476
Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko
2012-07-15
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E<10(-5)) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
Cortesi, Marilisa; Bandiera, Lucia; Pasini, Alice; Bevilacqua, Alessandro; Gherardi, Alessandro; Furini, Simone; Giordano, Emanuele
2017-01-01
Quantifying gene expression at single cell level is fundamental for the complete characterization of synthetic gene circuits, due to the significant impact of noise and inter-cellular variability on the system's functionality. Commercial set-ups that allow the acquisition of fluorescent signal at single cell level (flow cytometers or quantitative microscopes) are expensive apparatuses that are hardly affordable by small laboratories. A protocol that makes a standard optical microscope able to acquire quantitative, single cell, fluorescent data from a bacterial population transformed with synthetic gene circuitry is presented. Single cell fluorescence values, acquired with a microscope set-up and processed with custom-made software, are compared with results that were obtained with a flow cytometer in a bacterial population transformed with the same gene circuitry. The high correlation between data from the two experimental set-ups, with a correlation coefficient computed over the tested dynamic range > 0.99, proves that a standard optical microscope- when coupled with appropriate software for image processing- might be used for quantitative single-cell fluorescence measurements. The calibration of the set-up, together with its validation, is described. The experimental protocol described in this paper makes quantitative measurement of single cell fluorescence accessible to laboratories equipped with standard optical microscope set-ups. Our method allows for an affordable measurement/quantification of intercellular variability, whose better understanding of this phenomenon will improve our comprehension of cellular behaviors and the design of synthetic gene circuits. All the required software is freely available to the synthetic biology community (MUSIQ Microscope flUorescence SIngle cell Quantification).
Epigenetic regulation of depot-specific gene expression in adipose tissue.
Gehrke, Sandra; Brueckner, Bodo; Schepky, Andreas; Klein, Johannes; Iwen, Alexander; Bosch, Thomas C G; Wenck, Horst; Winnefeld, Marc; Hagemann, Sabine
2013-01-01
In humans, adipose tissue is distributed in subcutaneous abdominal and subcutaneous gluteal depots that comprise a variety of functional differences. Whereas energy storage in gluteal adipose tissue has been shown to mediate a protective effect, an increase of abdominal adipose tissue is associated with metabolic disorders. However, the molecular basis of depot-specific characteristics is not completely understood yet. Using array-based analyses of transcription profiles, we identified a specific set of genes that was differentially expressed between subcutaneous abdominal and gluteal adipose tissue. To investigate the role of epigenetic regulation in depot-specific gene expression, we additionally analyzed genome-wide DNA methylation patterns in abdominal and gluteal depots. By combining both data sets, we identified a highly significant set of depot-specifically expressed genes that appear to be epigenetically regulated. Interestingly, the majority of these genes form part of the homeobox gene family. Moreover, genes involved in fatty acid metabolism were also differentially expressed. Therefore we suppose that changes in gene expression profiles might account for depot-specific differences in lipid composition. Indeed, triglycerides and fatty acids of abdominal adipose tissue were more saturated compared to triglycerides and fatty acids in gluteal adipose tissue. Taken together, our results uncover clear differences between abdominal and gluteal adipose tissue on the gene expression and DNA methylation level as well as in fatty acid composition. Therefore, a detailed molecular characterization of adipose tissue depots will be essential to develop new treatment strategies for metabolic syndrome associated complications.
Reed, Jessica L; D'Ambrosio, Enrico; Marenco, Stefano; Ursini, Gianluca; Zheutlin, Amanda B; Blasi, Giuseppe; Spencer, Barbara E; Romano, Raffaella; Hochheiser, Jesse; Reifman, Ann; Sturm, Justin; Berman, Karen F; Bertolino, Alessandro; Weinberger, Daniel R; Callicott, Joseph H
2018-01-01
Brain phenotypes showing environmental influence may help clarify unexplained associations between urban exposure and psychiatric risk. Heritable prefrontal fMRI activation during working memory (WM) is such a phenotype. We hypothesized that urban upbringing (childhood urbanicity) would alter this phenotype and interact with dopamine genes that regulate prefrontal function during WM. Further, dopamine has been hypothesized to mediate urban-associated factors like social stress. WM-related prefrontal function was tested for main effects of urbanicity, main effects of three dopamine genes-catechol-O-methyltransferase (COMT), dopamine receptor D1 (DRD1), and dopamine receptor D2 (DRD2)-and, importantly, dopamine gene-by-urbanicity interactions. For COMT, three independent human samples were recruited (total n = 487). We also studied 253 subjects genotyped for DRD1 and DRD2. 3T fMRI activation during the N-back WM task was the dependent variable, while childhood urbanicity, dopamine genotype, and urbanicity-dopamine interactions were independent variables. Main effects of dopamine genes and of urbanicity were found. Individuals raised in an urban environment showed altered prefrontal activation relative to those raised in rural or town settings. For each gene, dopamine genotype-by-urbanicity interactions were shown in prefrontal cortex-COMT replicated twice in two independent samples. An urban childhood upbringing altered prefrontal function and interacted with each gene to alter genotype-phenotype relationships. Gene-environment interactions between multiple dopamine genes and urban upbringing suggest that neural effects of developmental environmental exposure could mediate, at least partially, increased risk for psychiatric illness in urban environments via dopamine genes expressed into adulthood.
Establishing the role of rare coding variants in known Parkinson's disease risk loci.
Jansen, Iris E; Gibbs, J Raphael; Nalls, Mike A; Price, T Ryan; Lubbe, Steven; van Rooij, Jeroen; Uitterlinden, André G; Kraaij, Robert; Williams, Nigel M; Brice, Alexis; Hardy, John; Wood, Nicholas W; Morris, Huw R; Gasser, Thomas; Singleton, Andrew B; Heutink, Peter; Sharma, Manu
2017-11-01
Many common genetic factors have been identified to contribute to Parkinson's disease (PD) susceptibility, improving our understanding of the related underlying biological mechanisms. The involvement of rarer variants in these loci has been poorly studied. Using International Parkinson's Disease Genomics Consortium data sets, we performed a comprehensive study to determine the impact of rare variants in 23 previously published genome-wide association studies (GWAS) loci in PD. We applied Prix fixe to select the putative causal genes underneath the GWAS peaks, which was based on underlying functional similarities. The Sequence Kernel Association Test was used to analyze the joint effect of rare, common, or both types of variants on PD susceptibility. All genes were tested simultaneously as a gene set and each gene individually. We observed a moderate association of common variants, confirming the involvement of the known PD risk loci within our genetic data sets. Focusing on rare variants, we identified additional association signals for LRRK2, STBD1, and SPATA19. Our study suggests an involvement of rare variants within several putatively causal genes underneath previously identified PD GWAS peaks. Copyright © 2017 Elsevier Inc. All rights reserved.
Peng, Hui; Lan, Chaowang; Zheng, Yi; Hutvagner, Gyorgy; Tao, Dacheng; Li, Jinyan
2017-03-24
MicroRNAs always function cooperatively in their regulation of gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We are interested in those multi-disease associated co-functional microRNAs that regulate their common dysfunctional target genes cooperatively in the development of multiple diseases. The research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The construction of the DGR tripartite network is by the integration of newly predicted disease-microRNA associations with those relationships of diseases, microRNAs and genes maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA association and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. With the prioritization of the co-functional microRNAs that relate to a series of diseases, we found that the co-function phenomenon is not unusual. We also confirmed that the regulation of the microRNAs for the development of cancers is more complex and have more unique properties than those of non-cancer diseases.
Hill, Matthew J; Killick, Richard; Navarrete, Katherinne; Maruszak, Aleksandra; McLaughlin, Gemma M; Williams, Brenda P; Bray, Nicholas J
2017-05-01
Common variants in the TCF4 gene are among the most robustly supported genetic risk factors for schizophrenia. Rare TCF4 deletions and loss-of-function point mutations cause Pitt-Hopkins syndrome, a developmental disorder associated with severe intellectual disability. To explore molecular and cellular mechanisms by which TCF4 perturbation could interfere with human cortical development, we experimentally reduced the endogenous expression of TCF4 in a neural progenitor cell line derived from the developing human cerebral cortex using RNA interference. Effects on genome-wide gene expression were assessed by microarray, followed by Gene Ontology and pathway analysis of differentially expressed genes. We tested for genetic association between the set of differentially expressed genes and schizophrenia using genome-wide association study data from the Psychiatric Genomics Consortium and competitive gene set analysis (MAGMA). Effects on cell proliferation were assessed using high content imaging. Genes that were differentially expressed following TCF4 knockdown were highly enriched for involvement in the cell cycle. There was a nonsignificant trend for genetic association between the differentially expressed gene set and schizophrenia. Consistent with the gene expression data, TCF4 knockdown was associated with reduced proliferation of cortical progenitor cells in vitro. A detailed mechanistic explanation of how TCF4 knockdown alters human neural progenitor cell proliferation is not provided by this study. Our data indicate effects of TCF4 perturbation on human cortical progenitor cell proliferation, a process that could contribute to cognitive deficits in individuals with Pitt-Hopkins syndrome and risk for schizophrenia.
Protective pathways against colitis mediated by appendicitis and appendectomy.
Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C
2011-09-01
Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample microarray analysed followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription-polymerase chain reaction (RT-PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0·001). Time-course RT-PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. © 2011 The Authors. Clinical and Experimental Immunology © 2011 British Society for Immunology.
Crosley, E J; Elliot, M G; Christians, J K; Crespi, B J
2013-02-01
Recent evidence from chimpanzees and gorillas has raised doubts that preeclampsia is a uniquely human disease. The deep extravillous trophoblast (EVT) invasion and spiral artery remodeling that characterizes our placenta (and is abnormal in preeclampsia) is shared within great apes, setting Homininae apart from Hylobatidae and Old World Monkeys, which show much shallower trophoblast invasion and limited spiral artery remodeling. We hypothesize that the evolution of a more invasive placenta in the lineage ancestral to the great apes involved positive selection on genes crucial to EVT invasion and spiral artery remodeling. Furthermore, identification of placentally-expressed genes under selection in this lineage may identify novel genes involved in placental development. We tested for positive selection in approximately 18,000 genes using the ratio of non-synonymous to synonymous amino acid substitution for protein-coding DNA. DAVID Bioinformatics Resources identified biological processes enriched in positively selected genes, including processes related to EVT invasion and spiral artery remodeling. Analyses revealed 295 and 264 genes under significant positive selection on the branches ancestral to Hominidae (Human, Chimp, Gorilla, Orangutan) and Homininae (Human, Chimp, Gorilla), respectively. Gene ontology analysis of these gene sets demonstrated significant enrichments for several functional gene clusters relevant to preeclampsia risk, and sets of placentally-expressed genes that have been linked with preeclampsia and/or trophoblast invasion in other studies. Our study represents a novel approach to the identification of candidate genes and amino acid residues involved in placental pathologies by implicating them in the evolution of highly-invasive placenta. Copyright © 2012 Elsevier Ltd. All rights reserved.
Gene finding in metatranscriptomic sequences.
Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu
2014-01-01
Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TranGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TranGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture about the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
Maier, Uwe-G; Zauner, Stefan; Woehle, Christian; Bolte, Kathrin; Hempel, Franziska; Allen, John F.; Martin, William F.
2013-01-01
Plastid and mitochondrial genomes have undergone parallel evolution to encode the same functional set of genes. These encode conserved protein components of the electron transport chain in their respective bioenergetic membranes and genes for the ribosomes that express them. This highly convergent aspect of organelle genome evolution is partly explained by the redox regulation hypothesis, which predicts a separate plastid or mitochondrial location for genes encoding bioenergetic membrane proteins of either photosynthesis or respiration. Here we show that convergence in organelle genome evolution is far stronger than previously recognized, because the same set of genes for ribosomal proteins is independently retained by both plastid and mitochondrial genomes. A hitherto unrecognized selective pressure retains genes for the same ribosomal proteins in both organelles. On the Escherichia coli ribosome assembly map, the retained proteins are implicated in 30S and 50S ribosomal subunit assembly and initial rRNA binding. We suggest that ribosomal assembly imposes functional constraints that govern the retention of ribosomal protein coding genes in organelles. These constraints are subordinate to redox regulation for electron transport chain components, which anchor the ribosome to the organelle genome in the first place. As organelle genomes undergo reduction, the rRNAs also become smaller. Below size thresholds of approximately 1,300 nucleotides (16S rRNA) and 2,100 nucleotides (26S rRNA), all ribosomal protein coding genes are lost from organelles, while electron transport chain components remain organelle encoded as long as the organelles use redox chemistry to generate a proton motive force. PMID:24259312
Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection
NASA Astrophysics Data System (ADS)
Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd
2015-02-01
Microarray involves of placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has made a novelty discovery since its development and obtained an increasing attention among researchers. The widespread of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and become singular, thus it cannot be inverted. In this research, the Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimation to estimate the covariance matrix to detect significant gene sets. The use of shrinkage covariance matrix overcomes the singularity problem by converting an unbiased to an improved biased estimator of covariance matrix. Robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increases its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
ERIC Educational Resources Information Center
Tharp, Ian J.; Pickering, Alan D.
2011-01-01
Individual differences in psychophysiological function have been shown to influence the balance between flexibility and distractibility during attentional set-shifting [e.g., Dreisbach et al. (2005). Dopamine and cognitive control: The influence of spontaneous eyeblink rate and dopamine gene polymorphisms on perseveration and distractibility.…
From data to function: functional modeling of poultry genomics data.
McCarthy, F M; Lyons, E
2013-09-01
One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.
Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong
2015-10-19
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled-up their focus from single-gene to gene set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistic or bioinformatic expertise. This is all the more the case when attempting to define a gene set specific of one condition compared to many other ones. Thus, there is a crucial need for an easy-to-use software for generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that allows to automatically extract molecular signatures from transcriptomic data and perform exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is an open-source software that allows to automatically generate molecular signatures out of complex expression datasets and to assess directly their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarities between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html .
Bacci, Giovanni; Fiscarelli, Ersilia; Taccetti, Giovanni; Dolce, Daniela; Paganin, Patrizia; Morelli, Patrizia; Tuccio, Vanessa; De Alessandri, Alessandra; Lucidi, Vincenzina
2017-01-01
In recent years, next-generation sequencing (NGS) was employed to decipher the structure and composition of the microbiota of the airways in cystic fibrosis (CF) patients. However, little is still known about the overall gene functions harbored by the resident microbial populations and which specific genes are associated with various stages of CF lung disease. In the present study, we aimed to identify the microbial gene repertoire of CF microbiota in twelve patients with severe and normal/mild lung disease by performing sputum shotgun metagenome sequencing. The abundance of metabolic pathways encoded by microbes inhabiting CF airways was reconstructed from the metagenome. We identified a set of metabolic pathways differently distributed in patients with different pulmonary function; namely, pathways related to bacterial chemotaxis and flagellar assembly, as well as genes encoding efflux-mediated antibiotic resistance mechanisms and virulence-related genes. The results indicated that the microbiome of CF patients with low pulmonary function is enriched in virulence-related genes and in genes encoding efflux-mediated antibiotic resistance mechanisms. Overall, the microbiome of severely affected adults with CF seems to encode different mechanisms for the facilitation of microbial colonization and persistence in the lung, consistent with the characteristics of multidrug-resistant microbial communities that are commonly observed in patients with severe lung disease. PMID:28758937
A Morpholino-based screen to identify novel genes involved in craniofacial morphogenesis
Melvin, Vida Senkus; Feng, Weiguo; Hernandez-Lagunas, Laura; Artinger, Kristin Bruk; Williams, Trevor
2014-01-01
BACKGROUND The regulatory mechanisms underpinning facial development are conserved between diverse species. Therefore, results from model systems provide insight into the genetic causes of human craniofacial defects. Previously, we generated a comprehensive dataset examining gene expression during development and fusion of the mouse facial prominences. Here, we used this resource to identify genes that have dynamic expression patterns in the facial prominences, but for which only limited information exists concerning developmental function. RESULTS This set of ~80 genes was used for a high throughput functional analysis in the zebrafish system using Morpholino gene knockdown technology. This screen revealed three classes of cranial cartilage phenotypes depending upon whether knockdown of the gene affected the neurocranium, viscerocranium, or both. The targeted genes that produced consistent phenotypes encoded proteins linked to transcription (meis1, meis2a, tshz2, vgll4l), signaling (pkdcc, vlk, macc1, wu:fb16h09), and extracellular matrix function (smoc2). The majority of these phenotypes were not altered by reduction of p53 levels, demonstrating that both p53 dependent and independent mechanisms were involved in the craniofacial abnormalities. CONCLUSIONS This Morpholino-based screen highlights new genes involved in development of the zebrafish craniofacial skeleton with wider relevance to formation of the face in other species, particularly mouse and human. PMID:23559552
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-meansmore » clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.« less
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-meansmore » clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.« less
Merks, Roeland M H; Guravage, Michael; Inzé, Dirk; Beemster, Gerrit T S
2011-02-01
Plant organs, including leaves and roots, develop by means of a multilevel cross talk between gene regulation, patterned cell division and cell expansion, and tissue mechanics. The multilevel regulatory mechanisms complicate classic molecular genetics or functional genomics approaches to biological development, because these methodologies implicitly assume a direct relation between genes and traits at the level of the whole plant or organ. Instead, understanding gene function requires insight into the roles of gene products in regulatory networks, the conditions of gene expression, etc. This interplay is impossible to understand intuitively. Mathematical and computer modeling allows researchers to design new hypotheses and produce experimentally testable insights. However, the required mathematics and programming experience makes modeling poorly accessible to experimental biologists. Problem-solving environments provide biologically intuitive in silico objects ("cells", "regulation networks") required for setting up a simulation and present those to the user in terms of familiar, biological terminology. Here, we introduce the cell-based computer modeling framework VirtualLeaf for plant tissue morphogenesis. The current version defines a set of biologically intuitive C++ objects, including cells, cell walls, and diffusing and reacting chemicals, that provide useful abstractions for building biological simulations of developmental processes. We present a step-by-step introduction to building models with VirtualLeaf, providing basic example models of leaf venation and meristem development. VirtualLeaf-based models provide a means for plant researchers to analyze the function of developmental genes in the context of the biophysics of growth and patterning. VirtualLeaf is an ongoing open-source software project (http://virtualleaf.googlecode.com) that runs on Windows, Mac, and Linux.
2010-01-01
One of the important challenges to post-genomic biology is relating observed phenotypic alterations to the underlying collective alterations in genes. Current inferential methods, however, invariably omit large bodies of information on the relationships between genes. We present a method that takes account of such information - expressed in terms of the topology of a correlation network - and we apply the method in the context of current procedures for gene set enrichment analysis. PMID:20187943
Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis
2016-09-02
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.
Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael
2013-08-01
With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
Bubier, Jason A.; Jay, Jeremy J.; Baker, Christopher L.; Bergeson, Susan E.; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C.; Chesler, Elissa J.
2014-01-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. PMID:24923803
Bubier, Jason A; Jay, Jeremy J; Baker, Christopher L; Bergeson, Susan E; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C; Chesler, Elissa J
2014-08-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. Copyright © 2014 by the Genetics Society of America.
Yi, Jin Wook; Park, Ji Yeon; Sung, Ji-Youn; Kwak, Sang Hyuk; Yu, Jihan; Chang, Ji Hyun; Kim, Jo-Heon; Ha, Sang Yun; Paik, Eun Kyung; Lee, Woo Seung; Kim, Su-Jin; Lee, Kyu Eun; Kim, Ju Han
2015-01-01
Elevated levels of reactive oxygen species (ROS) have been proposed as a risk factor for the development of papillary thyroid carcinoma (PTC) in patients with Hashimoto thyroiditis (HT). However, it has yet to be proven that the total levels of ROS are sufficiently increased to contribute to carcinogenesis. We hypothesized that if the ROS levels were increased in HT, ROS-related genes would also be differently expressed in PTC with HT. To find differentially expressed genes (DEGs) we analyzed data from the Cancer Genomic Atlas, gene expression data from RNA sequencing: 33 from normal thyroid tissue, 232 from PTC without HT, and 60 from PTC with HT. We prepared 402 ROS-related genes from three gene sets by genomic database searching. We also analyzed a public microarray data to validate our results. Thirty-three ROS related genes were up-regulated in PTC with HT, whereas there were only nine genes in PTC without HT (Chi-square p-value < 0.001). Mean log2 fold changes of up-regulated genes was 0.562 in HT group and 0.252 in PTC without HT group (t-test p-value = 0.001). In microarray data analysis, 12 of 32 ROS-related genes showed the same differential expression pattern with statistical significance. In gene ontology analysis, up-regulated ROS-related genes were related with ROS metabolism and apoptosis. Immune function-related and carcinogenesis-related gene sets were enriched only in HT group in Gene Set Enrichment Analysis. Our results suggested that ROS levels may be increased in PTC with HT. Increased levels of ROS may contribute to PTC development in patients with HT.
COMPADRE: an R and web resource for pathway activity analysis by component decompositions.
Ramos-Rodriguez, Roberto-Rafael; Cuevas-Diaz-Duran, Raquel; Falciani, Francesco; Tamez-Peña, Jose-Gerardo; Trevino, Victor
2012-10-15
The analysis of biological networks has become essential to study functional genomic data. Compadre is a tool to estimate pathway/gene sets activity indexes using sub-matrix decompositions for biological networks analyses. The Compadre pipeline also includes one of the direct uses of activity indexes to detect altered gene sets. For this, the gene expression sub-matrix of a gene set is decomposed into components, which are used to test differences between groups of samples. This procedure is performed with and without differentially expressed genes to decrease false calls. During this process, Compadre also performs an over-representation test. Compadre already implements four decomposition methods [principal component analysis (PCA), Isomaps, independent component analysis (ICA) and non-negative matrix factorization (NMF)], six statistical tests (t- and f-test, SAM, Kruskal-Wallis, Welch and Brown-Forsythe), several gene sets (KEGG, BioCarta, Reactome, GO and MsigDB) and can be easily expanded. Our simulation results shown in Supplementary Information suggest that Compadre detects more pathways than over-representation tools like David, Babelomics and Webgestalt and less false positives than PLAGE. The output is composed of results from decomposition and over-representation analyses providing a more complete biological picture. Examples provided in Supplementary Information show the utility, versatility and simplicity of Compadre for analyses of biological networks. Compadre is freely available at http://bioinformatica.mty.itesm.mx:8080/compadre. The R package is also available at https://sourceforge.net/p/compadre.
Dutta, B; Pusztai, L; Qi, Y; André, F; Lazar, V; Bianchini, G; Ueno, N; Agarwal, R; Wang, B; Shiang, C Y; Hortobagyi, G N; Mills, G B; Symmans, W F; Balázsi, G
2012-01-01
Background: The rapid collection of diverse genome-scale data raises the urgent need to integrate and utilise these resources for biological discovery or biomedical applications. For example, diverse transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few current methods are capable to utilise the emerging information. Methods: We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein–protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighborhoods. Results: We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line data sets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network members genes have increased functional specificity to TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone. Conclusion: Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important. PMID:22343619
Distribution of mutations in the PEX gene in families with X-linked hypophosphataemic rickets (HYP).
Rowe, P S; Oudet, C L; Francis, F; Sinding, C; Pannetier, S; Econs, M J; Strom, T M; Meitinger, T; Garabedian, M; David, A; Macher, M A; Questiaux, E; Popowska, E; Pronicka, E; Read, A P; Mokrzycki, A; Glorieux, F H; Drezner, M K; Hanauer, A; Lehrach, H; Goulding, J N; O'Riordan, J L
1997-04-01
Mutations in the PEX gene at Xp22.1 (phosphate-regulating gene with homologies to endopeptidases, on the X-chromosome), are responsible for X-linked hypophosphataemic rickets (HYP). Homology of PEX to the M13 family of Zn2+ metallopeptidases which include neprilysin (NEP) as prototype, has raised important questions regarding PEX function at the molecular level. The aim of this study was to analyse 99 HYP families for PEX gene mutations, and to correlate predicted changes in the protein structure with Zn2+ metallopeptidase gene function. Primers flanking 22 characterised exons were used to amplify DNA by PCR, and SSCP was then used to screen for mutations. Deletions, insertions, nonsense mutations, stop codons and splice mutations occurred in 83% of families screened for in all 22 exons, and 51% of a separate set of families screened in 17 PEX gene exons. Missense mutations in four regions of the gene were informative regarding function, with one mutation in the Zn2+-binding site predicted to alter substrate enzyme interaction and catalysis. Computer analysis of the remaining mutations predicted changes in secondary structure, N-glycosylation, protein phosphorylation and catalytic site molecular structure. The wide range of mutations that align with regions required for protease activity in NEP suggests that PEX also functions as a protease, and may act by processing factor(s) involved in bone mineral metabolism.
Chang, Yao-Ming; Liu, Wen-Yu; Shih, Arthur Chun-Chieh; Shen, Meng-Ni; Lu, Chen-Hua; Lu, Mei-Yeh Jade; Yang, Hui-Wen; Wang, Tzi-Yuan; Chen, Sean C-C; Chen, Stella Maris; Li, Wen-Hsiung; Ku, Maurice S B
2012-09-01
To study the regulatory and functional differentiation between the mesophyll (M) and bundle sheath (BS) cells of maize (Zea mays), we isolated large quantities of highly homogeneous M and BS cells from newly matured second leaves for transcriptome profiling by RNA sequencing. A total of 52,421 annotated genes with at least one read were found in the two transcriptomes. Defining a gene with more than one read per kilobase per million mapped reads as expressed, we identified 18,482 expressed genes; 14,972 were expressed in M cells, including 53 M-enriched transcription factor (TF) genes, whereas 17,269 were expressed in BS cells, including 214 BS-enriched TF genes. Interestingly, many TF gene families show a conspicuous BS preference in expression. Pathway analyses reveal differentiation between the two cell types in various functional categories, with the M cells playing more important roles in light reaction, protein synthesis and folding, tetrapyrrole synthesis, and RNA binding, while the BS cells specialize in transport, signaling, protein degradation and posttranslational modification, major carbon, hydrogen, and oxygen metabolism, cell division and organization, and development. Genes coding for several transporters involved in the shuttle of C(4) metabolites and BS cell wall development have been identified, to our knowledge, for the first time. This comprehensive data set will be useful for studying M/BS differentiation in regulation and function.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Parenclitic networks: uncovering new functions in biological data
Zanin, Massimiliano; Alcazar, Joaquín Medina; Carbajosa, Jesus Vicente; Paez, Marcela Gomez; Papo, David; Sousa, Pedro; Menasalvas, Ernestina; Boccaletti, Stefano
2014-01-01
We introduce a novel method to represent time independent, scalar data sets as complex networks. We apply our method to investigate gene expression in the response to osmotic stress of Arabidopsis thaliana. In the proposed network representation, the most important genes for the plant response turn out to be the nodes with highest centrality in appropriately reconstructed networks. We also performed a target experiment, in which the predicted genes were artificially induced one by one, and the growth of the corresponding phenotypes compared to that of the wild-type. The joint application of the network reconstruction method and of the in vivo experiments allowed identifying 15 previously unknown key genes, and provided models of their mutual relationships. This novel representation extends the use of graph theory to data sets hitherto considered outside of the realm of its application, vastly simplifying the characterization of their underlying structure. PMID:24870931
Herman, Dorota; Slabbinck, Bram; Pè, Mario Enrico
2016-01-01
Leaves are vital organs for biomass and seed production because of their role in the generation of metabolic energy and organic compounds. A better understanding of the molecular networks underlying leaf development is crucial to sustain global requirements for food and renewable energy. Here, we combined transcriptome profiling of proliferative leaf tissue with in-depth phenotyping of the fourth leaf at later stages of development in 197 recombinant inbred lines of two different maize (Zea mays) populations. Previously, correlation analysis in a classical biparental mapping population identified 1,740 genes correlated with at least one of 14 traits. Here, we extended these results with data from a multiparent advanced generation intercross population. As expected, the phenotypic variability was found to be larger in the latter population than in the biparental population, although general conclusions on the correlations among the traits are comparable. Data integration from the two diverse populations allowed us to identify a set of 226 genes that are robustly associated with diverse leaf traits. This set of genes is enriched for transcriptional regulators and genes involved in protein synthesis and cell wall metabolism. In order to investigate the molecular network context of the candidate gene set, we integrated our data with publicly available functional genomics data and identified a growth regulatory network of 185 genes. Our results illustrate the power of combining in-depth phenotyping with transcriptomics in mapping populations to dissect the genetic control of complex traits and present a set of candidate genes for use in biomass improvement. PMID:26754667
Baute, Joke; Herman, Dorota; Coppens, Frederik; De Block, Jolien; Slabbinck, Bram; Dell'Acqua, Matteo; Pè, Mario Enrico; Maere, Steven; Nelissen, Hilde; Inzé, Dirk
2016-03-01
Leaves are vital organs for biomass and seed production because of their role in the generation of metabolic energy and organic compounds. A better understanding of the molecular networks underlying leaf development is crucial to sustain global requirements for food and renewable energy. Here, we combined transcriptome profiling of proliferative leaf tissue with in-depth phenotyping of the fourth leaf at later stages of development in 197 recombinant inbred lines of two different maize (Zea mays) populations. Previously, correlation analysis in a classical biparental mapping population identified 1,740 genes correlated with at least one of 14 traits. Here, we extended these results with data from a multiparent advanced generation intercross population. As expected, the phenotypic variability was found to be larger in the latter population than in the biparental population, although general conclusions on the correlations among the traits are comparable. Data integration from the two diverse populations allowed us to identify a set of 226 genes that are robustly associated with diverse leaf traits. This set of genes is enriched for transcriptional regulators and genes involved in protein synthesis and cell wall metabolism. In order to investigate the molecular network context of the candidate gene set, we integrated our data with publicly available functional genomics data and identified a growth regulatory network of 185 genes. Our results illustrate the power of combining in-depth phenotyping with transcriptomics in mapping populations to dissect the genetic control of complex traits and present a set of candidate genes for use in biomass improvement. © 2016 American Society of Plant Biologists. All Rights Reserved.
Yin, L G; Zou, Z Q; Zhao, H Y; Zhang, C L; Shen, J G; Qi, L; Qi, M; Xue, Z Q
2014-01-01
Adenocarcinoma (ADC) and squamous cell carcinomas (SCC) are two subtypes of non-small cell lung carcinomas which are regarded as the leading cause of cancer-related malignancy worldwide. The aim of this study is to detect the differentially methylated loci (DMLs) and differentially methylated genes (DMGs) of these two tumor sets, and then to illustrate the different expression level of specific methylated genes. Using TCGA database and Illumina HumanMethylation 27 arrays, we first screened the DMGs and DMLs in tumor samples. Then, we explored the BiologicalProcess terms of hypermethylated and hypomethylated genes using Functional Gene Ontology (GO) catalogues. Hypermethylation intensively occurred in CpG-island, whereas hypomethylation was located in non-CpG-island. Most SCC and ADC hypermethylated genes involved GO function of DNA dependenit regulation of transcription, and hypomethylated genes mainly 'enriched in the term of immune responses. Additionally, the expression level of specific differentially methylated genesis distinctbetween ADC and SCC. It is concluded that ADC and SCC have different methylated status that might play an important role in carcinogenesis.
Patterns of population differentiation of candidate genes for cardiovascular disease
Kullo, Iftikhar J; Ding, Keyue
2007-01-01
Background The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. Results We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for nonfunctional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. Conclusion These results suggest a possible basis for varying susceptibility to CVD among ethnic groups. PMID:17626638
Aspler, Anne L; Bolshin, Carly; Vernon, Suzanne D; Broderick, Gordon
2008-09-26
Genomic profiling of peripheral blood reveals altered immunity in chronic fatigue syndrome (CFS) however interpretation remains challenging without immune demographic context. The object of this work is to identify modulation of specific immune functional components and restructuring of co-expression networks characteristic of CFS using the quantitative genomics of peripheral blood. Gene sets were constructed a priori for CD4+ T cells, CD8+ T cells, CD19+ B cells, CD14+ monocytes and CD16+ neutrophils from published data. A group of 111 women were classified using empiric case definition (U.S. Centers for Disease Control and Prevention) and unsupervised latent cluster analysis (LCA). Microarray profiles of peripheral blood were analyzed for expression of leukocyte-specific gene sets and characteristic changes in co-expression identified from topological evaluation of linear correlation networks. Median expression for a set of 6 genes preferentially up-regulated in CD19+ B cells was significantly lower in CFS (p = 0.01) due mainly to PTPRK and TSPAN3 expression. Although no other gene set was differentially expressed at p < 0.05, patterns of co-expression in each group differed markedly. Significant co-expression of CD14+ monocyte with CD16+ neutrophil (p = 0.01) and CD19+ B cell sets (p = 0.00) characterized CFS and fatigue phenotype groups. Also in CFS was a significant negative correlation between CD8+ and both CD19+ up-regulated (p = 0.02) and NK gene sets (p = 0.08). These patterns were absent in controls. Dissection of blood microarray profiles points to B cell dysfunction with coordinated immune activation supporting persistent inflammation and antibody-mediated NK cell modulation of T cell activity. This has clinical implications as the CD19+ genes identified could provide robust and biologically meaningful basis for the early detection and unambiguous phenotyping of CFS.
Lira-Albarrán, Saúl; Durand, Marta; Barrera, David; Vega, Claudia; Becerra, Rocio García; Díaz, Lorenza; García-Quiroz, Janice; Rangel, Claudia; Larrea, Fernando
2018-04-27
In order to get further information on the effects of ulipristal acetate (UPA) upon the process of decidualization of endometrium, a functional analysis of the differentially expressed genes in endometrium (DEG) from UPA treated-versus control-cycles of normal ovulatory women was performed. A list of 1183 endometrial DEG, from a previously published study by our group, was submitted to gene ontology, gene enrichment and ingenuity pathway analyses (IPA). This functional analysis showed that decidualization was a biological process overrepresented. Gene set enrichment analysis identified LIF, PRL, IL15 and STAT3 among the most down-regulated genes within the JAK STAT canonical pathway. IPA showed that decidualization of uterus was a bio-function predicted as inhibited by UPA. The results demonstrated that this selective progesterone receptor modulator, when administered during the periovulatory phase of the menstrual cycle, may affect the molecular mechanisms leading to endometrial decidualization in response to progesterone during the period of maximum embryo receptivity. Copyright © 2018 Elsevier B.V. All rights reserved.
Intragenic DNA methylation prevents spurious transcription initiation.
Neri, Francesco; Rapelli, Stefania; Krepelova, Anna; Incarnato, Danny; Parlato, Caterina; Basile, Giulia; Maldotti, Mara; Anselmi, Francesca; Oliviero, Salvatore
2017-03-02
In mammals, DNA methylation occurs mainly at CpG dinucleotides. Methylation of the promoter suppresses gene expression, but the functional role of gene-body DNA methylation in highly expressed genes has yet to be clarified. Here we show that, in mouse embryonic stem cells, Dnmt3b-dependent intragenic DNA methylation protects the gene body from spurious RNA polymerase II entry and cryptic transcription initiation. Using different genome-wide approaches, we demonstrate that this Dnmt3b function is dependent on its enzymatic activity and recruitment to the gene body by H3K36me3. Furthermore, the spurious transcripts can either be degraded by the RNA exosome complex or capped, polyadenylated, and delivered to the ribosome to produce aberrant proteins. Elongating RNA polymerase II therefore triggers an epigenetic crosstalk mechanism that involves SetD2, H3K36me3, Dnmt3b and DNA methylation to ensure the fidelity of gene transcription initiation, with implications for intragenic hypomethylation in cancer.
Systematic identification and analysis of frequent gene fusion events in metabolic pathways
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henry, Christopher S.; Lerma-Ortiz, Claudia; Gerdes, Svetlana Y.
Here, gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. As a result, here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These setsmore » were used to develop a fusion prediction algorithm that captured the training set fusions with only 7 % false negatives and 50 % false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. In conclusion, customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function.« less
Systematic identification and analysis of frequent gene fusion events in metabolic pathways
Henry, Christopher S.; Lerma-Ortiz, Claudia; Gerdes, Svetlana Y.; ...
2016-06-24
Here, gene fusions are the most powerful type of in silico-derived functional associations. However, many fusion compilations were made when <100 genomes were available, and algorithms for identifying fusions need updating to handle the current avalanche of sequenced genomes. The availability of a large fusion dataset would help probe functional associations and enable systematic analysis of where and why fusion events occur. As a result, here we present a systematic analysis of fusions in prokaryotes. We manually generated two training sets: (i) 121 fusions in the model organism Escherichia coli; (ii) 131 fusions found in B vitamin metabolism. These setsmore » were used to develop a fusion prediction algorithm that captured the training set fusions with only 7 % false negatives and 50 % false positives, a substantial improvement over existing approaches. This algorithm was then applied to identify 3.8 million potential fusions across 11,473 genomes. The results of the analysis are available in a searchable database. A functional analysis identified 3,000 reactions associated with frequent fusion events and revealed areas of metabolism where fusions are particularly prevalent. In conclusion, customary definitions of fusions were shown to be ambiguous, and a stricter one was proposed. Exploring the genes participating in fusion events showed that they most commonly encode transporters, regulators, and metabolic enzymes. The major rationales for fusions between metabolic genes appear to be overcoming pathway bottlenecks, avoiding toxicity, controlling competing pathways, and facilitating expression and assembly of protein complexes. Finally, our fusion dataset provides powerful clues to decipher the biological activities of domains of unknown function.« less
Mouse Genome Database: From sequence to phenotypes and disease models
Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.
2015-01-01
Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326
An extended set of yeast-based functional assays accurately identifies human disease mutations
Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.
2016-01-01
We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778
Ponce-Toledo, Rafael I; Moreira, David; López-García, Purificación; Deschamps, Philippe
2018-06-19
Endosymbiosis has been common all along eukaryotic evolution, providing opportunities for genomic and organellar innovation. Plastids are a prominent example. After the primary endosymbiosis of the cyanobacterial plastid ancestor, photosynthesis spread in many eukaryotic lineages via secondary endosymbioses involving red or green algal endosymbionts and diverse heterotrophic hosts. However, the number of secondary endosymbioses and how they occurred remain poorly understood. In particular, contrasting patterns of endosymbiotic gene transfer (EGT) have been detected and subjected to various interpretations. In this context, accurate detection of EGTs is essential to avoid wrong evolutionary conclusions. We have assembled a strictly selected set of markers that provides robust phylogenomic evidence suggesting that nuclear genes involved in the function and maintenance of green secondary plastids in chlorarachniophytes and euglenids have unexpected mixed red and green algal origins. This mixed ancestry contrasts with the clear red algal origin of most nuclear genes carrying similar functions in secondary algae with red plastids.
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
NASA Astrophysics Data System (ADS)
Mittal, Shikha; Banduni, Pooja; Mallikarjuna, Mallana G.; Rao, Atmakuri R.; Jain, Prashant A.; Dash, Prasanta K.; Thirunavukkarasu, Nepolean
2018-05-01
Drought is one of the major threats to maize production. In order to improve the production and to breed tolerant hybrids, understanding the genes and regulatory mechanisms during drought stress is important. Transcription factors (TFs) play a major role in gene regulation and many TFs have been identified in response to drought stress. In our experiment, a set of 15 major TF families comprising 1436 genes was structurally and functionally characterized using in-silico tools and a gene expression assay. All 1436 genes were mapped on 10 chromosome of maize. The functional annotation indicated the involvement of these genes in ABA signaling, ROS scavenging, photosynthesis, stomatal regulation, and sucrose metabolism. Duplication was identified as the primary force in divergence and expansion of TF families. Phylogenetic relationship was developed individually for each TF family as well as combined TF families. Phylogenetic analysis grouped the TF family of genes into TF-specific and mixed groups. Phylogenetic analysis of genes belonging to various TF families suggested that the origin of TFs occurred in the lineage of maize evolution. Gene structure analysis revealed that more number of genes were intron-rich as compared to intronless genes. Drought-responsive CRE’s such as ABREA, ABREB, DRE1 and DRECRTCOREAT have been identified. Expression and interaction analyses identified leaf-specific bZIP TF, GRMZM2G140355, as a potential contributor toward drought tolerance in maize. We also analyzed protein-protein interaction network of 269 drought-responsive genes belonging to different drought-related TFs. The information generated on structural and functional characteristics, expression and interaction of the drought-related TF families will be useful to decipher the drought tolerance mechanisms and to derive drought-tolerant genotypes in maize.
Maciejowski, John; Ahn, James Hyungsoo; Cipriani, Patricia Giselle; Killian, Darrell J.; Chaudhary, Aisha L.; Lee, Ji Inn; Voutev, Roumen; Johnsen, Robert C.; Baillie, David L.; Gunsalus, Kristin C.; Fitch, David H. A.; Hubbard, E. Jane Albert
2005-01-01
We report molecular genetic studies of three genes involved in early germ-line proliferation in Caenorhabditis elegans that lend unexpected insight into a germ-line/soma functional separation of autosomal/X-linked duplicated gene pairs. In a genetic screen for germ-line proliferation-defective mutants, we identified mutations in rpl-11.1 (L11 protein of the large ribosomal subunit), pab-1 [a poly(A)-binding protein], and glp-3/eft-3 (an elongation factor 1-α homolog). All three are members of autosome/X gene pairs. Consistent with a germ-line-restricted function of rpl-11.1 and pab-1, mutations in these genes extend life span and cause gigantism. We further examined the RNAi phenotypes of the three sets of rpl genes (rpl-11, rpl-24, and rpl-25) and found that for the two rpl genes with autosomal/X-linked pairs (rpl-11 and rpl-25), zygotic germ-line function is carried by the autosomal copy. Available RNAi results for highly conserved autosomal/X-linked gene pairs suggest that other duplicated genes may follow a similar trend. The three rpl and the pab-1/2 duplications predate the divergence between C. elegans and C. briggsae, while the eft-3/4 duplication appears to have occurred in the lineage to C. elegans after it diverged from C. briggsae. The duplicated C. briggsae orthologs of the three C. elegans autosomal/X-linked gene pairs also display functional differences between paralogs. We present hypotheses for evolutionary mechanisms that may underlie germ-line/soma subfunctionalization of duplicated genes, taking into account the role of X chromosome silencing in the germ line and analogous mammalian phenomena. PMID:15687263
Quiapim, Andréa C.; Brito, Michael S.; Bernardes, Luciano A.S.; daSilva, Idalete; Malavazi, Iran; DePaoli, Henrique C.; Molfetta-Machado, Jeanne B.; Giuliatti, Silvana; Goldman, Gustavo H.; Goldman, Maria Helena S.
2009-01-01
The success of plant reproduction depends on pollen-pistil interactions occurring at the stigma/style. These interactions vary depending on the stigma type: wet or dry. Tobacco (Nicotiana tabacum) represents a model of wet stigma, and its stigmas/styles express genes to accomplish the appropriate functions. For a large-scale study of gene expression during tobacco pistil development and preparation for pollination, we generated 11,216 high-quality expressed sequence tags (ESTs) from stigmas/styles and created the TOBEST database. These ESTs were assembled in 6,177 clusters, from which 52.1% are pistil transcripts/genes of unknown function. The 21 clusters with the highest number of ESTs (putative higher expression levels) correspond to genes associated with defense mechanisms or pollen-pistil interactions. The database analysis unraveled tobacco sequences homologous to the Arabidopsis (Arabidopsis thaliana) genes involved in specifying pistil identity or determining normal pistil morphology and function. Additionally, 782 independent clusters were examined by macroarray, revealing 46 stigma/style preferentially expressed genes. Real-time reverse transcription-polymerase chain reaction experiments validated the pistil-preferential expression for nine out of 10 genes tested. A search for these 46 genes in the Arabidopsis pistil data sets demonstrated that only 11 sequences, with putative equivalent molecular functions, are expressed in this dry stigma species. The reverse search for the Arabidopsis pistil genes in the TOBEST exposed a partial overlap between these dry and wet stigma transcriptomes. The TOBEST represents the most extensive survey of gene expression in the stigmas/styles of wet stigma plants, and our results indicate that wet and dry stigmas/styles express common as well as distinct genes in preparation for the pollination process. PMID:19052150
Wingo, Aliza P; Gibson, Greg
2015-01-01
Prospective epidemiological studies found that generalized anxiety disorder (GAD) can impair immune function and increase risk for cardiovascular disease or events. Mechanisms underlying the physiological reverberations of anxiety, however, are still elusive. Hence, we aimed to investigate molecular processes mediating effects of anxiety on physical health using blood gene expression profiles of 336 community participants (157 anxious and 179 control). We examined genome-wide differential gene expression in anxiety, as well as associations between nine major modules of co-regulated transcripts in blood gene expression and anxiety. No significant differential expression was observed in women, but 631 genes were differentially expressed between anxious and control men at the false discovery rate of 0.1 after controlling for age, body mass index, race, and batch effect. Gene set enrichment analysis (GSEA) revealed that genes with altered expression levels in anxious men were involved in response of various immune cells to vaccination and to acute viral and bacterial infection, and in a metabolic network affecting traits of metabolic syndrome. Further, we found one set of 260 co-regulated genes to be significantly associated with anxiety in men after controlling for the relevant covariates, and demonstrate its equivalence to a component of the stress-related conserved transcriptional response to adversity profile. Taken together, our results suggest potential molecular pathways that can explain negative effects of GAD observed in epidemiological studies. Remarkably, even mild anxiety, which most of our participants had, was associated with observable changes in immune-related gene expression levels. Our findings generate hypotheses and provide incremental insights into molecular mechanisms mediating negative physiological effects of GAD. Published by Elsevier Inc.
Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H
2014-04-11
Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.
Joshi, Anagha
2014-12-30
Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in same species still remains to be explored and can bring new biological insights. We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type in three groups according to binding occupancy as singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid occupancy irrespective of cell type. We then characterized hotspots for diverse structural functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.
Selecting and validating reference genes for quantitative real-time PCR in Plutella xylostella (L.).
You, Yanchun; Xie, Miao; Vasseur, Liette; You, Minsheng
2018-05-01
Gene expression analysis provides important clues regarding gene functions, and quantitative real-time PCR (qRT-PCR) is a widely used method in gene expression studies. Reference genes are essential for normalizing and accurately assessing gene expression. In the present study, 16 candidate reference genes (ACTB, CyPA, EF1-α, GAPDH, HSP90, NDPk, RPL13a, RPL18, RPL19, RPL32, RPL4, RPL8, RPS13, RPS4, α-TUB, and β-TUB) from Plutella xylostella were selected to evaluate gene expression stability across different experimental conditions using five statistical algorithms (geNorm, NormFinder, Delta Ct, BestKeeper, and RefFinder). The results suggest that different reference genes or combinations of reference genes are suitable for normalization in gene expression studies of P. xylostella according to the different developmental stages, strains, tissues, and insecticide treatments. Based on the given experimental sets, the most stable reference genes were RPS4 across different developmental stages, RPL8 across different strains and tissues, and EF1-α across different insecticide treatments. A comprehensive and systematic assessment of potential reference genes for gene expression normalization is essential for post-genomic functional research in P. xylostella, a notorious pest with worldwide distribution and a high capacity to adapt and develop resistance to insecticides.
de Luis, Daniel Antonio; Almansa, Raquel; Aller, Rocío; Izaola, Olatz; Romero, E
2017-06-10
Understanding molecular basis involved in overweight is an important first step in developing therapeutic pathways against excess in body weight gain. The purpose of our pilot study was to evaluate the gene expression profiles in the peripheral blood of obese patients without other metabolic complications. A sample of 17 obese patients without metabolic syndrome and 15 non obese control subjects was evaluated in a prospective way. Following 'One-Color Microarray-Based Gene Expression Analysis' protocol Version 5.7 (Agilent p/n 4140-90040), cRNA was hybridized with Whole Human Genome Oligo Microarray Kit (Agilent p/n G2519F-014850) containing 41,000+ unique human genes and transcripts. The average age of the study group was 43.6 ± 19.7 years with a sex distribution of 64.7% females and 35.3% males. No statistical differences were detected with healthy controls 41.9 ± 12.3 years with a sex distribution of 70% females and 30% males. Obese patients showed 1436 genes that were differentially expressed compared to control group. Ingenuity Pathway Analysis showed that these genes participated in 13 different categories related to metabolism and cellular functions. In the gene set of cellular function, the most important genes were C-terminal region of Nel-like molecule 1 protein (NELL1) and Pigment epithelium-derived factor (SPEDF), both genes were over-expressed. In the gene set of metabolism, insulin growth factor type 1 (IGF1), ApoA5 (apolipoprotein subtype 5), Foxo4 (Forkhead transcription factor 4), ADIPOR1 (receptor of adiponectin type 1) and AQP7 (aquaporin channel proteins7) were over expressed. Moreover, PIKFYVE (PtdIns(3) P 5-kinase), and ROCK-2 (rho-kinase II) were under expressed. We showed that PBMCs from obese subjects presented significant changes in gene expression, exhibiting 1436 differentially expressed genes compared to PBMCs from non-obese subjects. Furthermore, our data showed a number of genes involved in relevant processes implicated in metabolism, with genes presenting high fold-change values (up-regulation and down regulation) associated with lipid, carbohydrate and protein metabolism. Copyright © 2017 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
Dissecting gene expression at the blood-brain barrier
Huntley, Melanie A.; Bien-Ly, Nga; Daneman, Richard; Watts, Ryan J.
2014-01-01
The availability of genome-wide expression data for the blood-brain barrier is an invaluable resource that has recently enabled the discovery of several genes and pathways involved in the development and maintenance of the blood-brain barrier, particularly in rodent models. The broad distribution of published data sets represents a viable starting point for the molecular dissection of the blood-brain barrier and will further direct the discovery of novel mechanisms of blood-brain barrier formation and function. Technical advances in purifying brain endothelial cells, the key cell that forms the critical barrier, have allowed for greater specificity in gene expression comparisons with other central nervous system cell types, and more systematic characterizations of the molecular composition of the blood-brain barrier. Nevertheless, our understanding of how the blood-brain barrier changes during aging and disease is underrepresented. Blood-brain barrier data sets from a wider range of experimental paradigms and species, including invertebrates and primates, would be invaluable for investigating the function and evolution of the blood-brain barrier. Newer technologies in gene expression profiling, such as RNA-sequencing, now allow for finer resolution of transcriptomic changes, including isoform specificity and RNA-editing. As our field continues to utilize more advanced expression profiling in its ongoing efforts to elucidate the blood-brain barrier, including in disease and drug delivery, we will continue to see rapid advances in our understanding of the molecular mediators of barrier biology. We predict that the recently published data sets, combined with forthcoming genomic and proteomic blood-brain barrier data sets, will continue to fuel the molecular genetic revolution of blood-brain barrier biology. PMID:25414634
Baker, Lauren A.; Kirkpatrick, Brian; Rosa, Guilherme J. M.; Gianola, Daniel; Valente, Bruno; Sumner, Julia P.; Baltzer, Wendy; Hao, Zhengling; Binversie, Emily E.; Volstad, Nicola; Piazza, Alexander; Sample, Susannah J.
2017-01-01
Anterior cruciate ligament (ACL) rupture is a common condition that can be devastating and life changing, particularly in young adults. A non-contact mechanism is typical. Second ACL ruptures through rupture of the contralateral ACL or rupture of a graft repair is also common. Risk of rupture is increased in females. ACL rupture is also common in dogs. Disease prevalence exceeds 5% in several dog breeds, ~100 fold higher than human beings. We provide insight into the genetic etiology of ACL rupture by genome-wide association study (GWAS) in a high-risk breed using 98 case and 139 control Labrador Retrievers. We identified 129 single nucleotide polymorphisms (SNPs) within 99 risk loci. Associated loci (P<5E-04) explained approximately half of phenotypic variance in the ACL rupture trait. Two of these loci were located in uncharacterized or non-coding regions of the genome. A chromosome 24 locus containing nine genes with diverse functions met genome-wide significance (P = 3.63E-0.6). GWAS pathways were enriched for c-type lectins, a gene set that includes aggrecan, a gene set encoding antimicrobial proteins, and a gene set encoding membrane transport proteins with a variety of physiological functions. Genotypic risk estimated for each dog based on the risk contributed by each GWAS locus showed clear separation of ACL rupture cases and controls. Power analysis of the GWAS data set estimated that ~172 loci explain the genetic contribution to ACL rupture in the Labrador Retriever. Heritability was estimated at 0.48. We conclude ACL rupture is a moderately heritable highly polygenic complex trait. Our results implicate c-type lectin pathways in ACL homeostasis. PMID:28379989
Down-weighting overlapping genes improves gene set analysis
2012-01-01
Background The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless how specific they are to a given gene set. Results In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/or http://www.bioconductor.org. PMID:22713124
The gene and the genon concept: a functional and information-theoretic analysis
Scherrer, Klaus; Jost, Jürgen
2007-01-01
‘Gene' has become a vague and ill-defined concept. To set the stage for mathematical analysis of gene storage and expression, we return to the original concept of the gene as a function encoded in the genome, basis of genetic analysis, that is a polypeptide or other functional product. The additional information needed to express a gene is contained within each mRNA as an ensemble of signals, added to or superimposed onto the coding sequence. To designate this programme, we introduce the term ‘genon'. Individual genons are contained in the pre-mRNA forming a pre-genon. A genomic domain contains a proto-genon, with the signals of transcription activation in addition to the pre-genon in the transcripts. Some contain several mRNAs and hence genons, to be singled out by RNA processing and differential splicing. The programme in the genon in cis is implemented by corresponding factors of protein or RNA nature contained in the transgenon of the cell or organism. The gene, the cis programme contained in the individual domain and transcript, and the trans programme of factors, can be analysed by information theory. PMID:17353929
Update of the human and mouse Fanconi anemia genes.
Dong, Hongbin; Nebert, Daniel W; Bruford, Elspeth A; Thompson, David C; Joenje, Hans; Vasiliou, Vasilis
2015-11-24
Fanconi anemia (FA) is a recessively inherited disease manifesting developmental abnormalities, bone marrow failure, and increased risk of malignancies. Whereas FA has been studied for nearly 90 years, only in the last 20 years have increasing numbers of genes been implicated in the pathogenesis associated with this genetic disease. To date, 19 genes have been identified that encode Fanconi anemia complementation group proteins, all of which are named or aliased, using the root symbol "FANC." Fanconi anemia subtype (FANC) proteins function in a common DNA repair pathway called "the FA pathway," which is essential for maintaining genomic integrity. The various FANC mutant proteins contribute to distinct steps associated with FA pathogenesis. Herein, we provide a review update of the 19 human FANC and their mouse orthologs, an evolutionary perspective on the FANC genes, and the functional significance of the FA DNA repair pathway in association with clinical disorders. This is an example of a set of genes--known to exist in vertebrates, invertebrates, plants, and yeast--that are grouped together on the basis of shared biochemical and physiological functions, rather than evolutionary phylogeny, and have been named on this basis by the HUGO Gene Nomenclature Committee (HGNC).
Aspirin exposure reveals novel genes associated with platelet function and cardiovascular events.
Voora, Deepak; Cyr, Derek; Lucas, Joseph; Chi, Jen-Tsan; Dungan, Jennifer; McCaffrey, Timothy A; Katz, Richard; Newby, L Kristin; Kraus, William E; Becker, Richard C; Ortel, Thomas L; Ginsburg, Geoffrey S
2013-10-01
The aim of this study was to develop ribonucleic acid (RNA) profiles that could serve as novel biomarkers for the response to aspirin. Aspirin reduces death and myocardial infarction (MI), suggesting that aspirin interacts with biological pathways that may underlie these events. Aspirin was administered, followed by whole-blood RNA microarray profiling, in a discovery cohort of healthy volunteers (HV1) (n = 50) and 2 validation cohorts of healthy volunteers (HV2) (n = 53) and outpatient cardiology patients (OPC) (n = 25). Platelet function was assessed using the platelet function score (PFS) in HV1 and HV2 and the VerifyNow Aspirin Test (Accumetrics, Inc., San Diego, California) in OPC. Bayesian sparse factor analysis identified sets of coexpressed transcripts, which were examined for associations with PFS in HV1 and validated in HV2 and OPC. Proteomic analysis confirmed the association of validated transcripts in platelet proteins. Validated gene sets were tested for association with death or MI in 2 patient cohorts (n = 587 total) from RNA samples collected at cardiac catheterization. A set of 60 coexpressed genes named the "aspirin response signature" (ARS) was associated with PFS in HV1 (r = -0.31, p = 0.03), HV2 (r = -0.34, Bonferroni p = 0.03), and OPC (p = 0.046). Corresponding proteins for the 17 ARS genes were identified in the platelet proteome, of which 6 were associated with PFS. The ARS was associated with death or MI in both patient cohorts (odds ratio: 1.2 [p = 0.01]; hazard ratio: 1.5 [p = 0.001]), independent of cardiovascular risk factors. Compared with traditional risk factors, reclassification (net reclassification index = 31% to 37%, p ≤ 0.0002) was improved by including the ARS or 1 of its genes, ITGA2B. RNA profiles of platelet-specific genes are novel biomarkers for identifying patients who do not respond adequately to aspirin and who are at risk for death or MI. Copyright © 2013 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
Novel gene sets improve set-level classification of prokaryotic gene expression data.
Holec, Matěj; Kuželka, Ondřej; Železný, Filip
2015-10-28
Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable to learn more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
Tripathi, Binu M; Moroenyane, Itumeleng; Sherman, Chen; Lee, Yoo Kyung; Adams, Jonathan M; Steinberger, Yosef
2017-07-01
The soil microbiome is important for the functioning of terrestrial ecosystems. However, the impacts of climate on taxonomic and functional diversity of soil microbiome are not well understood. A precipitation gradient along regional scale transects may offer a model setting for understanding the effect of climate on the composition and function of the soil microbiome. Here, we compared taxonomic and functional attributes of soil microorganisms in arid, semiarid, Mediterranean, and humid Mediterranean climatic conditions of Israel using shotgun metagenomic sequencing. We hypothesized that there would be a distinct taxonomic and functional soil community for each precipitation zone, with arid environments having lower taxonomic and functional diversity, greater relative abundance of stress response and sporulation-related genes, and lower relative abundance of genes related to nutrient cycling and degradation of complex organic compounds. As hypothesized, our results showed a distinct taxonomic and functional community in each precipitation zone, revealing differences in soil taxonomic and functional selection in the different climates. Although the taxonomic diversity remained similar across all sites, the functional diversity was-as hypothesized-lower in the arid environments, suggesting that functionality is more constrained in "extreme" environments. Also, with increasing aridity, we found a significant increase in genes related to dormancy/sporulation and a decrease in those related to nutrient cycling (genes related to nitrogen, potassium, and sulfur metabolism), respectively. However, relative abundance of genes related to stress response were lower in arid soils. Overall, these results indicate that climatic conditions play an important role in shaping taxonomic and functional attributes of soil microbiome. These findings have important implications for understanding the impacts of climate change (e.g., precipitation change) on structure and function of the soil microbiome.
Discrimination learning and attentional set formation in a mouse model of Fragile X.
Casten, Kimberly S; Gray, Annette C; Burwell, Rebecca D
2011-06-01
Fragile X Syndrome is the most prevalent genetic cause of mental retardation. Selective deficits in executive function, including inhibitory control and attention, are core features of the disorder. In humans, Fragile X results from a trinucleotide repeat in the Fmr1 gene that renders it functionally silent and has been modeled in mice by targeted deletion of the Fmr1 gene. Fmr1 knockout (KO) mice recapitulate many features of Fragile X syndrome, but evidence for deficits in executive function is inconsistent. To address this issue, we trained wild-type and Fmr1 KO mice on an experimental paradigm that assesses attentional set-shifting. Mice learned to discriminate between stimuli differing in two of three perceptual dimensions. Successful discrimination required attending only to the relevant dimension, while ignoring irrelevant dimensions. Mice were trained on three discriminations in the same perceptual dimension, each followed by a reversal. This procedure normally results in the formation of an attentional set to the relevant dimension. Mice were then required to shift attention and discriminate based on a previously irrelevant perceptual dimension. Wild-type mice exhibited the increase in trials to criterion expected when shifting attention from one perceptual dimension to another. In contrast, the Fmr1 KO group failed to show the expected increase, suggesting impairment in forming an attentional set. Fmr1 KO mice also exhibited a general impairment in learning discriminations and reversals. This is the first demonstration that Fmr1 KO mice show a deficit in attentional set formation.
Identifying a gene expression signature of cluster headache in blood
Eising, Else; Pelzer, Nadine; Vijfhuizen, Lisanne S.; Vries, Boukje de; Ferrari, Michel D.; ‘t Hoen, Peter A. C.; Terwindt, Gisela M.; van den Maagdenberg, Arn M. J. M.
2017-01-01
Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Gene expression data were analysed by gene and by module of co-expressed genes with particular attention to previously implicated disease pathways including hypocretin dysregulation. Only moderate gene expression differences were identified and no associations were found with previously reported pathogenic mechanisms. At the level of functional gene sets, associations were observed for genes involved in several brain-related mechanisms such as GABA receptor function and voltage-gated channels. In addition, genes and modules of co-expressed genes showed a role for intracellular signalling cascades, mitochondria and inflammation. Although larger study samples may be required to identify the full range of involved pathways, these results indicate a role for mitochondria, intracellular signalling and inflammation in cluster headache. PMID:28074859
2010-01-01
Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
Gold kiwifruit ( Actinidia chinensis 'Hort16A') for immune support.
Skinner, Margot A; Loh, Jacelyn M S; Hunter, Denise C; Zhang, Jingli
2011-05-01
Kiwifruit is a good source of several vitamins and minerals and dietary fibre, and contains a number of phytochemicals; so kiwifruit potentially provides health benefits beyond basic nutrition. Consumption of green kiwifruit can have positive effects on cardiovascular health through antioxidant activity, inhibition of platelet aggregation and lowered TAG levels, and gut health through improving laxation, aiding digestion and promoting a healthy gut microflora. The importance of nutrition on immune function is well recognised, with deficiencies in vitamins A, C, E, B6 and B12, folic acid, Zn, Cu, Fe and Se being associated with impaired immune function and increased susceptibility to diseases. Evidence is growing that kiwifruit enhances immunity, with several small murine studies showing enhancement of innate and adaptive immune function. Few studies have examined the effect of kiwifruit on immune function in human subjects, but a recent study has revealed that kiwifruit up-regulates several 'immune' and 'DNA and repair'-related gene sets, and down-regulates one gene set related to Ig secretion. Taken together, the evidence from the literature provides supporting data for designing a human intervention trial to validate the ability of kiwifruit to support immune function in healthy and immunocompromised populations.
Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia
2009-01-01
Background microRNAs (miRNAs) are single-stranded RNA molecules of about 20–23 nucleotides length found in a wide variety of organisms. miRNAs regulate gene expression, by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. Results GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase and also providing a full gene description and functional analysis for each target gene. On the other hand, TAGGO application is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. Conclusion GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can freely be downloaded BRFAA. PMID:19534746
Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia
2009-06-16
microRNAs (miRNAs) are single-stranded RNA molecules of about 20-23 nucleotides length found in a wide variety of organisms. miRNAs regulate gene expression, by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase and also providing a full gene description and functional analysis for each target gene. On the other hand, TAGGO application is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can freely be downloaded BRFAA.
Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying
2011-08-01
The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
Borowsky, Alexander T.
2017-01-01
Plants produce diverse specialized metabolites (SMs), but the genes responsible for their production and regulation remain largely unknown, hindering efforts to tap plant pharmacopeia. Given that genes comprising SM pathways exhibit environmentally dependent coregulation, we hypothesized that genes within a SM pathway would form tight associations (modules) with each other in coexpression networks, facilitating their identification. To evaluate this hypothesis, we used 10 global coexpression data sets, each a meta-analysis of hundreds to thousands of experiments, across eight plant species to identify hundreds of coexpressed gene modules per data set. In support of our hypothesis, 15.3 to 52.6% of modules contained two or more known SM biosynthetic genes, and module genes were enriched in SM functions. Moreover, modules recovered many experimentally validated SM pathways, including all six known to form biosynthetic gene clusters (BGCs). In contrast, bioinformatically predicted BGCs (i.e., those lacking an associated metabolite) were no more coexpressed than the null distribution for neighboring genes. These results suggest that most predicted plant BGCs are not genuine SM pathways and argue that BGCs are not a hallmark of plant specialized metabolism. We submit that global gene coexpression is a rich, largely untapped resource for discovering the genetic basis and architecture of plant natural products. PMID:28408660