Sample records for identified gene sets

  1. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    PubMed

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data of genome wide association study (GWAS) and expression quantitative trait locus (eQTL) study. GWAS summary data was driven from published studies of neuroticism, totally involving 170,906 subjects. eQTL dataset containing 927,753 eQTLs were obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted by summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value=2.27×10 -10 ), MGC57346 (p value=6.92×10 -7 ), BLK (p value=1.01×10 -6 ), XKR6 (p value=1.11×10 -6 ), C17ORF69 (p value=1.12×10 -6 ) and KIAA1267 (p value=4.00×10 -6 ). Gene set enrichment analysis observed significant association for Chr8p23 gene set (false discovery rate=0.033). Our results provide novel clues for the genetic mechanism studies of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  2. The Gene Set Builder: collation, curation, and distribution of sets of genes

    PubMed Central

    Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W

    2005-01-01

    Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of

  3. Gene-set analysis based on the pharmacological profiles of drugs to identify repurposing opportunities in schizophrenia.

    PubMed

    de Jong, Simone; Vidler, Lewis R; Mokrab, Younes; Collier, David A; Breen, Gerome

    2016-08-01

    Genome-wide association studies (GWAS) have identified thousands of novel genetic associations for complex genetic disorders, leading to the identification of potential pharmacological targets for novel drug development. In schizophrenia, 108 conservatively defined loci that meet genome-wide significance have been identified and hundreds of additional sub-threshold associations harbour information on the genetic aetiology of the disorder. In the present study, we used gene-set analysis based on the known binding targets of chemical compounds to identify the 'drug pathways' most strongly associated with schizophrenia-associated genes, with the aim of identifying potential drug repositioning opportunities and clues for novel treatment paradigms, especially in multi-target drug development. We compiled 9389 gene sets (2496 with unique gene content) and interrogated gene-based p-values from the PGC2-SCZ analysis. Although no single drug exceeded experiment wide significance (corrected p<0.05), highly ranked gene-sets reaching suggestive significance including the dopamine receptor antagonists metoclopramide and trifluoperazine and the tyrosine kinase inhibitor neratinib. This is a proof of principle analysis showing the potential utility of GWAS data of schizophrenia for the direct identification of candidate drugs and molecules that show polypharmacy. © The Author(s) 2016.

  4. Genome-Wide Temporal Expression Profiling in Caenorhabditis elegans Identifies a Core Gene Set Related to Long-Term Memory.

    PubMed

    Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila

    2017-07-12

    The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. Like in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes. Copyright © 2017 the authors 0270-6474/17/376661-12$15.00/0.

  5. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or priori-defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide an valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for

  6. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows to use all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard a Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets finally found to be linked with the influenza vaccine too although they were found to be associated to the pneumococcal vaccine only in previous analyses. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374

  7. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  8. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    PubMed

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable to learn more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.

  9. Transcriptional differences between normal and glioma-derived glial progenitor cells identify a core set of dysregulated genes.

    PubMed

    Auvergne, Romane M; Sim, Fraser J; Wang, Su; Chandler-Militello, Devin; Burch, Jaclyn; Al Fanek, Yazan; Davis, Danielle; Benraiss, Abdellatif; Walter, Kevin; Achanta, Pragathi; Johnson, Mahlon; Quinones-Hinojosa, Alfredo; Natesan, Sridaran; Ford, Heide L; Goldman, Steven A

    2013-06-27

    Glial progenitor cells (GPCs) are a potential source of malignant gliomas. We used A2B5-based sorting to extract tumorigenic GPCs from human gliomas spanning World Health Organization grades II-IV. Messenger RNA profiling identified a cohort of genes that distinguished A2B5+ glioma tumor progenitor cells (TPCs) from A2B5+ GPCs isolated from normal white matter. A core set of genes and pathways was substantially dysregulated in A2B5+ TPCs, which included the transcription factor SIX1 and its principal cofactors, EYA1 and DACH2. Small hairpin RNAi silencing of SIX1 inhibited the expansion of glioma TPCs in vitro and in vivo, suggesting a critical and unrecognized role of the SIX1-EYA1-DACH2 system in glioma genesis or progression. By comparing the expression patterns of glioma TPCs with those of normal GPCs, we have identified a discrete set of pathways by which glial tumorigenesis may be better understood and more specifically targeted. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  10. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. This study illustrates that we can

  11. A statistical approach to identify, monitor, and manage incomplete curated data sets.

    PubMed

    Howe, Douglas G

    2018-04-02

    Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation, to avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval. This method can be used to

  12. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  13. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  14. Down-weighting overlapping genes improves gene set analysis

    PubMed Central

    2012-01-01

    Background The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless how specific they are to a given gene set. Results In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/or http://www.bioconductor.org. PMID:22713124

  15. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  16. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  18. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene semore » t analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis variance) detects changes both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure is to identify gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q -value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.« less

  19. Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.

    PubMed

    Park, Inho; Lee, Kwang H; Lee, Doheon

    2010-06-15

    Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.

  20. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.

  1. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome.

    PubMed

    Morine, Melissa J; McMonagle, Jolene; Toomey, Sinead; Reynolds, Clare M; Moloney, Aidan P; Gormley, Isobel C; Gaora, Peadar O; Roche, Helen M

    2010-10-07

    constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease.

  2. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

    PubMed Central

    2010-01-01

    -sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Conclusion Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease. PMID:20929581

  3. The essential gene set of a photosynthetic organism

    DOE PAGES

    Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; ...

    2015-10-27

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ~250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism's 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlapmore » with wellconserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA Leu , which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism's physiology and defines the essential gene set required for the growth of a photosynthetic organism.« less

  4. CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

    PubMed Central

    Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC

    2008-01-01

    Background The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFACR database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion The program CORE_TF is accessible in a user friendly web interface at . It provides a table of over-represented transcription factor binding sites in the users input genes' promoters and a graphical view of evolutionary conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135

  5. Learning contextual gene set interaction networks of cancer with condition specificity

    PubMed Central

    2013-01-01

    Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches to build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, of which each gene set share coherent transcriptional patterns across a subset of samples, termed contextual gene set. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only the results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further

  6. Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.

    PubMed

    Zhu, Xinyu; Chen, Caoyi; Wang, Baohua

    2012-04-01

    Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we made a comprehensive evolutionary analysis on this gene family from 13 major representatives of green plants. A novel clustering (here named as cpTrx clade), which included the III-1, III-2, and III-4 orthologous groups, previously resolved was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were early integrated into primordial SET-PostSET domain organization of cpTrx clade. We suggested that the PostSET domain was lost in some members of III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from the Physcomitrella patens (moss) were identified, and supposedly, their parental genes have been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.

  7. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    PubMed

    Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin

    2011-04-14

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.

  8. ADAGE signature analysis: differential expression analysis with data-defined gene sets.

    PubMed

    Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S

    2017-11-22

    Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server ( http://adage.greenelab.com ) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. We designed

  9. Genome-wide DNA methylation profile identified a unique set of differentially methylated immune genes in oral squamous cell carcinoma patients in India.

    PubMed

    Basu, Baidehi; Chakraborty, Joyeeta; Chandra, Aditi; Katarkar, Atul; Baldevbhai, Jadav Ritesh Kumar; Dhar Chowdhury, Debjit; Ray, Jay Gopal; Chaudhuri, Keya; Chatterjee, Raghunath

    2017-01-01

    Oral squamous cell carcinoma (OSCC) is one of the common malignancies in Southeast Asia. Epigenetic changes, mainly the altered DNA methylation, have been implicated in many cancers. Considering the varied environmental and genotoxic exposures among the Indian population, we conducted a genome-wide DNA methylation study on paired tumor and adjacent normal tissues of ten well-differentiated OSCC patients and validated in an additional 53 well-differentiated OSCC and adjacent normal samples. Genome-wide DNA methylation analysis identified several novel differentially methylated regions associated with OSCC. Hypermethylation is primarily enriched in the CpG-rich regions, while hypomethylation is mainly in the open sea. Distinct epigenetic drifts for hypo- and hypermethylation across CpG islands suggested independent mechanisms of hypo- and hypermethylation in OSCC development. Aberrant DNA methylation in the promoter regions are concomitant with gene expression. Hypomethylation of immune genes reflect the lymphocyte infiltration into the tumor microenvironment. Comparison of methylome data with 312 TCGA HNSCC samples identified a unique set of hypomethylated promoters among the OSCC patients in India. Pathway analysis of unique hypomethylated promoters indicated that the OSCC patients in India induce an anti-tumor T cell response, with mobilization of T lymphocytes in the neoplastic environment. Survival analysis of these epigenetically regulated immune genes suggested their prominent role in OSCC progression. Our study identified a unique set of hypomethylated regions, enriched in the promoters of immune response genes, and indicated the presence of a strong immune component in the tumor microenvironment. These methylation changes may serve as potential molecular markers to define risk and to monitor the prognosis of OSCC patients in India.

  10. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448

  11. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  12. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

    PubMed

    Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A

    2013-01-29

    pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  13. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.

  14. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis's on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored due to the significant variance inflation they produced on the enrichment scores and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.

  15. A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes

    PubMed Central

    Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong

    2015-01-01

    In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data. PMID:26201006

  16. A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes.

    PubMed

    Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong

    2015-01-01

    In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.

  17. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878

  18. Identifying a gene expression signature of cluster headache in blood

    PubMed Central

    Eising, Else; Pelzer, Nadine; Vijfhuizen, Lisanne S.; Vries, Boukje de; Ferrari, Michel D.; ‘t Hoen, Peter A. C.; Terwindt, Gisela M.; van den Maagdenberg, Arn M. J. M.

    2017-01-01

    Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Gene expression data were analysed by gene and by module of co-expressed genes with particular attention to previously implicated disease pathways including hypocretin dysregulation. Only moderate gene expression differences were identified and no associations were found with previously reported pathogenic mechanisms. At the level of functional gene sets, associations were observed for genes involved in several brain-related mechanisms such as GABA receptor function and voltage-gated channels. In addition, genes and modules of co-expressed genes showed a role for intracellular signalling cascades, mitochondria and inflammation. Although larger study samples may be required to identify the full range of involved pathways, these results indicate a role for mitochondria, intracellular signalling and inflammation in cluster headache. PMID:28074859

  19. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2016-07-15

    Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. blatti

  20. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both

  1. A 6-gene signature identifies four molecular subgroups of neuroblastoma

    PubMed Central

    2011-01-01

    Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB); Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) was associated to unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. Results The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). Conclusions Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently made to further investigate this group's specific characteristics. PMID:21492432

  2. Mechanism-based biomarker gene sets for glutathione depletion-related hepatotoxicity in rats

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao Weihua; Mizukawa, Yumiko; Nakatsu, Noriyuki

    Chemical-induced glutathione depletion is thought to be caused by two types of toxicological mechanisms: PHO-type glutathione depletion [glutathione conjugated with chemicals such as phorone (PHO) or diethyl maleate (DEM)], and BSO-type glutathione depletion [i.e., glutathione synthesis inhibited by chemicals such as L-buthionine-sulfoximine (BSO)]. In order to identify mechanism-based biomarker gene sets for glutathione depletion in rat liver, male SD rats were treated with various chemicals including PHO (40, 120 and 400 mg/kg), DEM (80, 240 and 800 mg/kg), BSO (150, 450 and 1500 mg/kg), and bromobenzene (BBZ, 10, 100 and 300 mg/kg). Liver samples were taken 3, 6, 9 andmore » 24 h after administration and examined for hepatic glutathione content, physiological and pathological changes, and gene expression changes using Affymetrix GeneChip Arrays. To identify differentially expressed probe sets in response to glutathione depletion, we focused on the following two courses of events for the two types of mechanisms of glutathione depletion: a) gene expression changes occurring simultaneously in response to glutathione depletion, and b) gene expression changes after glutathione was depleted. The gene expression profiles of the identified probe sets for the two types of glutathione depletion differed markedly at times during and after glutathione depletion, whereas Srxn1 was markedly increased for both types as glutathione was depleted, suggesting that Srxn1 is a key molecule in oxidative stress related to glutathione. The extracted probe sets were refined and verified using various compounds including 13 additional positive or negative compounds, and they established two useful marker sets. One contained three probe sets (Akr7a3, Trib3 and Gstp1) that could detect conjugation-type glutathione depletors any time within 24 h after dosing, and the other contained 14 probe sets that could detect glutathione depletors by any mechanism. These two sets, with appropriate

  3. Gene expression patterns combined with bioinformatics analysis identify genes associated with cholangiocarcinoma.

    PubMed

    Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong

    2013-12-01

    To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.

  4. GO-based functional dissimilarity of gene sets.

    PubMed

    Díaz-Díaz, Norberto; Aguilar-Ruiz, Jesús S

    2011-09-01

    The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes and a quantification of functional coherence can helps to clarify the role of a group of genes working together. To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Results show that GFD performs robustly when applied to gene set of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  5. Applying Multivariate Adaptive Splines to Identify Genes With Expressions Varying After Diagnosis in Microarray Experiments.

    PubMed

    Duan, Fenghai; Xu, Ye

    2017-01-01

    To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 expressed lower after turning points and 713 expressed higher after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data and successfully identified genes with expressions varying before and after breast cancer diagnosis.

  6. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  7. Estimation of gene induction enables a relevance-based ranking of gene sets.

    PubMed

    Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens

    2009-07-01

    In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshhold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.

  8. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly

  9. A gene-trap strategy identifies quiescence-induced genes in synchronized myoblasts.

    PubMed

    Sambasivan, Ramkumar; Pavlath, Grace K; Dhawan, Jyotsna

    2008-03-01

    Cellular quiescence is characterized not only by reduced mitotic and metabolic activity but also by altered gene expression. Growing evidence suggests that quiescence is not merely a basal state but is regulated by active mechanisms. To understand the molecular programme that governs reversible cell cycle exit, we focused on quiescence-related gene expression in a culture model of myogenic cell arrest and activation. Here we report the identification of quiescence-induced genes using a gene-trap strategy. Using a retroviral vector, we generated a library of gene traps in C2C12 myoblasts that were screened for arrest-induced insertions by live cell sorting (FACS-gal). Several independent gene- trap lines revealed arrest-dependent induction of betagal activity, confirming the efficacy of the FACS screen. The locus of integration was identified in 15 lines. In three lines,insertion occurred in genes previously implicated in the control of quiescence, i.e. EMSY - a BRCA2--interacting protein, p8/com1 - a p300HAT -- binding protein and MLL5 - a SET domain protein. Our results demonstrate that expression of chromatin modulatory genes is induced in G0, providing support to the notion that this reversibly arrested state is actively regulated.

  10. Genome-wide methylation analysis identifies a core set of hypermethylated genes in CIMP-H colorectal cancer.

    PubMed

    McInnes, Tyler; Zou, Donghui; Rao, Dasari S; Munro, Francesca M; Phillips, Vicky L; McCall, John L; Black, Michael A; Reeve, Anthony E; Guilford, Parry J

    2017-03-28

    Aberrant DNA methylation profiles are a characteristic of all known cancer types, epitomized by the CpG island methylator phenotype (CIMP) in colorectal cancer (CRC). Hypermethylation has been observed at CpG islands throughout the genome, but it is unclear which factors determine whether an individual island becomes methylated in cancer. DNA methylation in CRC was analysed using the Illumina HumanMethylation450K array. Differentially methylated loci were identified using Significance Analysis of Microarrays (SAM) and the Wilcoxon Signed Rank (WSR) test. Unsupervised hierarchical clustering was used to identify methylation subtypes in CRC. In this study we characterized the DNA methylation profiles of 94 CRC tissues and their matched normal counterparts. Consistent with previous studies, unsupervized hierarchical clustering of genome-wide methylation data identified three subtypes within the tumour samples, designated CIMP-H, CIMP-L and CIMP-N, that showed high, low and very low methylation levels, respectively. Differential methylation between normal and tumour samples was analysed at the individual CpG level, and at the gene level. The distribution of hypermethylation in CIMP-N tumours showed high inter-tumour variability and appeared to be highly stochastic in nature, whereas CIMP-H tumours exhibited consistent hypermethylation at a subset of genes, in addition to a highly variable background of hypermethylated genes. EYA4, TFPI2 and TLX1 were hypermethylated in more than 90% of all tumours examined. One-hundred thirty-two genes were hypermethylated in 100% of CIMP-H tumours studied and these were highly enriched for functions relating to skeletal system development (Bonferroni adjusted p value =2.88E-15), segment specification (adjusted p value =9.62E-11), embryonic development (adjusted p value =1.52E-04), mesoderm development (adjusted p value =1.14E-20), and ectoderm development (adjusted p value =7.94E-16). Our genome-wide characterization of DNA

  11. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405

  12. Gene expression meta-analysis identifies chromosomal regions and candidate genes involved in breast cancer metastasis.

    PubMed

    Thomassen, Mads; Tan, Qihua; Kruse, Torben A

    2009-01-01

    Breast cancer cells exhibit complex karyotypic alterations causing deregulation of numerous genes. Some of these genes are probably causal for cancer formation and local growth whereas others are causal for the various steps of metastasis. In a fraction of tumors deregulation of the same genes might be caused by epigenetic modulations, point mutations or the influence of other genes. We have investigated the relation of gene expression and chromosomal position, using eight datasets including more than 1200 breast tumors, to identify chromosomal regions and candidate genes possibly causal for breast cancer metastasis. By use of "Gene Set Enrichment Analysis" we have ranked chromosomal regions according to their relation to metastasis. Overrepresentation analysis identified regions with increased expression for chromosome 1q41-42, 8q24, 12q14, 16q22, 16q24, 17q12-21.2, 17q21-23, 17q25, 20q11, and 20q13 among metastasizing tumors and reduced gene expression at 1p31-21, 8p22-21, and 14q24. By analysis of genes with extremely imbalanced expression in these regions we identified DIRAS3 at 1p31, PSD3, LPL, EPHX2 at 8p21-22, and FOS at 14q24 as candidate metastasis suppressor genes. Potential metastasis promoting genes includes RECQL4 at 8q24, PRMT7 at 16q22, GINS2 at 16q24, and AURKA at 20q13.

  13. Using SCOPE to identify potential regulatory motifs in coregulated genes.

    PubMed

    Martyanov, Viktor; Gross, Robert H

    2011-05-31

    SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. Scope has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from

  14. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth.

    PubMed

    Chaillou, Thomas; Jackson, Janna R; England, Jonathan H; Kirby, Tyler J; Richards-White, Jena; Esser, Karyn A; Dupont-Versteegden, Esther E; McCarthy, John J

    2015-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. Copyright © 2015 the American Physiological Society.

  15. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth

    PubMed Central

    Chaillou, Thomas; Jackson, Janna R.; England, Jonathan H.; Kirby, Tyler J.; Richards-White, Jena; Esser, Karyn A.; Dupont-Versteegden, Esther E.

    2014-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. PMID:25554798

  16. A computational approach to identify cellular heterogeneity and tissue-specific gene regulatory networks.

    PubMed

    Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees

    2018-06-07

    The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), alongside our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.

  17. Identifying differentially expressed genes in cancer patients using a non-parameter Ising model.

    PubMed

    Li, Xumeng; Feltus, Frank A; Sun, Xiaoqian; Wang, James Z; Luo, Feng

    2011-10-01

    Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than those by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by Ising model can improve classification performance compared with DE genes identified by Markov random field model. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Identifying Driver Genomic Alterations in Cancers by Searching Minimum-Weight, Mutually Exclusive Sets

    PubMed Central

    Lu, Songjian; Lu, Kevin N.; Cheng, Shi-Yuan; Hu, Bo; Ma, Xiaojun; Nystrom, Nicholas; Lu, Xinghua

    2015-01-01

    An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms and the heterogeneity of cancers. It is well known that somatic genome alterations (SGAs) affecting the genes that encode the proteins within a common signaling pathway exhibit mutual exclusivity, in which these SGAs usually do not co-occur in a tumor. With some success, this characteristic has been utilized as an objective function to guide the search for driver mutations within a pathway. However, mutual exclusivity alone is not sufficient to indicate that genes affected by such SGAs are in common pathways. Here, we propose a novel, signal-oriented framework for identifying driver SGAs. First, we identify the perturbed cellular signals by mining the gene expression data. Next, we search for a set of SGA events that carries strong information with respect to such perturbed signals while exhibiting mutual exclusivity. Finally, we design and implement an efficient exact algorithm to solve an NP-hard problem encountered in our approach. We apply this framework to the ovarian and glioblastoma tumor data available at the TCGA database, and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways. PMID:26317392

  19. Identifying Mendelian disease genes with the Variant Effect Scoring Tool

    PubMed Central

    2013-01-01

    Background Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~ 45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman Sheldon syndrome. Conclusions Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. VEST is

  20. Identification of a set of genes showing regionally enriched expression in the mouse brain

    PubMed Central

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM

    2008-01-01

    Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066

  1. Identification of a set of genes showing regionally enriched expression in the mouse brain.

    PubMed

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L C; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J M

    2008-07-14

    The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.

  2. A Protocol for Using Gene Set Enrichment Analysis to Identify the Appropriate Animal Model for Translational Research.

    PubMed

    Weidner, Christopher; Steinfath, Matthias; Wistorf, Elisa; Oelgeschläger, Michael; Schneider, Marlon R; Schönfelder, Gilbert

    2017-08-16

    Recent studies that compared transcriptomic datasets of human diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. A major reason for the discrepancies between different gene expression analyses is the arbitrary filtering of differentially expressed genes. Furthermore, the comparison of single genes between different species and platforms often is limited by technical variance, leading to misinterpretation of the con/discordance between data from human and animal models. Thus, standardized approaches for systematic data analysis are needed. To overcome subjective gene filtering and ineffective gene-to-gene comparisons, we recently demonstrated that gene set enrichment analysis (GSEA) has the potential to avoid these problems. Therefore, we developed a standardized protocol for the use of GSEA to distinguish between appropriate and inappropriate animal models for translational research. This protocol is not suitable to predict how to design new model systems a-priori, as it requires existing experimental omics data. However, the protocol describes how to interpret existing data in a standardized manner in order to select the most suitable animal model, thus avoiding unnecessary animal experiments and misleading translational studies.

  3. Co-expression network analysis identified six hub genes in association with metastasis risk and prognosis in hepatocellular carcinoma

    PubMed Central

    Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu

    2017-01-01

    Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663

  4. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).

  5. Identifying conserved gene clusters in the presence of homology families.

    PubMed

    He, Xin; Goldwasser, Michael H

    2005-01-01

    The study of conserved gene clusters is important for understanding the forces behind genome organization and evolution, as well as the function of individual genes or gene groups. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. This generalizes a recent model called "gene teams." A gene team is a set of genes that appear homologously in two or more species, possibly in a different order yet with the distance of adjacent genes in the team for each chromosome always no more than a certain threshold. We remove the constraint in the original model that each gene must have a unique occurrence in each chromosome and thus allow the analysis on complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm analyzes a pair of chromosomes in O(mn) time and uses O(m+n) space, where m and n are the number of genes in the respective chromosomes. We demonstrate the utility of our methods by studying two bacterial genomes, E. coli K-12 and B. subtilis. Many of the teams identified by our algorithm correlate with documented E. coli operons, while several others match predicted operons, previously suggested by computational techniques. Our implementation and data are publicly available at euler.slu.edu/ approximately goldwasser/homologyteams/.

  6. NIH Researchers Identify OCD Risk Gene

    MedlinePlus

    ... News From NIH NIH Researchers Identify OCD Risk Gene Past Issues / Summer 2006 Table of Contents For ... and Alcoholism (NIAAA) have identified a previously unknown gene variant that doubles an individual's risk for obsessive- ...

  7. An Integrative Genetics Approach to Identify Candidate Genes Regulating BMD: Combining Linkage, Gene Expression, and Association

    PubMed Central

    Farber, Charles R; van Nas, Atila; Ghazalpour, Anatole; Aten, Jason E; Doss, Sudheer; Sos, Brandon; Schadt, Eric E; Ingram-Drake, Leslie; Davis, Richard C; Horvath, Steve; Smith, Desmond J; Drake, Thomas A; Lusis, Aldons J

    2009-01-01

    Numerous quantitative trait loci (QTLs) affecting bone traits have been identified in the mouse; however, few of the underlying genes have been discovered. To improve the process of transitioning from QTL to gene, we describe an integrative genetics approach, which combines linkage analysis, expression QTL (eQTL) mapping, causality modeling, and genetic association in outbred mice. In C57BL/6J × C3H/HeJ (BXH) F2 mice, nine QTLs regulating femoral BMD were identified. To select candidate genes from within each QTL region, microarray gene expression profiles from individual F2 mice were used to identify 148 genes whose expression was correlated with BMD and regulated by local eQTLs. Many of the genes that were the most highly correlated with BMD have been previously shown to modulate bone mass or skeletal development. Candidates were further prioritized by determining whether their expression was predicted to underlie variation in BMD. Using network edge orienting (NEO), a causality modeling algorithm, 18 of the 148 candidates were predicted to be causally related to differences in BMD. To fine-map QTLs, markers in outbred MF1 mice were tested for association with BMD. Three chromosome 11 SNPs were identified that were associated with BMD within the Bmd11 QTL. Finally, our approach provides strong support for Wnt9a, Rasd1, or both underlying Bmd11. Integration of multiple genetic and genomic data sets can substantially improve the efficiency of QTL fine-mapping and candidate gene identification. PMID:18767929

  8. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.

    PubMed

    Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew

    2012-08-08

    Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  9. Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.

    PubMed

    Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M

    2011-11-01

    Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.

  10. Use of RNA-seq to identify cardiac genes and gene pathways differentially expressed between dogs with and without dilated cardiomyopathy

    PubMed Central

    Friedenberg, Steven G.; Chdid, Lhoucine; Keene, Bruce; Sherry, Barbara; Motsinger-Reif, Alison; Meurs, Kathryn M.

    2017-01-01

    OBJECTIVE To identify cardiac tissue genes and gene pathways differentially expressed between dogs with and without dilated cardiomyopathy (DCM). ANIMALS 8 dogs with and 5 dogs without DCM. PROCEDURES Following euthanasia, samples of left ventricular myocardium were collected from each dog. Total RNA was extracted from tissue samples, and RNA sequencing was performed on each sample. Samples from dogs with and without DCM were grouped to identify genes that were differentially regulated between the 2 populations. Overrepresentation analysis was performed on upregulated and downregulated gene sets to identify altered molecular pathways in dogs with DCM. RESULTS Genes involved in cellular energy metabolism, especially metabolism of carbohydrates and fats, were significantly downregulated in dogs with DCM. Expression of cardiac structural proteins was also altered in affected dogs. CONCLUSIONS AND CLINICAL RELEVANCE Results suggested that RNA sequencing may provide important insights into the pathogenesis of DCM in dogs and highlight pathways that should be explored to identify causative mutations and develop novel therapeutic interventions. PMID:27347821

  11. Use of RNA-seq to identify cardiac genes and gene pathways differentially expressed between dogs with and without dilated cardiomyopathy.

    PubMed

    Friedenberg, Steven G; Chdid, Lhoucine; Keene, Bruce; Sherry, Barbara; Motsinger-Reif, Alison; Meurs, Kathryn M

    2016-07-01

    OBJECTIVE To identify cardiac tissue genes and gene pathways differentially expressed between dogs with and without dilated cardiomyopathy (DCM). ANIMALS 8 dogs with and 5 dogs without DCM. PROCEDURES Following euthanasia, samples of left ventricular myocardium were collected from each dog. Total RNA was extracted from tissue samples, and RNA sequencing was performed on each sample. Samples from dogs with and without DCM were grouped to identify genes that were differentially regulated between the 2 populations. Overrepresentation analysis was performed on upregulated and downregulated gene sets to identify altered molecular pathways in dogs with DCM. RESULTS Genes involved in cellular energy metabolism, especially metabolism of carbohydrates and fats, were significantly downregulated in dogs with DCM. Expression of cardiac structural proteins was also altered in affected dogs. CONCLUSIONS AND CLINICAL RELEVANCE Results suggested that RNA sequencing may provide important insights into the pathogenesis of DCM in dogs and highlight pathways that should be explored to identify causative mutations and develop novel therapeutic interventions.

  12. Training set selection for the prediction of essential genes.

    PubMed

    Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng

    2014-01-01

    Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.

  13. Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friddle, Carl J; Koga, Teiichiro; Rubin, Edward M.

    2000-03-15

    While cardiac hypertrophy has been the subject of intensive investigation, regression of hypertrophy has been significantly less studied, precluding large-scale analysis of the relationship between these processes. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40% respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points duringmore » cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes or pathways previously known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.« less

  14. Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data

    PubMed Central

    Ping, Yanyan; Deng, Yulan; Wang, Li; Zhang, Hongyi; Zhang, Yong; Xu, Chaohan; Zhao, Hongying; Fan, Huihui; Yu, Fulong; Xiao, Yun; Li, Xia

    2015-01-01

    The driver genetic aberrations collectively regulate core cellular processes underlying cancer development. However, identifying the modules of driver genetic alterations and characterizing their functional mechanisms are still major challenges for cancer studies. Here, we developed an integrative multi-omics method CMDD to identify the driver modules and their affecting dysregulated genes through characterizing genetic alteration-induced dysregulated networks. Applied to glioblastoma (GBM), the CMDD identified a core gene module of 17 genes, including seven known GBM drivers, and their dysregulated genes. The module showed significant association with shorter survival of GBM. When classifying driver genes in the module into two gene sets according to their genetic alteration patterns, we found that one gene set directly participated in the glioma pathway, while the other indirectly regulated the glioma pathway, mostly, via their dysregulated genes. Both of the two gene sets were significant contributors to survival and helpful for classifying GBM subtypes, suggesting their critical roles in GBM pathogenesis. Also, by applying the CMDD to other six cancers, we identified some novel core modules associated with overall survival of patients. Together, these results demonstrate integrative multi-omics data can identify driver modules and uncover their dysregulated genes, which is useful for interpreting cancer genome. PMID:25653168

  15. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    PubMed

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  16. A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

    PubMed Central

    Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378

  17. Computation and application of tissue-specific gene set weights.

    PubMed

    Frost, H Robert

    2018-04-06

    Gene set testing, or pathway analysis, has become a critical tool for the analysis of highdimensional genomic data. Although the function and activity of many genes and higher-level processes is tissue-specific, gene set testing is typically performed in a tissue agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissuespecific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses is publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets. rob.frost@dartmouth.edu.

  18. Phylogenetics and evolution of Su(var)3-9 SET genes in land plants: rapid diversification in structure and function.

    PubMed

    Zhu, Xinyu; Ma, Hong; Chen, Zhiduan

    2011-03-09

    Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improve genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were early integrated into primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.

  19. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    PubMed

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provide guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.

  20. Integrative Analysis of GWASs, Human Protein Interaction, and Gene Expression Identified Gene Modules Associated With BMDs

    PubMed Central

    He, Hao; Zhang, Lei; Li, Jian; Wang, Yu-Ping; Zhang, Ji-Gang; Shen, Jie; Guo, Yan-Fang

    2014-01-01

    Context: To date, few systems genetics studies in the bone field have been performed. We designed our study from a systems-level perspective by integrating genome-wide association studies (GWASs), human protein-protein interaction (PPI) network, and gene expression to identify gene modules contributing to osteoporosis risk. Methods: First we searched for modules significantly enriched with bone mineral density (BMD)-associated genes in human PPI network by using 2 large meta-analysis GWAS datasets through a dense module search algorithm. One included 7 individual GWAS samples (Meta7). The other was from the Genetic Factors for Osteoporosis Consortium (GEFOS2). One was assigned as a discovery dataset and the other as an evaluation dataset, and vice versa. Results: In total, 42 modules and 129 modules were identified significantly in both Meta7 and GEFOS2 datasets for femoral neck and spine BMD, respectively. There were 3340 modules identified for hip BMD only in Meta7. As candidate modules, they were assessed for the biological relevance to BMD by gene set enrichment analysis in 2 expression profiles generated from circulating monocytes in subjects with low versus high BMD values. Interestingly, there were 2 modules significantly enriched in monocytes from the low BMD group in both gene expression datasets (nominal P value <.05). Two modules had 16 nonredundant genes. Functional enrichment analysis revealed that both modules were enriched for genes involved in Wnt receptor signaling and osteoblast differentiation. Conclusion: We highlighted 2 modules and novel genes playing important roles in the regulation of bone mass, providing important clues for therapeutic approaches for osteoporosis. PMID:25119315

  1. LGscore: A method to identify disease-related genes using biological literature and Google data.

    PubMed

    Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun

    2015-04-01

    Since the genome project in 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, first, we construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights contain two values: the frequency and the Google search results. The frequency value is extracted from literature data, and the Google search result is obtained using Google. We assign a score to each gene through a network analysis. We assume that genes with a large number of links and numerous Google search results and frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets. The answer sets comprised six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children.

    PubMed

    Verhagen, Lilly M; Zomer, Aldert; Maes, Mailis; Villalba, Julian A; Del Nogal, Berenice; Eleveld, Marc; van Hijum, Sacha Aft; de Waard, Jacobus H; Hermans, Peter Wm

    2013-02-01

    Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). Our data justify the further exploration of our signature set as biomarkers

  3. A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children

    PubMed Central

    2013-01-01

    Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). Conclusions Our data justify the further exploration of our

  4. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

    PubMed

    Ye, Ning; Yin, Hengfu; Liu, Jingjing; Dai, Xiaogang; Yin, Tongming

    2015-01-01

    The huge amount of gene expression data generated by microarray and next-generation sequencing technologies present challenges to exploit their biological meanings. When searching for the coexpression genes, the data mining process is largely affected by selection of algorithms. Thus, it is highly desirable to provide multiple options of algorithms in the user-friendly analytical toolkit to explore the gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, including the mean, the regression, the delegate, and the ensemble models, to identify the coexpression genes, and enables the users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two sets of real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows for choosing multiple algorithms for analyzing the gene expression signatures.

  5. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.

  6. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The free available eutherian genomic sequence data sets advanced scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.

  7. GSCALite: A Web Server for Gene Set Cancer Analysis.

    PubMed

    Liu, Chun-Jie; Hu, Fei-Fei; Xia, Mengxuan; Han, Leng; Zhang, Qiong; Guo, An-Yuan

    2018-05-22

    The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer is usually the result of a set of genes and the signal of a single gene could be covered by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules. (i) Differential expression in tumor vs normal, and the survival analysis; (ii) Genomic variations and their survival analysis; (iii) Gene expression associated cancer pathway activity; (iv) miRNA regulatory network for genes; (v) Drug sensitivity for genes; (vi) Normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene set in cancer and drug sensitivity correlation, which will be of broad utilities to cancer researchers. GSCALite is available on http://bioinfo.life.hust.edu.cn/web/GSCALite/. guoay@hust.edu.cn or zhangqiong@hust.edu.cn. Supplementary data are available at Bioinformatics online.

  8. Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes

    PubMed Central

    Kurokawa, Ken; Itoh, Takehiko; Kuwahara, Tomomi; Oshima, Kenshiro; Toh, Hidehiro; Toyoda, Atsushi; Takami, Hideto; Morita, Hidetoshi; Sharma, Vineet K.; Srivastava, Tulika P.; Taylor, Todd D.; Noguchi, Hideki; Mori, Hiroshi; Ogura, Yoshitoshi; Ehrlich, Dusko S.; Itoh, Kikuji; Takagi, Toshihisa; Sakaki, Yoshiyuki; Hayashi, Tetsuya; Hattori, Masahira

    2007-01-01

    Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes. PMID:17916580

  9. The Molecular Signatures Database (MSigDB) hallmark gene set collection.

    PubMed

    Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo

    2015-12-23

    The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.

  10. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    PubMed

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment

    PubMed Central

    Uddin, Raihan; Singh, Shiva M.

    2017-01-01

    As humans age many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in “learning and memory” related functions and pathways. Subsequent differential network analysis of this “learning and memory” module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known function of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken

  12. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment.

    PubMed

    Uddin, Raihan; Singh, Shiva M

    2017-01-01

    As humans age many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in "learning and memory" related functions and pathways. Subsequent differential network analysis of this "learning and memory" module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known function of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they

  13. Pathway-based analysis of GWAs data identifies association of sex determination genes with susceptibility to testicular germ cell tumors.

    PubMed

    Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A

    2014-11-15

    Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10(-5), FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10(-6), FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10(-5), FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. An automated procedure to identify biomedical articles that contain cancer-associated gene variants.

    PubMed

    McDonald, Ryan; Scott Winters, R; Ankuda, Claire K; Murphy, Joan A; Rogers, Amy E; Pereira, Fernando; Greenblatt, Marc S; White, Peter S

    2006-09-01

    The proliferation of biomedical literature makes it increasingly difficult for researchers to find and manage relevant information. However, identifying research articles containing mutation data, a requisite first step in integrating large and complex mutation data sets, is currently tedious, time-consuming and imprecise. More effective mechanisms for identifying articles containing mutation information would be beneficial both for the curation of mutation databases and for individual researchers. We developed an automated method that uses information extraction, classifier, and relevance ranking techniques to determine the likelihood of MEDLINE abstracts containing information regarding genomic variation data suitable for inclusion in mutation databases. We targeted the CDKN2A (p16) gene and the procedure for document identification currently used by CDKN2A Database curators as a measure of feasibility. A set of abstracts was manually identified from a MEDLINE search as potentially containing specific CDKN2A mutation events. A subset of these abstracts was used as a training set for a maximum entropy classifier to identify text features distinguishing "relevant" from "not relevant" abstracts. Each document was represented as a set of indicative word, word pair, and entity tagger-derived genomic variation features. When applied to a test set of 200 candidate abstracts, the classifier predicted 88 articles as being relevant; of these, 29 of 32 manuscripts in which manual curation found CDKN2A sequence variants were positively predicted. Thus, the set of potentially useful articles that a manual curator would have to review was reduced by 56%, maintaining 91% recall (sensitivity) and more than doubling precision (positive predictive value). Subsequent expansion of the training set to 494 articles yielded similar precision and recall rates, and comparison of the original and expanded trials demonstrated that the average precision improved with the larger data set

  15. Identifying candidate genes for Type 2 Diabetes Mellitus and obesity through gene expression profiling in multiple tissues or cells.

    PubMed

    Chen, Junhui; Meng, Yuhuan; Zhou, Jinghui; Zhuo, Min; Ling, Fei; Zhang, Yu; Du, Hongli; Wang, Xiaoning

    2013-01-01

    Type 2 Diabetes Mellitus (T2DM) and obesity have become increasingly prevalent in recent years. Recent studies have focused on identifying causal variations or candidate genes for obesity and T2DM via analysis of expression quantitative trait loci (eQTL) within a single tissue. T2DM and obesity are affected by comprehensive sets of genes in multiple tissues. In the current study, gene expression levels in multiple human tissues from GEO datasets were analyzed, and 21 candidate genes displaying high percentages of differential expression were filtered out. Specifically, DENND1B, LYN, MRPL30, POC1B, PRKCB, RP4-655J12.3, HIBADH, and TMBIM4 were identified from the T2DM-control study, and BCAT1, BMP2K, CSRNP2, MYNN, NCKAP5L, SAP30BP, SLC35B4, SP1, BAP1, GRB14, HSP90AB1, ITGA5, and TOMM5 were identified from the obesity-control study. The majority of these genes are known to be involved in T2DM and obesity. Therefore, analysis of gene expression in various tissues using GEO datasets may be an effective and feasible method to determine novel or causal genes associated with T2DM and obesity.

  16. Utilizing Gene Tree Variation to Identify Candidate Effector Genes in Zymoseptoria tritici

    PubMed Central

    McDonald, Megan C.; McGinness, Lachlan; Hane, James K.; Williams, Angela H.; Milgate, Andrew; Solomon, Peter S.

    2016-01-01

    Zymoseptoria tritici is a host-specific, necrotrophic pathogen of wheat. Infection by Z. tritici is characterized by its extended latent period, which typically lasts 2 wks, and is followed by extensive host cell death, and rapid proliferation of fungal biomass. This work characterizes the level of genomic variation in 13 isolates, for which we have measured virulence on 11 wheat cultivars with differential resistance genes. Between the reference isolate, IPO323, and the 13 Australian isolates we identified over 800,000 single nucleotide polymorphisms, of which ∼10% had an effect on the coding regions of the genome. Furthermore, we identified over 1700 probable presence/absence polymorphisms in genes across the Australian isolates using de novo assembly. Finally, we developed a gene tree sorting method that quickly identifies groups of isolates within a single gene alignment whose sequence haplotypes correspond with virulence scores on a single wheat cultivar. Using this method, we have identified < 100 candidate effector genes whose gene sequence correlates with virulence toward a wheat cultivar carrying a major resistance gene. PMID:26837952

  17. Beyond main effects of gene-sets: harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior.

    PubMed

    Windhorst, Dafna A; Mileva-Seitz, Viara R; Rippe, Ralph C A; Tiemeier, Henning; Jaddoe, Vincent W V; Verhulst, Frank C; van IJzendoorn, Marinus H; Bakermans-Kranenburg, Marian J

    2016-08-01

    In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome-wide environmental interaction (GWEI) studies in the search for genetic variation underlying individual differences in behavior. Genetic variants in 12 autosomal dopaminergic genes were available in an ethnically homogenous part of a population-based cohort. Harsh parenting was assessed with maternal (n = 1881) and paternal (n = 1710) reports at age 3. Externalizing behavior was assessed with the Child Behavior Checklist (CBCL) at age 5 (71 ± 3.7 months). We conducted gene-set analyses of the association between variation in dopaminergic genes and externalizing behavior, stratified for harsh parenting. The association was statistically significant or approached significance for children without harsh parenting experiences, but was absent in the group with harsh parenting. Similarly, significant associations between single genes and externalizing behavior were only found in the group without harsh parenting. Effect sizes in the groups with and without harsh parenting did not differ significantly. Gene-environment interaction tests were conducted for individual genetic variants, resulting in two significant interaction effects (rs1497023 and rs4922132) after correction for multiple testing. Our findings are suggestive of G × E interplay, with associations between dopamine genes and externalizing behavior present in children without harsh parenting, but not in children with harsh parenting experiences. Harsh parenting may overrule the role of genetic factors in externalizing behavior. Gene-based and gene-set

  18. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509

  19. Identifying Candidate Reprogramming Genes in Mouse Induced Pluripotent Stem Cells.

    PubMed

    Gao, Fang; Li, Jingyu; Zhang, Heng; Yang, Xu; An, Tiezhu

    2017-08-01

    Factor-based induced reprogramming approaches have tremendous potential for human regenerative medicine, but the efficiencies of these approaches are still low. In this study, we analyzed the global transcriptional profiles of mouse induced pluripotent stem cells (miPSCs) and mouse embryonic stem cells (mESCs) from seven different labs and present here the first successful clustering according to cell type, not by lab of origin. We identified 2131 different expression genes (DEs) as candidate pluripotency-associated genes by comparing mESCs/miPSCs with somatic cells and 720 DEs between miPSCs and mESCs. Interestingly, there was a significant overlap between the two DE sets. Therefore, we defined the overlap DEs as "consensus DEs" including 313 miPSC-specific genes expressed at a higher level in miPSCs versus mESCs and 184 mESC-specific genes in total and reasoned that these may contribute to the differences in pluripotency between mESCs and miPSCs. A classification of "consensus DEs" according to their different expression levels between somatic cells and mESCs/miPSCs shows that 86% of the miPSC-specific genes are more highly expressed in somatic cells, while 73% of mESC-specific genes are highly expressed in mESCs/miPSCs, indicating that the miPSCs have not efficiently silenced the expression pattern of the somatic cells from which they are derived and failed to completely induce the genes with high expression levels in mESCs. We further revealed a strong correlation between oocyte-enriched factors and insufficiently induced mESC-specific genes and identified 11 hub genes via network analysis. In light of these findings, we postulated that these key hub genes might not only drive somatic cell nuclear transfer (SCNT) reprogramming but also augment the efficiency and quality of miPSC reprogramming.

  20. Gene Selection and Cancer Classification: A Rough Sets Based Approach

    NASA Astrophysics Data System (ADS)

    Sun, Lijun; Miao, Duoqian; Zhang, Hongyun

    Indentification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough sets theory, corresponding to a minimal set of essential genes for discerning samples, is an efficient tool for gene selection. Due to the compuational complexty of the existing reduct algoritms, feature ranking is usually used to narrow down gene space as the first step and top ranked genes are selected . In this paper,we define a novel certierion based on the expression level difference btween classes and contribution to classification of the gene for scoring genes and present a algorithm for generating all possible reduct from informative genes.The algorithm takes the whole attribute sets into account and find short reduct with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that this approach is successful for selecting high discriminative genes and the classification accuracy is impressive.

  1. Discovery of cancer common and specific driver gene sets

    PubMed Central

    2017-01-01

    Abstract Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  2. Transcriptome-wide selection of a reliable set of reference genes for gene expression studies in potato cyst nematodes (Globodera spp.).

    PubMed

    Sabeh, Michael; Duceppe, Marc-Olivier; St-Arnaud, Marc; Mimee, Benjamin

    2018-01-01

    Relative gene expression analyses by qRT-PCR (quantitative reverse transcription PCR) require an internal control to normalize the expression data of genes of interest and eliminate the unwanted variation introduced by sample preparation. A perfect reference gene should have a constant expression level under all the experimental conditions. However, the same few housekeeping genes selected from the literature or successfully used in previous unrelated experiments are often routinely used in new conditions without proper validation of their stability across treatments. The advent of RNA-Seq and the availability of public datasets for numerous organisms are opening the way to finding better reference genes for expression studies. Globodera rostochiensis is a plant-parasitic nematode that is particularly yield-limiting for potato. The aim of our study was to identify a reliable set of reference genes to study G. rostochiensis gene expression. Gene expression levels from an RNA-Seq database were used to identify putative reference genes and were validated with qRT-PCR analysis. Three genes, GR, PMP-3, and aaRS, were found to be very stable within the experimental conditions of this study and are proposed as reference genes for future work.

  3. Identification of a core set of rhizobial infection genes using data from single cell-types.

    PubMed

    Chen, Da-Song; Liu, Cheng-Wu; Roy, Sonali; Cousins, Donna; Stacey, Nicola; Murray, Jeremy D

    2015-01-01

    Genome-wide expression studies on nodulation have varied in their scale from entire root systems to dissected nodules or root sections containing nodule primordia (NP). More recently efforts have focused on developing methods for isolation of root hairs from infected plants and the application of laser-capture microdissection technology to nodules. Here we analyze two published data sets to identify a core set of infection genes that are expressed in the nodule and in root hairs during infection. Among the genes identified were those encoding phenylpropanoid biosynthesis enzymes including Chalcone-O-Methyltransferase which is required for the production of the potent Nod gene inducer 4',4-dihydroxy-2-methoxychalcone. A promoter-GUS analysis in transgenic hairy roots for two genes encoding Chalcone-O-Methyltransferase isoforms revealed their expression in rhizobially infected root hairs and the nodule infection zone but not in the nitrogen fixation zone. We also describe a group of Rhizobially Induced Peroxidases whose expression overlaps with the production of superoxide in rhizobially infected root hairs and in nodules and roots. Finally, we identify a cohort of co-regulated transcription factors as candidate regulators of these processes.

  4. APPRIS 2017: principal isoforms for multiple gene sets

    PubMed Central

    Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

    2018-01-01

    Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main proteins isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melangaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475

  5. Identifying Cancer Driver Genes Using Replication-Incompetent Retroviral Vectors

    PubMed Central

    Bii, Victor M.; Trobridge, Grant D.

    2016-01-01

    Identifying novel genes that drive tumor metastasis and drug resistance has significant potential to improve patient outcomes. High-throughput sequencing approaches have identified cancer genes, but distinguishing driver genes from passengers remains challenging. Insertional mutagenesis screens using replication-incompetent retroviral vectors have emerged as a powerful tool to identify cancer genes. Unlike replicating retroviruses and transposons, replication-incompetent retroviral vectors lack additional mutagenesis events that can complicate the identification of driver mutations from passenger mutations. They can also be used for almost any human cancer due to the broad tropism of the vectors. Replication-incompetent retroviral vectors have the ability to dysregulate nearby cancer genes via several mechanisms including enhancer-mediated activation of gene promoters. The integrated provirus acts as a unique molecular tag for nearby candidate driver genes which can be rapidly identified using well established methods that utilize next generation sequencing and bioinformatics programs. Recently, retroviral vector screens have been used to efficiently identify candidate driver genes in prostate, breast, liver and pancreatic cancers. Validated driver genes can be potential therapeutic targets and biomarkers. In this review, we describe the emergence of retroviral insertional mutagenesis screens using replication-incompetent retroviral vectors as a novel tool to identify cancer driver genes in different cancer types. PMID:27792127

  6. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

    PubMed

    Allman, Elizabeth S; Degnan, James H; Rhodes, John A

    2011-06-01

    Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals-each with many genes-splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

  7. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    PubMed

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  9. Gene-Based Genome-Wide Association Analysis in European and Asian Populations Identified Novel Genes for Rheumatoid Arthritis.

    PubMed

    Zhu, Hong; Xia, Wei; Mo, Xing-Bo; Lin, Xiang; Qiu, Ying-Hua; Yi, Neng-Jun; Zhang, Yong-Hong; Deng, Fei-Yan; Lei, Shu-Feng

    2016-01-01

    Rheumatoid arthritis (RA) is a complex autoimmune disease. Using a gene-based association research strategy, the present study aims to detect unknown susceptibility to RA and to address the ethnic differences in genetic susceptibility to RA between European and Asian populations. Gene-based association analyses were performed with KGG 2.5 by using publicly available large RA datasets (14,361 RA cases and 43,923 controls of European subjects, 4,873 RA cases and 17,642 controls of Asian Subjects). For the newly identified RA-associated genes, gene set enrichment analyses and protein-protein interactions analyses were carried out with DAVID and STRING version 10.0, respectively. Differential expression verification was conducted using 4 GEO datasets. The expression levels of three selected 'highly verified' genes were measured by ELISA among our in-house RA cases and controls. A total of 221 RA-associated genes were newly identified by gene-based association study, including 71'overlapped', 76 'European-specific' and 74 'Asian-specific' genes. Among them, 105 genes had significant differential expressions between RA patients and health controls at least in one dataset, especially for 20 genes including 11 'overlapped' (ABCF1, FLOT1, HLA-F, IER3, TUBB, ZKSCAN4, BTN3A3, HSP90AB1, CUTA, BRD2, HLA-DMA), 5 'European-specific' (PHTF1, RPS18, BAK1, TNFRSF14, SUOX) and 4 'Asian-specific' (RNASET2, HFE, BTN2A2, MAPK13) genes whose differential expressions were significant at least in three datasets. The protein expressions of two selected genes FLOT1 (P value = 1.70E-02) and HLA-DMA (P value = 4.70E-02) in plasma were significantly different in our in-house samples. Our study identified 221 novel RA-associated genes and especially highlighted the importance of 20 candidate genes on RA. The results addressed ethnic genetic background differences for RA susceptibility between European and Asian populations and detected a long list of overlapped or ethnic specific RA genes. The

  10. A cross-species bi-clustering approach to identifying conserved co-regulated genes.

    PubMed

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-06-15

    A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regularized genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective to deal with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, thus difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. Efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on synthetic data and compared

  11. Novel Myopia Genes and Pathways Identified From Syndromic Forms of Myopia

    PubMed Central

    Loughman, James; Wildsoet, Christine F.; Williams, Cathy; Guggenheim, Jeremy A.

    2018-01-01

    Purpose To test the hypothesis that genes known to cause clinical syndromes featuring myopia also harbor polymorphisms contributing to nonsyndromic refractive errors. Methods Clinical phenotypes and syndromes that have refractive errors as a recognized feature were identified using the Online Mendelian Inheritance in Man (OMIM) database. One hundred fifty-four unique causative genes were identified, of which 119 were specifically linked with myopia and 114 represented syndromic myopia (i.e., myopia and at least one other clinical feature). Myopia was the only refractive error listed for 98 genes and hyperopia and the only refractive error noted for 28 genes, with the remaining 28 genes linked to phenotypes with multiple forms of refractive error. Pathway analysis was carried out to find biological processes overrepresented within these sets of genes. Genetic variants located within 50 kb of the 119 myopia-related genes were evaluated for involvement in refractive error by analysis of summary statistics from genome-wide association studies (GWAS) conducted by the CREAM Consortium and 23andMe, using both single-marker and gene-based tests. Results Pathway analysis identified several biological processes already implicated in refractive error development through prior GWAS analyses and animal studies, including extracellular matrix remodeling, focal adhesion, and axon guidance, supporting the research hypothesis. Novel pathways also implicated in myopia development included mannosylation, glycosylation, lens development, gliogenesis, and Schwann cell differentiation. Hyperopia was found to be linked to a different pattern of biological processes, mostly related to organogenesis. Comparison with GWAS findings further confirmed that syndromic myopia genes were enriched for genetic variants that influence refractive errors in the general population. Gene-based analyses implicated 21 novel candidate myopia genes (ADAMTS18, ADAMTS2, ADAMTSL4, AGK, ALDH18A1, ASXL1, COL4A1

  12. Candidate genes for panhypopituitarism identified by gene expression profiling

    PubMed Central

    Mortensen, Amanda H.; MacDonald, James W.; Ghosh, Debashis

    2011-01-01

    Mutations in the transcription factors PROP1 and PIT1 (POU1F1) lead to pituitary hormone deficiency and hypopituitarism in mice and humans. The dysmorphology of developing Prop1 mutant pituitaries readily distinguishes them from those of Pit1 mutants and normal mice. This and other features suggest that Prop1 controls the expression of genes besides Pit1 that are important for pituitary cell migration, survival, and differentiation. To identify genes involved in these processes we used microarray analysis of gene expression to compare pituitary RNA from newborn Prop1 and Pit1 mutants and wild-type littermates. Significant differences in gene expression were noted between each mutant and their normal littermates, as well as between Prop1 and Pit1 mutants. Otx2, a gene critical for normal eye and pituitary development in humans and mice, exhibited elevated expression specifically in Prop1 mutant pituitaries. We report the spatial and temporal regulation of Otx2 in normal mice and Prop1 mutants, and the results suggest Otx2 could influence pituitary development by affecting signaling from the ventral diencephalon and regulation of gene expression in Rathke's pouch. The discovery that Otx2 expression is affected by Prop1 deficiency provides support for our hypothesis that identifying molecular differences in mutants will contribute to understanding the molecular mechanisms that control pituitary organogenesis and lead to human pituitary disease. PMID:21828248

  13. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

    PubMed

    Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E

    2017-02-01

    Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of genes set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/ . The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/ . monther.alhamdoosh@csl.com.au mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

    PubMed

    Shimoni, Yishai

    2018-02-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.

  15. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification

    PubMed Central

    2018-01-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes. PMID:29470520

  16. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

    PubMed Central

    Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

    2016-01-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961

  17. Comparative Transcriptional Profiling of the Axolotl Limb Identifies a Tripartite Regeneration-Specific Gene Program

    PubMed Central

    Knapp, Dunja; Schulz, Herbert; Rascon, Cynthia Alexander; Volkmer, Michael; Scholz, Juliane; Nacu, Eugen; Le, Mu; Novozhilov, Sergey; Tazaki, Akira; Protze, Stephanie; Jacob, Tina; Hubner, Norbert; Habermann, Bianca; Tanaka, Elly M.

    2013-01-01

    Understanding how the limb blastema is established after the initial wound healing response is an important aspect of regeneration research. Here we performed parallel expression profile time courses of healing lateral wounds versus amputated limbs in axolotl. This comparison between wound healing and regeneration allowed us to identify amputation-specific genes. By clustering the expression profiles of these samples, we could detect three distinguishable phases of gene expression – early wound healing followed by a transition-phase leading to establishment of the limb development program, which correspond to the three phases of limb regeneration that had been defined by morphological criteria. By focusing on the transition-phase, we identified 93 strictly amputation-associated genes many of which are implicated in oxidative-stress response, chromatin modification, epithelial development or limb development. We further classified the genes based on whether they were or were not significantly expressed in the developing limb bud. The specific localization of 53 selected candidates within the blastema was investigated by in situ hybridization. In summary, we identified a set of genes that are expressed specifically during regeneration and are therefore, likely candidates for the regulation of blastema formation. PMID:23658691

  18. ADGO: analysis of differentially expressed gene sets using composite GO annotation.

    PubMed

    Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun

    2006-09-15

    Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.

  19. Identifying potential maternal genes of Bombyx mori using digital gene expression profiling

    PubMed Central

    Xu, Pingzhen

    2018-01-01

    Maternal genes present in mature oocytes play a crucial role in the early development of silkworm. Although maternal genes have been widely studied in many other species, there has been limited research in Bombyx mori. High-throughput next generation sequencing provides a practical method for gene discovery on a genome-wide level. Herein, a transcriptome study was used to identify maternal-related genes from silkworm eggs. Unfertilized eggs from five different stages of early development were used to detect the changing situation of gene expression. The expressed genes showed different patterns over time. Seventy-six maternal genes were annotated according to homology analysis with Drosophila melanogaster. More than half of the differentially expressed maternal genes fell into four expression patterns, while the expression patterns showed a downward trend over time. The functional annotation of these material genes was mainly related to transcription factor activity, growth factor activity, nucleic acid binding, RNA binding, ATP binding, and ion binding. Additionally, twenty-two gene clusters including maternal genes were identified from 18 scaffolds. Altogether, we plotted a profile for the maternal genes of Bombyx mori using a digital gene expression profiling method. This will provide the basis for maternal-specific signature research and improve the understanding of the early development of silkworm. PMID:29462160

  20. Overexpression screens identify conserved dosage chromosome instability genes in yeast and human cancer

    PubMed Central

    Duffy, Supipi; Fam, Hok Khim; Wang, Yi Kan; Styles, Erin B.; Kim, Jung-Hyun; Ang, J. Sidney; Singh, Tejomayee; Larionov, Vladimir; Shah, Sohrab P.; Andrews, Brenda; Boerkoel, Cornelius F.; Hieter, Philip

    2016-01-01

    Somatic copy number amplification and gene overexpression are common features of many cancers. To determine the role of gene overexpression on chromosome instability (CIN), we performed genome-wide screens in the budding yeast for yeast genes that cause CIN when overexpressed, a phenotype we refer to as dosage CIN (dCIN), and identified 245 dCIN genes. This catalog of genes reveals human orthologs known to be recurrently overexpressed and/or amplified in tumors. We show that two genes, TDP1, a tyrosyl-DNA-phosphdiesterase, and TAF12, an RNA polymerase II TATA-box binding factor, cause CIN when overexpressed in human cells. Rhabdomyosarcoma lines with elevated human Tdp1 levels also exhibit CIN that can be partially rescued by siRNA-mediated knockdown of TDP1. Overexpression of dCIN genes represents a genetic vulnerability that could be leveraged for selective killing of cancer cells through targeting of an unlinked synthetic dosage lethal (SDL) partner. Using SDL screens in yeast, we identified a set of genes that when deleted specifically kill cells with high levels of Tdp1. One gene was the histone deacetylase RPD3, for which there are known inhibitors. Both HT1080 cells overexpressing hTDP1 and rhabdomyosarcoma cells with elevated levels of hTdp1 were more sensitive to histone deacetylase inhibitors valproic acid (VPA) and trichostatin A (TSA), recapitulating the SDL interaction in human cells and suggesting VPA and TSA as potential therapeutic agents for tumors with elevated levels of hTdp1. The catalog of dCIN genes presented here provides a candidate list to identify genes that cause CIN when overexpressed in cancer, which can then be leveraged through SDL to selectively target tumors. PMID:27551064

  1. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-08

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Comprehensive Ex Vivo Transposon Mutagenesis Identifies Genes That Promote Growth Factor Independence and Leukemogenesis.

    PubMed

    Guo, Yabin; Updegraff, Barrett L; Park, Sunho; Durakoglugil, Deniz; Cruz, Victoria H; Maddux, Sarah; Hwang, Tae Hyun; O'Donnell, Kathryn A

    2016-02-15

    Aberrant signaling through cytokine receptors and their downstream signaling pathways is a major oncogenic mechanism underlying hematopoietic malignancies. To better understand how these pathways become pathologically activated and to potentially identify new drivers of hematopoietic cancers, we developed a high-throughput functional screening approach using ex vivo mutagenesis with the Sleeping Beauty transposon. We analyzed over 1,100 transposon-mutagenized pools of Ba/F3 cells, an IL3-dependent pro-B-cell line, which acquired cytokine independence and tumor-forming ability. Recurrent transposon insertions could be mapped to genes in the JAK/STAT and MAPK pathways, confirming the ability of this strategy to identify known oncogenic components of cytokine signaling pathways. In addition, recurrent insertions were identified in a large set of genes that have been found to be mutated in leukemia or associated with survival, but were not previously linked to the JAK/STAT or MAPK pathways nor shown to functionally contribute to leukemogenesis. Forced expression of these novel genes resulted in IL3-independent growth in vitro and tumorigenesis in vivo, validating this mutagenesis-based approach for identifying new genes that promote cytokine signaling and leukemogenesis. Therefore, our findings provide a broadly applicable approach for classifying functionally relevant genes in diverse malignancies and offer new insights into the impact of cytokine signaling on leukemia development. ©2015 American Association for Cancer Research.

  3. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics

    PubMed Central

    Zeeberg, Barry R; Riss, Joseph; Kane, David W; Bussey, Kimberly J; Uchio, Edward; Linehan, W Marston; Barrett, J Carl; Weinstein, John N

    2004-01-01

    Background When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. Results A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. Conclusions Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem. PMID:15214961

  4. A hierarchical approach employing metabolic and gene expression profiles to identify the pathways that confer cytotoxicity in HepG2 cells

    PubMed Central

    Li, Zheng; Srivastava, Shireesh; Yang, Xuerui; Mittal, Sheenu; Norton, Paul; Resau, James; Haab, Brian; Chan, Christina

    2007-01-01

    Background Free fatty acids (FFA) and tumor necrosis factor alpha (TNF-α) have been implicated in the pathogenesis of many obesity-related metabolic disorders. When human hepatoblastoma cells (HepG2) were exposed to different types of FFA and TNF-α, saturated fatty acid was found to be cytotoxic and its toxicity was exacerbated by TNF-α. In order to identify the processes associated with the toxicity of saturated FFA and TNF-α, the metabolic and gene expression profiles were measured to characterize the cellular states. A computational model was developed to integrate these disparate data to reveal the underlying pathways and mechanisms involved in saturated fatty acid toxicity. Results A hierarchical framework consisting of three stages was developed to identify the processes and genes that regulate the toxicity. First, discriminant analysis identified that fatty acid oxidation and intracellular triglyceride accumulation were the most relevant in differentiating the cytotoxic phenotype. Second, gene set enrichment analysis (GSEA) was applied to the cDNA microarray data to identify the transcriptionally altered pathways and processes. Finally, the genes and gene sets that regulate the metabolic responses identified in step 1 were identified by integrating the expression of the enriched gene sets and the metabolic profiles with a multi-block partial least squares (MBPLS) regression model. Conclusion The hierarchical approach suggested potential mechanisms involved in mediating the cytotoxic and cytoprotective pathways, as well as identified novel targets, such as NADH dehydrogenases, aldehyde dehydrogenases 1A1 (ALDH1A1) and endothelial membrane protein 3 (EMP3) as modulator of the toxic phenotypes. These predictions, as well as, some specific targets that were suggested by the analysis were experimentally validated. PMID:17498300

  5. Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes.

    PubMed

    Fujimoto, Akihiro; Okada, Yukinori; Boroevich, Keith A; Tsunoda, Tatsuhiko; Taniguchi, Hiroaki; Nakagawa, Hidewaki

    2016-05-26

    Protein tertiary structure determines molecular function, interaction, and stability of the protein, therefore distribution of mutation in the tertiary structure can facilitate the identification of new driver genes in cancer. To analyze mutation distribution in protein tertiary structures, we applied a novel three dimensional permutation test to the mutation positions. We analyzed somatic mutation datasets of 21 types of cancers obtained from exome sequencing conducted by the TCGA project. Of the 3,622 genes that had ≥3 mutations in the regions with tertiary structure data, 106 genes showed significant skew in mutation distribution. Known tumor suppressors and oncogenes were significantly enriched in these identified cancer gene sets. Physical distances between mutations in known oncogenes were significantly smaller than those of tumor suppressors. Twenty-three genes were detected in multiple cancers. Candidate genes with significant skew of the 3D mutation distribution included kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1) and transcription factors (KLF5 and EEF1B2). Our study suggests that systematic analysis of mutation distribution in the tertiary protein structure can help identify cancer driver genes.

  6. Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes

    PubMed Central

    Fujimoto, Akihiro; Okada, Yukinori; Boroevich, Keith A.; Tsunoda, Tatsuhiko; Taniguchi, Hiroaki; Nakagawa, Hidewaki

    2016-01-01

    Protein tertiary structure determines molecular function, interaction, and stability of the protein, therefore distribution of mutation in the tertiary structure can facilitate the identification of new driver genes in cancer. To analyze mutation distribution in protein tertiary structures, we applied a novel three dimensional permutation test to the mutation positions. We analyzed somatic mutation datasets of 21 types of cancers obtained from exome sequencing conducted by the TCGA project. Of the 3,622 genes that had ≥3 mutations in the regions with tertiary structure data, 106 genes showed significant skew in mutation distribution. Known tumor suppressors and oncogenes were significantly enriched in these identified cancer gene sets. Physical distances between mutations in known oncogenes were significantly smaller than those of tumor suppressors. Twenty-three genes were detected in multiple cancers. Candidate genes with significant skew of the 3D mutation distribution included kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1) and transcription factors (KLF5 and EEF1B2). Our study suggests that systematic analysis of mutation distribution in the tertiary protein structure can help identify cancer driver genes. PMID:27225414

  7. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query

  8. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. The web-based interface gives researchers different query options for mining the database

  9. Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.

    PubMed

    Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin

    2017-08-01

    This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls) were downloaded from Gene Expression Omnibus database. Characteristic genes were identified using metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance was defined as the average gene significance of all genes used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subject to functional annotation and pathway enrichment analysis using Database for Annotation Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in totally enriched in 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs sanguinarine and papaverine were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.

  10. Identifying gene networks underlying the neurobiology of ethanol and alcoholism.

    PubMed

    Wolen, Aaron R; Miles, Michael F

    2012-01-01

    For complex disorders such as alcoholism, identifying the genes linked to these diseases and their specific roles is difficult. Traditional genetic approaches, such as genetic association studies (including genome-wide association studies) and analyses of quantitative trait loci (QTLs) in both humans and laboratory animals already have helped identify some candidate genes. However, because of technical obstacles, such as the small impact of any individual gene, these approaches only have limited effectiveness in identifying specific genes that contribute to complex diseases. The emerging field of systems biology, which allows for analyses of entire gene networks, may help researchers better elucidate the genetic basis of alcoholism, both in humans and in animal models. Such networks can be identified using approaches such as high-throughput molecular profiling (e.g., through microarray-based gene expression analyses) or strategies referred to as genetical genomics, such as the mapping of expression QTLs (eQTLs). Characterization of gene networks can shed light on the biological pathways underlying complex traits and provide the functional context for identifying those genes that contribute to disease development.

  11. Genome-Wide Association Study Identifies Candidate Genes for Starch Content Regulation in Maize Kernels

    PubMed Central

    Liu, Na; Xue, Yadong; Guo, Zhanyong; Li, Weihua; Tang, Jihua

    2016-01-01

    Kernel starch content is an important trait in maize (Zea mays L.) as it accounts for 65–75% of the dry kernel weight and positively correlates with seed yield. A number of starch synthesis-related genes have been identified in maize in recent years. However, many loci underlying variation in starch content among maize inbred lines still remain to be identified. The current study is a genome-wide association study that used a set of 263 maize inbred lines. In this panel, the average kernel starch content was 66.99%, ranging from 60.60 to 71.58% over the three study years. These inbred lines were genotyped with the SNP50 BeadChip maize array, which is comprised of 56,110 evenly spaced, random SNPs. Population structure was controlled by a mixed linear model (MLM) as implemented in the software package TASSEL. After the statistical analyses, four SNPs were identified as significantly associated with starch content (P ≤ 0.0001), among which one each are located on chromosomes 1 and 5 and two are on chromosome 2. Furthermore, 77 candidate genes associated with starch synthesis were found within the 100-kb intervals containing these four QTLs, and four highly associated genes were within 20-kb intervals of the associated SNPs. Among the four genes, Glucose-1-phosphate adenylyltransferase (APS1; Gene ID GRMZM2G163437) is known as an important regulator of kernel starch content. The identified SNPs, QTLs, and candidate genes may not only be readily used for germplasm improvement by marker-assisted selection in breeding, but can also elucidate the genetic basis of starch content. Further studies on these identified candidate genes may help determine the molecular mechanisms regulating kernel starch content in maize and other important cereal crops. PMID:27512395

  12. Transcriptome meta-analysis reveals common differential and global gene expression profiles in cystic fibrosis and other respiratory disorders and identifies CFTR regulators.

    PubMed

    Clarke, Luka A; Botelho, Hugo M; Sousa, Lisete; Falcao, Andre O; Amaral, Margarida D

    2015-11-01

    A meta-analysis of 13 independent microarray data sets was performed and gene expression profiles from cystic fibrosis (CF), similar disorders (COPD: chronic obstructive pulmonary disease, IPF: idiopathic pulmonary fibrosis, asthma), environmental conditions (smoking, epithelial injury), related cellular processes (epithelial differentiation/regeneration), and non-respiratory "control" conditions (schizophrenia, dieting), were compared. Similarity among differentially expressed (DE) gene lists was assessed using a permutation test, and a clustergram was constructed, identifying common gene markers. Global gene expression values were standardized using a novel approach, revealing that similarities between independent data sets run deeper than shared DE genes. Correlation of gene expression values identified putative gene regulators of the CF transmembrane conductance regulator (CFTR) gene, of potential therapeutic significance. Our study provides a novel perspective on CF epithelial gene expression in the context of other lung disorders and conditions, and highlights the contribution of differentiation/EMT and injury to gene signatures of respiratory disease. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Repressors Nrg1 and Nrg2 Regulate a Set of Stress-Responsive Genes in Saccharomyces cerevisiae§

    PubMed Central

    Vyas, Valmik K.; Berkey, Cristin D.; Miyao, Takenori; Carlson, Marian

    2005-01-01

    The yeast Saccharomyces cerevisiae responds to environmental stress by rapidly altering the expression of large sets of genes. We report evidence that the transcriptional repressors Nrg1 and Nrg2 (Nrg1/Nrg2), which were previously implicated in glucose repression, regulate a set of stress-responsive genes. Genome-wide expression analysis identified 150 genes that were upregulated in nrg1Δ nrg2Δ double mutant cells, relative to wild-type cells, during growth in glucose. We found that many of these genes are regulated by glucose repression. Stress response elements (STREs) and STRE-like elements are overrepresented in the promoters of these genes, and a search of available expression data sets showed that many are regulated in response to a variety of environmental stress signals. In accord with these findings, mutation of NRG1 and NRG2 enhanced the resistance of cells to salt and oxidative stress and decreased tolerance to freezing. We present evidence that Nrg1/Nrg2 not only contribute to repression of target genes in the absence of stress but also limit induction in response to salt stress. We suggest that Nrg1/Nrg2 fine-tune the regulation of a set of stress-responsive genes. PMID:16278455

  14. Genome-wide transcriptional analysis of flagellar regeneration in Chlamydomonas reinhardtii identifies orthologs of ciliary disease genes

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Marshall, Wallace F.

    2005-01-01

    The important role that cilia and flagella play in human disease creates an urgent need to identify genes involved in ciliary assembly and function. The strong and specific induction of flagellar-coding genes during flagellar regeneration in Chlamydomonas reinhardtii suggests that transcriptional profiling of such cells would reveal new flagella-related genes. We have conducted a genome-wide analysis of RNA transcript levels during flagellar regeneration in Chlamydomonas by using maskless photolithography method-produced DNA oligonucleotide microarrays with unique probe sequences for all exons of the 19,803 predicted genes. This analysis represents previously uncharacterized whole-genome transcriptional activity profiling study in this important model organism. Analysis of strongly induced genes reveals a large set of known flagellar components and also identifies a number of important disease-related proteins as being involved with cilia and flagella, including the zebrafish polycystic kidney genes Qilin, Reptin, and Pontin, as well as the testis-expressed tubby-like protein TULP2.

  15. Integrating mean and variance heterogeneities to identify differentially expressed genes.

    PubMed

    Ouyang, Weiwei; An, Qiang; Zhao, Jinying; Qin, Huaizhen

    2016-12-06

    In functional genomics studies, tests on mean heterogeneity have been widely employed to identify differentially expressed genes with distinct mean expression levels under different experimental conditions. Variance heterogeneity (aka, the difference between condition-specific variances) of gene expression levels is simply neglected or calibrated for as an impediment. The mean heterogeneity in the expression level of a gene reflects one aspect of its distribution alteration; and variance heterogeneity induced by condition change may reflect another aspect. Change in condition may alter both mean and some higher-order characteristics of the distributions of expression levels of susceptible genes. In this report, we put forth a conception of mean-variance differentially expressed (MVDE) genes, whose expression means and variances are sensitive to the change in experimental condition. We mathematically proved the null independence of existent mean heterogeneity tests and variance heterogeneity tests. Based on the independence, we proposed an integrative mean-variance test (IMVT) to combine gene-wise mean heterogeneity and variance heterogeneity induced by condition change. The IMVT outperformed its competitors under comprehensive simulations of normality and Laplace settings. For moderate samples, the IMVT well controlled type I error rates, and so did existent mean heterogeneity test (i.e., the Welch t test (WT), the moderated Welch t test (MWT)) and the procedure of separate tests on mean and variance heterogeneities (SMVT), but the likelihood ratio test (LRT) severely inflated type I error rates. In presence of variance heterogeneity, the IMVT appeared noticeably more powerful than all the valid mean heterogeneity tests. Application to the gene profiles of peripheral circulating B raised solid evidence of informative variance heterogeneity. After adjusting for background data structure, the IMVT replicated previous discoveries and identified novel experiment

  16. Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.

    PubMed

    Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui

    2017-10-01

    The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially-expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3) were evaluated using receiver operating characteristic (ROC) and precision and recall (PR) curves. The interference method with the best performance was selected to construct the GRN. Subsequently, topological centrality (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6 , EPHA2 and GSTT1 and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers to use in the diagnosis of glaucoma and aid in identifying the molecular mechanism of this disease.

  17. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.

  18. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner

  19. Utility and Limitations of Using Gene Expression Data to Identify Functional Associations

    PubMed Central

    Peng, Cheng; Shiu, Shin-Han

    2016-01-01

    Gene co-expression has been widely used to hypothesize gene function through guilt-by association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles and the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impacts the ability to recover functional associations between genes using Arabidopsis thaliana as an example. Some datasets are more informative in capturing coordinated expression profiles and larger data sets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets. PMID:27935950

  20. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    Highlights: •Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. •Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. •In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer. •Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However,more » the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.« less

  1. ENU Mutagenesis in Mice Identifies Candidate Genes For Hypogonadism

    PubMed Central

    Weiss, Jeffrey; Hurley, Lisa A.; Harris, Rebecca M.; Finlayson, Courtney; Tong, Minghan; Fisher, Lisa A.; Moran, Jennifer L.; Beier, David R.; Mason, Christopher; Jameson, J. Larry

    2012-01-01

    Genome-wide mutagenesis was performed in mice to identify candidate genes for male infertility, for which the predominant causes remain idiopathic. Mice were mutagenized using N-ethyl-N-nitrosourea (ENU), bred, and screened for phenotypes associated with the male urogenital system. Fifteen heritable lines were isolated and chromosomal loci were assigned using low density genome-wide SNP arrays. Ten of the fifteen lines were pursued further using higher resolution SNP analysis to narrow the candidate gene regions. Exon sequencing of candidate genes identified mutations in mice with cystic kidneys (Bicc1), cryptorchidism (Rxfp2), restricted germ cell deficiency (Plk4), and severe germ cell deficiency (Prdm9). In two other lines with severe hypogonadism candidate sequencing failed to identify mutations, suggesting defects in genes with previously undocumented roles in gonadal function. These genomic intervals were sequenced in their entirety and a candidate mutation was identified in SnrpE in one of the two lines. The line harboring the SnrpE variant retains substantial spermatogenesis despite small testis size, an unusual phenotype. In addition to the reproductive defects, heritable phenotypes were observed in mice with ataxia (Myo5a), tremors (Pmp22), growth retardation (unknown gene), and hydrocephalus (unknown gene). These results demonstrate that the ENU screen is an effective tool for identifying potential causes of male infertility. PMID:22258617

  2. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298

  3. SET1A/COMPASS and shadow enhancers in the regulation of homeotic gene expression

    PubMed Central

    Cao, Kaixiang; Collings, Clayton K.; Marshall, Stacy A.; Morgan, Marc A.; Rendleman, Emily J.; Wang, Lu; Sze, Christie C.; Sun, Tianjiao; Bartom, Elizabeth T.; Shilatifard, Ali

    2017-01-01

    The homeotic (Hox) genes are highly conserved in metazoans, where they are required for various processes in development, and misregulation of their expression is associated with human cancer. In the developing embryo, Hox genes are activated sequentially in time and space according to their genomic position within Hox gene clusters. Accumulating evidence implicates both enhancer elements and noncoding RNAs in controlling this spatiotemporal expression of Hox genes, but disentangling their relative contributions is challenging. Here, we identify two cis-regulatory elements (E1 and E2) functioning as shadow enhancers to regulate the early expression of the HoxA genes. Simultaneous deletion of these shadow enhancers in embryonic stem cells leads to impaired activation of HoxA genes upon differentiation, while knockdown of a long noncoding RNA overlapping E1 has no detectable effect on their expression. Although MLL/COMPASS (complex of proteins associated with Set1) family of histone methyltransferases is known to activate transcription of Hox genes in other contexts, we found that individual inactivation of the MLL1-4/COMPASS family members has little effect on early Hox gene activation. Instead, we demonstrate that SET1A/COMPASS is required for full transcriptional activation of multiple Hox genes but functions independently of the E1 and E2 cis-regulatory elements. Our results reveal multiple regulatory layers for Hox genes to fine-tune transcriptional programs essential for development. PMID:28487406

  4. Systematic computation with functional gene-sets among leukemic and hematopoietic stem cells reveals a favorable prognostic signature for acute myeloid leukemia.

    PubMed

    Yang, Xinan Holly; Li, Meiyi; Wang, Bin; Zhu, Wanqi; Desgardin, Aurelie; Onel, Kenan; de Jong, Jill; Chen, Jianjun; Chen, Luonan; Cunningham, John M

    2015-03-24

    Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-sets signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses complex disease acute myeloid leukemia (AML) as a research platform. We introduced an adjustable "soft threshold" to a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. We identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells in parallel; both mark favorable-prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03 respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We successfully validated the prognosis of both signatures in two independent cohorts of 91 and 242 patients respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08). The proposed algorithms and computational framework will harness systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about

  5. Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases

    PubMed Central

    2010-01-01

    Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an isoflavone synthase gene) is

  6. Heterogeneous activation of the TGFβ pathway in glioblastomas identified by gene expression-based classification using TGFβ-responsive genes

    PubMed Central

    Xu, Xie L; Kapoun, Ann M

    2009-01-01

    Background TGFβ has emerged as an attractive target for the therapeutic intervention of glioblastomas. Aberrant TGFβ overproduction in glioblastoma and other high-grade gliomas has been reported, however, to date, none of these reports has systematically examined the components of TGFβ signaling to gain a comprehensive view of TGFβ activation in large cohorts of human glioma patients. Methods TGFβ activation in mammalian cells leads to a transcriptional program that typically affects 5–10% of the genes in the genome. To systematically examine the status of TGFβ activation in high-grade glial tumors, we compiled a gene set of transcriptional response to TGFβ stimulation from tissue culture and in vivo animal studies. These genes were used to examine the status of TGFβ activation in high-grade gliomas including a large cohort of glioblastomas. Unsupervised and supervised classification analysis was performed in two independent, publicly available glioma microarray datasets. Results Unsupervised and supervised classification using the TGFβ-responsive gene list in two independent glial tumor gene expression data sets revealed various levels of TGFβ activation in these tumors. Among glioblastomas, one of the most devastating human cancers, two subgroups were identified that showed distinct TGFβ activation patterns as measured from transcriptional responses. Approximately 62% of glioblastoma samples analyzed showed strong TGFβ activation, while the rest showed a weak TGFβ transcriptional response. Conclusion Our findings suggest heterogeneous TGFβ activation in glioblastomas, which may cause potential differences in responses to anti-TGFβ therapies in these two distinct subgroups of glioblastomas patients. PMID:19192267

  7. Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions.

    PubMed

    Singh, Anuradha; Mantri, Shrikant; Sharma, Monica; Chaudhury, Ashok; Tuli, Rakesh; Roy, Joy

    2014-01-16

    The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study

  8. Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions

    PubMed Central

    2014-01-01

    Background The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Results Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT

  9. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    PubMed

    Wei, Pi-Jing; Zhang, Di; Xia, Junfeng; Zheng, Chun-Hou

    2016-12-23

    Cancer is a complex disease which is characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of the next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data etc., can be measured from each individual. Correspondingly, one of the key challenges is to pinpoint functional driver mutations or pathways, which contributes to tumorigenesis, from millions of functional neutral passenger mutations. In this paper, in order to identify driver genes effectively, we applied a generalized additive model to mutation profiles to filter genes with long length and constructed a new gene-gene interaction network. Then we integrated the mutation data and expression data into the gene-gene interaction network. Lastly, greedy algorithm was used to prioritize candidate driver genes from the integrated data. We named the proposed method Length-Net-Driver (LNDriver). Experiments on three TCGA datasets, i.e., head and neck squamous cell carcinoma, kidney renal clear cell carcinoma and thyroid carcinoma, demonstrated that the proposed method was effective. Also, it can identify not only frequently mutated drivers, but also rare candidate driver genes.

  10. A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories

    PubMed Central

    2014-01-01

    Background In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypothesis, which can be applied to a wide range of problems. Results We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. Conclusions The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to

  11. Identification of the Core Set of Carbon-Associated Genes in a Bioenergy Grassland Soil

    DOE PAGES

    Howe, Adina; Yang, Fan; Williams, Ryan J.; ...

    2016-11-17

    Despite the central role of soil microbial communities in global carbon (C) cycling, little is known about soil microbial community structure and even less about their metabolic pathways. Efforts to characterize soil communities often focus on identifying differences in gene content across environmental gradients, but an alternative question is what genes are similar in soils. These genes may indicate critical species or potential functions that are required in all soils. Here we identified the “core” set of C cycling sequences widely present in multiple soil metagenomes from a fertilized prairie (FP). Of 226,887 sequences associated with known enzymes involved inmore » the synthesis, metabolism, and transport of carbohydrates, 843 were identified to be consistently prevalent across four replicate soil metagenomes. This core metagenome was functionally and taxonomically diverse, representing five enzyme classes and 99 enzyme families within the CAZy database. Though it only comprised 0.4% of all CAZy-associated genes identified in FP metagenomes, the core was found to be comprised of functions similar to those within cumulative soils. The FP CAZy-associated core sequences were present in multiple publicly available soil metagenomes and most similar to soils sharing geographic proximity. As a result, in soil ecosystems, where high diversity remains a key challenge for metagenomic investigations, these core genes represent a subset of critical functions necessary for carbohydrate metabolism, which can be targeted to evaluate important C fluxes in these and other similar soils.« less

  12. A framework to identify gene expression profiles in a model of inflammation induced by lipopolysaccharide after treatment with thalidomide

    PubMed Central

    2012-01-01

    Background Thalidomide is an anti-inflammatory and anti-angiogenic drug currently used for the treatment of several diseases, including erythema nodosum leprosum, which occurs in patients with lepromatous leprosy. In this research, we use DNA microarray analysis to identify the impact of thalidomide on gene expression responses in human cells after lipopolysaccharide (LPS) stimulation. We employed a two-stage framework. Initially, we identified 1584 altered genes in response to LPS. Modulation of this set of genes was then analyzed in the LPS stimulated cells treated with thalidomide. Results We identified 64 genes with altered expression induced by thalidomide using the rank product method. In addition, the lists of up-regulated and down-regulated genes were investigated by means of bioinformatics functional analysis, which allowed for the identification of biological processes affected by thalidomide. Confirmatory analysis was done in five of the identified genes using real time PCR. Conclusions The results showed some genes that can further our understanding of the biological mechanisms in the action of thalidomide. Of the five genes evaluated with real time PCR, three were down regulated and two were up regulated confirming the initial results of the microarray analysis. PMID:22695124

  13. Identifying key genes associated with acute myocardial infarction.

    PubMed

    Cheng, Ming; An, Shoukuan; Li, Junquan

    2017-10-01

    This study aimed to identify key genes associated with acute myocardial infarction (AMI) by reanalyzing microarray data. Three gene expression profile datasets GSE66360, GSE34198, and GSE48060 were downloaded from GEO database. After data preprocessing, genes without heterogeneity across different platforms were subjected to differential expression analysis between the AMI group and the control group using metaDE package. P < .05 was used as the cutoff for a differentially expressed gene (DEG). The expression data matrices of DEGs were imported in ReactomeFIViz to construct a gene functional interaction (FI) network. Then, DEGs in each module were subjected to pathway enrichment analysis using DAVID. MiRNAs and transcription factors predicted to regulate target DEGs were identified. Quantitative real-time polymerase chain reaction (RT-PCR) was applied to verify the expression of genes. A total of 913 upregulated genes and 1060 downregulated genes were identified in the AMI group. A FI network consists of 21 modules and DEGs in 12 modules were significantly enriched in pathways. The transcription factor-miRNA-gene network contains 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p. RT-PCR validations showed that expression levels of FOXO3 and MYBL2 were significantly increased in AMI, and expression levels of hsa-miR-21-5p and hsa-miR-30c-5p were obviously decreased in AMI. A total of 41 DEGs, such as SOCS3, VAPA, and COL5A2, are speculated to have roles in the pathogenesis of AMI; 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p may be involved in the regulation of the expression of these DEGs.

  14. A genomic approach to identify hybrid incompatibility genes.

    PubMed

    Cooper, Jacob C; Phadnis, Nitin

    2016-07-02

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids.

  15. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    ABSTRACT Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814

  16. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants.

    PubMed

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-11-01

    MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11-14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14-16 Type II MADS-box genes. The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS-box genes for the development of gymnosperms. This study is

  17. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  18. Gene selection for tumor classification using neighborhood rough sets and entropy measures.

    PubMed

    Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu

    2017-03-01

    With the development of bioinformatics, tumor classification from gene expression data becomes an important useful technology for cancer diagnosis. Since a gene expression data often contains thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection field, as it has the characters of data driving and requiring no additional information. However, traditional rough set method deals with discrete data only. As for the gene expression data containing real-value or noisy data, they are usually employed by a discrete preprocessing, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which has the ability of dealing with real-value data whilst maintaining the original gene classification information. Moreover, this paper addresses an entropy measure under the frame of neighborhood rough sets for tackling the uncertainty and noisy of gene expression data. The utilization of this measure can bring about a discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Some experiments on two gene expression data show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Polygenic risk score, genome-wide association, and gene set analyses of cognitive domain deficits in schizophrenia.

    PubMed

    Nakahara, Soichiro; Medland, Sarah; Turner, Jessica A; Calhoun, Vince D; Lim, Kelvin O; Mueller, Bryon A; Bustillo, Juan R; O'Leary, Daniel S; Vaidya, Jatin G; McEwen, Sarah; Voyvodic, James; Belger, Aysenil; Mathalon, Daniel H; Ford, Judith M; Guffanti, Guia; Macciardi, Fabio; Potkin, Steven G; van Erp, Theo G M

    2018-06-12

    This study assessed genetic contributions to six cognitive domains, identified by the MATRICS Cognitive Consensus Battery as relevant for schizophrenia, cognition-enhancing, clinical trials. Psychiatric Genomics Consortium Schizophrenia polygenic risk scores showed significant negative correlations with each cognitive domain. Genome-wide association analyses identified loci associated with attention/vigilance (rs830786 within HNF4G), verbal memory (rs67017972 near NDUFS4), and reasoning/problem solving (rs76872642 within HDAC9). Gene set analysis identified unique and shared genes across cognitive domains. These findings suggest involvement of common and unique mechanisms across cognitive domains and may contribute to the discovery of new therapeutic targets to treat cognitive deficits in schizophrenia. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Identifying candidate genes affecting developmental time in Drosophila melanogaster: pervasive pleiotropy and gene-by-environment interaction

    PubMed Central

    Mensch, Julián; Lavagnino, Nicolás; Carreira, Valeria Paula; Massaldi, Ana; Hasson, Esteban; Fanara, Juan José

    2008-01-01

    Background Understanding the genetic architecture of ecologically relevant adaptive traits requires the contribution of developmental and evolutionary biology. The time to reach the age of reproduction is a complex life history trait commonly known as developmental time. In particular, in holometabolous insects that occupy ephemeral habitats, like fruit flies, the impact of developmental time on fitness is further exaggerated. The present work is one of the first systematic studies of the genetic basis of developmental time, in which we also evaluate the impact of environmental variation on the expression of the trait. Results We analyzed 179 co-isogenic single P[GT1]-element insertion lines of Drosophila melanogaster to identify novel genes affecting developmental time in flies reared at 25°C. Sixty percent of the lines showed a heterochronic phenotype, suggesting that a large number of genes affect this trait. Mutant lines for the genes Merlin and Karl showed the most extreme phenotypes exhibiting a developmental time reduction and increase, respectively, of over 2 days and 4 days relative to the control (a co-isogenic P-element insertion free line). In addition, a subset of 42 lines selected at random from the initial set of 179 lines was screened at 17°C. Interestingly, the gene-by-environment interaction accounted for 52% of total phenotypic variance. Plastic reaction norms were found for a large number of developmental time candidate genes. Conclusion We identified components of several integrated time-dependent pathways affecting egg-to-adult developmental time in Drosophila. At the same time, we also show that many heterochronic phenotypes may arise from changes in genes involved in several developmental mechanisms that do not explicitly control the timing of specific events. We also demonstrate that many developmental time genes have pleiotropic effects on several adult traits and that the action of most of them is sensitive to temperature during

  1. Genome-wide strategies identify downstream target genes of chick connective tissue-associated transcription factors.

    PubMed

    Orgeur, Mickael; Martens, Marvin; Leonte, Georgeta; Nassari, Sonya; Bonnin, Marie-Ange; Börno, Stefan T; Timmermann, Bernd; Hecht, Jochen; Duprez, Delphine; Stricker, Sigmar

    2018-03-29

    Connective tissues support organs and play crucial roles in development, homeostasis and fibrosis, yet our understanding of their formation is still limited. To gain insight into the molecular mechanisms of connective tissue specification, we selected five zinc-finger transcription factors - OSR1, OSR2, EGR1, KLF2 and KLF4 - based on their expression patterns and/or known involvement in connective tissue subtype differentiation. RNA-seq and ChIP-seq profiling of chick limb micromass cultures revealed a set of common genes regulated by all five transcription factors, which we describe as a connective tissue core expression set. This common core was enriched with genes associated with axon guidance and myofibroblast signature, including fibrosis-related genes. In addition, each transcription factor regulated a specific set of signalling molecules and extracellular matrix components. This suggests a concept whereby local molecular niches can be created by the expression of specific transcription factors impinging on the specification of local microenvironments. The regulatory network established here identifies common and distinct molecular signatures of limb connective tissue subtypes, provides novel insight into the signalling pathways governing connective tissue specification, and serves as a resource for connective tissue development. © 2018. Published by The Company of Biologists Ltd.

  2. Integrative Analysis of DNA Methylation and Gene Expression Data Identifies EPAS1 as a Key Regulator of COPD

    PubMed Central

    Yoo, Seungyeul; Takikawa, Sachiko; Geraghty, Patrick; Argmann, Carmen; Campbell, Joshua; Lin, Luan; Huang, Tao; Tu, Zhidong; Feronjy, Robert; Spira, Avrum; Schadt, Eric E.; Powell, Charles A.; Zhu, Jun

    2015-01-01

    Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple genes sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communications, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue including hypoxia responsive genes, and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a ‘causal’ role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology. PMID:25569234

  3. Integrative analysis of DNA methylation and gene expression data identifies EPAS1 as a key regulator of COPD.

    PubMed

    Yoo, Seungyeul; Takikawa, Sachiko; Geraghty, Patrick; Argmann, Carmen; Campbell, Joshua; Lin, Luan; Huang, Tao; Tu, Zhidong; Foronjy, Robert F; Feronjy, Robert; Spira, Avrum; Schadt, Eric E; Powell, Charles A; Zhu, Jun

    2015-01-01

    Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple genes sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communications, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue including hypoxia responsive genes, and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a 'causal' role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology.

  4. Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

    PubMed Central

    Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

    2005-01-01

    Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134

  5. Integrated microarray and ChIP analysis identifies multiple Foxa2 dependent target genes in the notochord.

    PubMed

    Tamplin, Owen J; Cox, Brian J; Rossant, Janet

    2011-12-15

    The node and notochord are key tissues required for patterning of the vertebrate body plan. Understanding the gene regulatory network that drives their formation and function is therefore important. Foxa2 is a key transcription factor at the top of this genetic hierarchy and finding its targets will help us to better understand node and notochord development. We performed an extensive microarray-based gene expression screen using sorted embryonic notochord cells to identify early notochord-enriched genes. We validated their specificity to the node and notochord by whole mount in situ hybridization. This provides the largest available resource of notochord-expressed genes, and therefore candidate Foxa2 target genes in the notochord. Using existing Foxa2 ChIP-seq data from adult liver, we were able to identify a set of genes expressed in the notochord that had associated regions of Foxa2-bound chromatin. Given that Foxa2 is a pioneer transcription factor, we reasoned that these sites might represent notochord-specific enhancers. Candidate Foxa2-bound regions were tested for notochord specific enhancer function in a zebrafish reporter assay and 7 novel notochord enhancers were identified. Importantly, sequence conservation or predictive models could not have readily identified these regions. Mutation of putative Foxa2 binding elements in two of these novel enhancers abrogated reporter expression and confirmed their Foxa2 dependence. The combination of highly specific gene expression profiling and genome-wide ChIP analysis is a powerful means of understanding developmental pathways, even for small cell populations such as the notochord. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. A method to identify differential expression profiles of time-course gene data with Fourier transformation.

    PubMed

    Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong

    2013-10-18

    Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization.The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The proposed method is general and can be

  7. A method to identify differential expression profiles of time-course gene data with Fourier transformation

    PubMed Central

    2013-01-01

    Background Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. Results This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The

  8. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density.

    PubMed

    Wang, W; Huang, S; Hou, W; Liu, Y; Fan, Q; He, A; Wen, Y; Hao, J; Guo, X; Zhang, F

    2017-10-01

    Several genome-wide association studies (GWAS) of bone mineral density (BMD) have successfully identified multiple susceptibility genes, yet isolated susceptibility genes are often difficult to interpret biologically. The aim of this study was to unravel the genetic background of BMD at pathway level, by integrating BMD GWAS data with genome-wide expression quantitative trait loci (eQTLs) and methylation quantitative trait loci (meQTLs) data METHOD: We employed the GWAS datasets of BMD from the Genetic Factors for Osteoporosis Consortium (GEFOS), analysing patients' BMD. The areas studied included 32 735 femoral necks, 28 498 lumbar spines, and 8143 forearms. Genome-wide eQTLs (containing 923 021 eQTLs) and meQTLs (containing 683 152 unique methylation sites with local meQTLs) data sets were collected from recently published studies. Gene scores were first calculated by summary data-based Mendelian randomisation (SMR) software and meQTL-aligned GWAS results. Gene set enrichment analysis (GSEA) was then applied to identify BMD-associated gene sets with a predefined significance level of 0.05. We identified multiple gene sets associated with BMD in one or more regions, including relevant known biological gene sets such as the Reactome Circadian Clock (GSEA p-value = 1.0 × 10 -4 for LS and 2.7 × 10 -2 for femoral necks BMD in eQTLs-based GSEA) and insulin-like growth factor receptor binding (GSEA p-value = 5.0 × 10 -4 for femoral necks and 2.6 × 10 -2 for lumbar spines BMD in meQTLs-based GSEA). Our results provided novel clues for subsequent functional analysis of bone metabolism, and illustrated the benefit of integrating eQTLs and meQTLs data into pathway association analysis for genetic studies of complex human diseases. Cite this article : W. Wang, S. Huang, W. Hou, Y. Liu, Q. Fan, A. He, Y. Wen, J. Hao, X. Guo, F. Zhang. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density. Bone Joint

  9. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Identifying Stable Reference Genes for qRT-PCR Normalisation in Gene Expression Studies of Narrow-Leafed Lupin (Lupinus angustifolius L.).

    PubMed

    Taylor, Candy M; Jost, Ricarda; Erskine, William; Nelson, Matthew N

    2016-01-01

    Quantitative Reverse Transcription PCR (qRT-PCR) is currently one of the most popular, high-throughput and sensitive technologies available for quantifying gene expression. Its accurate application depends heavily upon normalisation of gene-of-interest data with reference genes that are uniformly expressed under experimental conditions. The aim of this study was to provide the first validation of reference genes for Lupinus angustifolius (narrow-leafed lupin, a significant grain legume crop) using a selection of seven genes previously trialed as reference genes for the model legume, Medicago truncatula. In a preliminary evaluation, the seven candidate reference genes were assessed on the basis of primer specificity for their respective targeted region, PCR amplification efficiency, and ability to discriminate between cDNA and gDNA. Following this assessment, expression of the three most promising candidates [Ubiquitin C (UBC), Helicase (HEL), and Polypyrimidine tract-binding protein (PTB)] was evaluated using the NormFinder and RefFinder statistical algorithms in two narrow-leafed lupin lines, both with and without vernalisation treatment, and across seven organ types (cotyledons, stem, leaves, shoot apical meristem, flowers, pods and roots) encompassing three developmental stages. UBC was consistently identified as the most stable candidate and has sufficiently uniform expression that it may be used as a sole reference gene under the experimental conditions tested here. However, as organ type and developmental stage were associated with greater variability in relative expression, it is recommended using UBC and HEL as a pair to achieve optimal normalisation. These results highlight the importance of rigorously assessing candidate reference genes for each species across a diverse range of organs and developmental stages. With emerging technologies, such as RNAseq, and the completion of valuable transcriptome data sets, it is possible that other potentially more

  11. Identifying Stable Reference Genes for qRT-PCR Normalisation in Gene Expression Studies of Narrow-Leafed Lupin (Lupinus angustifolius L.)

    PubMed Central

    Erskine, William; Nelson, Matthew N.

    2016-01-01

    Quantitative Reverse Transcription PCR (qRT-PCR) is currently one of the most popular, high-throughput and sensitive technologies available for quantifying gene expression. Its accurate application depends heavily upon normalisation of gene-of-interest data with reference genes that are uniformly expressed under experimental conditions. The aim of this study was to provide the first validation of reference genes for Lupinus angustifolius (narrow-leafed lupin, a significant grain legume crop) using a selection of seven genes previously trialed as reference genes for the model legume, Medicago truncatula. In a preliminary evaluation, the seven candidate reference genes were assessed on the basis of primer specificity for their respective targeted region, PCR amplification efficiency, and ability to discriminate between cDNA and gDNA. Following this assessment, expression of the three most promising candidates [Ubiquitin C (UBC), Helicase (HEL), and Polypyrimidine tract-binding protein (PTB)] was evaluated using the NormFinder and RefFinder statistical algorithms in two narrow-leafed lupin lines, both with and without vernalisation treatment, and across seven organ types (cotyledons, stem, leaves, shoot apical meristem, flowers, pods and roots) encompassing three developmental stages. UBC was consistently identified as the most stable candidate and has sufficiently uniform expression that it may be used as a sole reference gene under the experimental conditions tested here. However, as organ type and developmental stage were associated with greater variability in relative expression, it is recommended using UBC and HEL as a pair to achieve optimal normalisation. These results highlight the importance of rigorously assessing candidate reference genes for each species across a diverse range of organs and developmental stages. With emerging technologies, such as RNAseq, and the completion of valuable transcriptome data sets, it is possible that other potentially more

  12. Identifying key genes associated with acute myocardial infarction

    PubMed Central

    Cheng, Ming; An, Shoukuan; Li, Junquan

    2017-01-01

    Abstract Background: This study aimed to identify key genes associated with acute myocardial infarction (AMI) by reanalyzing microarray data. Methods: Three gene expression profile datasets GSE66360, GSE34198, and GSE48060 were downloaded from GEO database. After data preprocessing, genes without heterogeneity across different platforms were subjected to differential expression analysis between the AMI group and the control group using metaDE package. P < .05 was used as the cutoff for a differentially expressed gene (DEG). The expression data matrices of DEGs were imported in ReactomeFIViz to construct a gene functional interaction (FI) network. Then, DEGs in each module were subjected to pathway enrichment analysis using DAVID. MiRNAs and transcription factors predicted to regulate target DEGs were identified. Quantitative real-time polymerase chain reaction (RT-PCR) was applied to verify the expression of genes. Result: A total of 913 upregulated genes and 1060 downregulated genes were identified in the AMI group. A FI network consists of 21 modules and DEGs in 12 modules were significantly enriched in pathways. The transcription factor-miRNA-gene network contains 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p. RT-PCR validations showed that expression levels of FOXO3 and MYBL2 were significantly increased in AMI, and expression levels of hsa-miR-21–5p and hsa-miR-30c-5p were obviously decreased in AMI. Conclusion: A total of 41 DEGs, such as SOCS3, VAPA, and COL5A2, are speculated to have roles in the pathogenesis of AMI; 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p may be involved in the regulation of the expression of these DEGs. PMID:29049183

  13. A Morpholino-based screen to identify novel genes involved in craniofacial morphogenesis

    PubMed Central

    Melvin, Vida Senkus; Feng, Weiguo; Hernandez-Lagunas, Laura; Artinger, Kristin Bruk; Williams, Trevor

    2014-01-01

    BACKGROUND The regulatory mechanisms underpinning facial development are conserved between diverse species. Therefore, results from model systems provide insight into the genetic causes of human craniofacial defects. Previously, we generated a comprehensive dataset examining gene expression during development and fusion of the mouse facial prominences. Here, we used this resource to identify genes that have dynamic expression patterns in the facial prominences, but for which only limited information exists concerning developmental function. RESULTS This set of ~80 genes was used for a high throughput functional analysis in the zebrafish system using Morpholino gene knockdown technology. This screen revealed three classes of cranial cartilage phenotypes depending upon whether knockdown of the gene affected the neurocranium, viscerocranium, or both. The targeted genes that produced consistent phenotypes encoded proteins linked to transcription (meis1, meis2a, tshz2, vgll4l), signaling (pkdcc, vlk, macc1, wu:fb16h09), and extracellular matrix function (smoc2). The majority of these phenotypes were not altered by reduction of p53 levels, demonstrating that both p53 dependent and independent mechanisms were involved in the craniofacial abnormalities. CONCLUSIONS This Morpholino-based screen highlights new genes involved in development of the zebrafish craniofacial skeleton with wider relevance to formation of the face in other species, particularly mouse and human. PMID:23559552

  14. Gene-based rare allele analysis identified a risk gene of Alzheimer's disease.

    PubMed

    Kim, Jong Hun; Song, Pamela; Lim, Hyunsun; Lee, Jae-Hyung; Lee, Jun Hong; Park, Sun Ah

    2014-01-01

    Alzheimer's disease (AD) has a strong propensity to run in families. However, the known risk genes excluding APOE are not clinically useful. In various complex diseases, gene studies have targeted rare alleles for unsolved heritability. Our study aims to elucidate previously unknown risk genes for AD by targeting rare alleles. We used data from five publicly available genetic studies from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the database of Genotypes and Phenotypes (dbGaP). A total of 4,171 cases and 9,358 controls were included. The genotype information of rare alleles was imputed using 1,000 genomes. We performed gene-based analysis of rare alleles (minor allele frequency≤3%). The genome-wide significance level was defined as meta P<1.8×10(-6) (0.05/number of genes in human genome = 0.05/28,517). ZNF628, which is located at chromosome 19q13.42, showed a genome-wide significant association with AD. The association of ZNF628 with AD was not dependent on APOE ε4. APOE and TREM2 were also significantly associated with AD, although not at genome-wide significance levels. Other genes identified by targeting common alleles could not be replicated in our gene-based rare allele analysis. We identified that rare variants in ZNF628 are associated with AD. The protein encoded by ZNF628 is known as a transcription factor. Furthermore, the associations of APOE and TREM2 with AD were highly significant, even in gene-based rare allele analysis, which implies that further deep sequencing of these genes is required in AD heritability studies.

  15. Association of High Myopia with Crystallin Beta A4 (CRYBA4) Gene Polymorphisms in the Linkage-Identified MYP6 Locus

    PubMed Central

    Ho, Daniel W. H.; Yap, Maurice K. H.; Ng, Po Wah; Fung, Wai Yan; Yip, Shea Ping

    2012-01-01

    Background Myopia is the most common ocular disorder worldwide and imposes tremendous burden on the society. It is a complex disease. The MYP6 locus at 22 q12 is of particular interest because many studies have detected linkage signals at this interval. The MYP6 locus is likely to contain susceptibility gene(s) for myopia, but none has yet been identified. Methodology/Principal Findings Two independent subject groups of southern Chinese in Hong Kong participated in the study an initial study using a discovery sample set of 342 cases and 342 controls, and a follow-up study using a replication sample set of 316 cases and 313 controls. Cases with high myopia were defined by spherical equivalent ≤ -8 dioptres and emmetropic controls by spherical equivalent within ±1.00 dioptre for both eyes. Manual candidate gene selection from the MYP6 locus was supported by objective in silico prioritization. DNA samples of discovery sample set were genotyped for 178 tagging single nucleotide polymorphisms (SNPs) from 26 genes. For replication, 25 SNPs (tagging or located at predicted transcription factor or microRNA binding sites) from 4 genes were subsequently examined using the replication sample set. Fisher P value was calculated for all SNPs and overall association results were summarized by meta-analysis. Based on initial and replication studies, rs2009066 located in the crystallin beta A4 (CRYBA4) gene was identified to be the most significantly associated with high myopia (initial study: P = 0.02; replication study: P = 1.88e-4; meta-analysis: P = 1.54e-5) among all the SNPs tested. The association result survived correction for multiple comparisons. Under the allelic genetic model for the combined sample set, the odds ratio of the minor allele G was 1.41 (95% confidence intervals, 1.21-1.64). Conclusions/Significance A novel susceptibility gene (CRYBA4) was discovered for high myopia. Our study also signified the potential importance of appropriate gene

  16. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets.

    PubMed

    Springer, Mark S; Gatesy, John

    2018-02-26

    coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset-the 'recombination ratchet'-is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. If coalescent methods are powerful

  17. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets

    PubMed Central

    Springer, Mark S.; Gatesy, John

    2018-01-01

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset—the ‘recombination ratchet’—is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d’etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. If coalescent

  18. Challenges in Identifying Refugees in National Health Data Sets.

    PubMed

    Semere, Wagahta; Yun, Katherine; Ahalt, Cyrus; Williams, Brie; Wang, Emily A

    2016-07-01

    To evaluate publicly available data sets to determine their utility for studying refugee health. We searched for keywords describing refugees in data sets within the Society of General Internal Medicine Dataset Compendium and the Inter-University Consortium for Political and Social Research database. We included in our analysis US-based data sets with publicly available documentation and a self-defined, health-related focus that allowed for an examination of patient-level factors. Of the 68 data sets that met the study criteria, 37 (54%) registered keyword matches related to refugees, but only 2 uniquely identified refugees. Few health data sets identify refugee status among participants, presenting barriers to understanding refugees' health and health care needs. Information about refugee status in national health surveys should include expanded demographic questions and focus on mental health and chronic disease.

  19. Model-based gene set analysis for Bioconductor.

    PubMed

    Bauer, Sebastian; Robinson, Peter N; Gagneur, Julien

    2011-07-01

    Gene Ontology and other forms of gene-category analysis play a major role in the evaluation of high-throughput experiments in molecular biology. Single-category enrichment analysis procedures such as Fisher's exact test tend to flag large numbers of redundant categories as significant, which can complicate interpretation. We have recently developed an approach called model-based gene set analysis (MGSA), that substantially reduces the number of redundant categories returned by the gene-category analysis. In this work, we present the Bioconductor package mgsa, which makes the MGSA algorithm available to users of the R language. Our package provides a simple and flexible application programming interface for applying the approach. The mgsa package has been made available as part of Bioconductor 2.8. It is released under the conditions of the Artistic license 2.0. peter.robinson@charite.de; julien.gagneur@embl.de.

  20. Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features

    PubMed Central

    2011-01-01

    Background Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Methods Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published genelists and independent datasets. Results Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. Conclusion This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast

  1. Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features.

    PubMed

    Haakensen, Vilde D; Lingjaerde, Ole Christian; Lüders, Torben; Riis, Margit; Prat, Aleix; Troester, Melissa A; Holmen, Marit M; Frantzen, Jan Ole; Romundstad, Linda; Navjord, Dina; Bukholm, Ida K; Johannesen, Tom B; Perou, Charles M; Ursin, Giske; Kristensen, Vessela N; Børresen-Dale, Anne-Lise; Helland, Aslaug

    2011-11-01

    Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published genelists and independent datasets. Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer.

  2. TargetLink, a new method for identifying the endogenous target set of a specific microRNA in intact living cells.

    PubMed

    Xu, Yan; Chen, Yan; Li, Daliang; Liu, Qing; Xuan, Zhenyu; Li, Wen-Hong

    2017-02-01

    MicroRNAs are small non-coding RNAs acting as posttranscriptional repressors of gene expression. Identifying mRNA targets of a given miRNA remains an outstanding challenge in the field. We have developed a new experimental approach, TargetLink, that applied locked nucleic acid (LNA) as the affinity probe to enrich target genes of a specific microRNA in intact cells. TargetLink also consists a rigorous and systematic data analysis pipeline to identify target genes by comparing LNA-enriched sequences between experimental and control samples. Using miR-21 as a test microRNA, we identified 12 target genes of miR-21 in a human colorectal cancer cell by this approach. The majority of the identified targets interacted with miR-21 via imperfect seed pairing. Target validation confirmed that miR-21 repressed the expression of the identified targets. The cellular abundance of the identified miR-21 target transcripts varied over a wide range, with some targets expressed at a rather low level, confirming that both abundant and rare transcripts are susceptible to regulation by microRNAs, and that TargetLink is an efficient approach for identifying the target set of a specific microRNA in intact cells. C20orf111, one of the novel targets identified by TargetLink, was found to reside in the nuclear speckle and to be reliably repressed by miR-21 through the interaction at its coding sequence.

  3. Parallel analysis of tagged deletion mutants efficiently identifies genes involved in endoplasmic reticulum biogenesis.

    PubMed

    Wright, Robin; Parrish, Mark L; Cadera, Emily; Larson, Lynnelle; Matson, Clinton K; Garrett-Engele, Philip; Armour, Chris; Lum, Pek Yee; Shoemaker, Daniel D

    2003-07-30

    Increased levels of HMG-CoA reductase induce cell type- and isozyme-specific proliferation of the endoplasmic reticulum. In yeast, the ER proliferations induced by Hmg1p consist of nuclear-associated stacks of smooth ER membranes known as karmellae. To identify genes required for karmellae assembly, we compared the composition of populations of homozygous diploid S. cerevisiae deletion mutants following 20 generations of growth with and without karmellae. Using an initial population of 1,557 deletion mutants, 120 potential mutants were identified as a result of three independent experiments. Each experiment produced a largely non-overlapping set of potential mutants, suggesting that differences in specific growth conditions could be used to maximize the comprehensiveness of similar parallel analysis screens. Only two genes, UBC7 and YAL011W, were identified in all three experiments. Subsequent analysis of individual mutant strains confirmed that each experiment was identifying valid mutations, based on the mutant's sensitivity to elevated HMG-CoA reductase and inability to assemble normal karmellae. The largest class of HMG-CoA reductase-sensitive mutations was a subset of genes that are involved in chromatin structure and transcriptional regulation, suggesting that karmellae assembly requires changes in transcription or that the presence of karmellae may interfere with normal transcriptional regulation. Copyright 2003 John Wiley & Sons, Ltd.

  4. A reference gene set for sex pheromone biosynthesis and degradation genes from the diamondback moth, Plutella xylostella, based on genome and transcriptome digital gene expression analyses.

    PubMed

    He, Peng; Zhang, Yun-Fei; Hong, Duan-Yang; Wang, Jun; Wang, Xing-Liang; Zuo, Ling-Hua; Tang, Xian-Fu; Xu, Wei-Ming; He, Ming

    2017-03-01

    comprehensive gene data set of sex pheromone biosynthesis and degradation enzyme related genes in DBM created by genome- and transcriptome-wide identification, characterization and expression profiling. Our findings provide a basis to better understand the function of genes with tissue enriched expression. The results also provide information on the genes involved in sex pheromone biosynthesis and degradation, and may be useful to identify potential gene targets for pest control strategies by disrupting the insect-insect communication using pheromone-based behavioral antagonists.

  5. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point view of systems biology, iBVS enables user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilitic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  6. Rare copy number variations in congenital heart disease patients identify unique genes in left-right patterning

    PubMed Central

    Fakhro, Khalid A.; Choi, Murim; Ware, Stephanie M.; Belmont, John W.; Towbin, Jeffrey A.; Lifton, Richard P.; Khokha, Mustafa K.; Brueckner, Martina

    2011-01-01

    Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10−4). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10−6). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning. PMID:21282601

  7. Rare copy number variations in congenital heart disease patients identify unique genes in left-right patterning.

    PubMed

    Fakhro, Khalid A; Choi, Murim; Ware, Stephanie M; Belmont, John W; Towbin, Jeffrey A; Lifton, Richard P; Khokha, Mustafa K; Brueckner, Martina

    2011-02-15

    Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10(-4)). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10(-6)). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning.

  8. Identifying osteosarcoma metastasis associated genes by weighted gene co-expression network analysis (WGCNA).

    PubMed

    Tian, Honglai; Guan, Donghui; Li, Jianmin

    2018-06-01

    Osteosarcoma (OS), the most common malignant bone tumor, accounts for the heavy healthy threat in the period of children and adolescents. OS occurrence usually correlates with early metastasis and high death rate. This study aimed to better understand the mechanism of OS metastasis.Based on Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis, and selected differential expressed genes. Weighted gene co-expression network analysis (WGCNA) approach allowed us to investigate the most OS metastasis-correlated module. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to give annotation of selected OS metastasis-associated genes.We select 897 differential expressed genes from OS metastasis and OS non-metastasis groups. Based on these selected genes, WGCNA further explored 142 genes included in the most OS metastasis-correlated module. Gene Ontology functional and KEGG pathway enrichment analyses showed that significantly OS metastasis-associated genes were involved in pathway correlated with insulin-like growth factor binding.Our research figured out several potential molecules participating in metastasis process and factors acting as biomarker. With this study, we could better explore the mechanism of OS metastasis and further discover more therapy targets.

  9. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  10. RUBIC identifies driver genes by detecting recurrent DNA copy number breaks

    PubMed Central

    van Dyk, Ewald; Hoogstraat, Marlous; ten Hoeve, Jelle; Reinders, Marcel J. T.; Wessels, Lodewyk F. A.

    2016-01-01

    The frequent recurrence of copy number aberrations across tumour samples is a reliable hallmark of certain cancer driver genes. However, state-of-the-art algorithms for detecting recurrent aberrations fail to detect several known drivers. In this study, we propose RUBIC, an approach that detects recurrent copy number breaks, rather than recurrently amplified or deleted regions. This change of perspective allows for a simplified approach as recursive peak splitting procedures and repeated re-estimation of the background model are avoided. Furthermore, we control the false discovery rate on the level of called regions, rather than at the probe level, as in competing algorithms. We benchmark RUBIC against GISTIC2 (a state-of-the-art approach) and RAIG (a recently proposed approach) on simulated copy number data and on three SNP6 and NGS copy number data sets from TCGA. We show that RUBIC calls more focal recurrent regions and identifies a much larger fraction of known cancer genes. PMID:27396759

  11. A Strategy for Identifying Quantitative Trait Genes Using Gene Expression Analysis and Causal Analysis.

    PubMed

    Ishikawa, Akira

    2017-11-27

    Large numbers of quantitative trait loci (QTL) affecting complex diseases and other quantitative traits have been reported in humans and model animals. However, the genetic architecture of these traits remains elusive due to the difficulty in identifying causal quantitative trait genes (QTGs) for common QTL with relatively small phenotypic effects. A traditional strategy based on techniques such as positional cloning does not always enable identification of a single candidate gene for a QTL of interest because it is difficult to narrow down a target genomic interval of the QTL to a very small interval harboring only one gene. A combination of gene expression analysis and statistical causal analysis can greatly reduce the number of candidate genes. This integrated approach provides causal evidence that one of the candidate genes is a putative QTG for the QTL. Using this approach, I have recently succeeded in identifying a single putative QTG for resistance to obesity in mice. Here, I outline the integration approach and discuss its usefulness using my studies as an example.

  12. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe

    PubMed Central

    Duhig, Trevor; Nam, Miyoung; Palmer, Georgia; Han, Sangjo; Jeffery, Linda; Baek, Seung-Tae; Lee, Hyemi; Shim, Young Sam; Lee, Minho; Kim, Lila; Heo, Kyung-Sun; Noh, Eun Joo; Lee, Ah-Reum; Jang, Young-Joo; Chung, Kyung-Sook; Choi, Shin-Jung; Park, Jo-Young; Park, Youngwoo; Kim, Hwan Mook; Park, Song-Kyu; Park, Hae-Joon; Kang, Eun-Jung; Kim, Hyong Bai; Kang, Hyun-Sam; Park, Hee-Moon; Kim, Kyunghoon; Song, Kiwon; Song, Kyung Bin; Nurse, Paul; Hoe, Kwang-Lae

    2014-01-01

    SUMMARY We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome. This resource provides a powerful tool for biotechnological and eukaryotic cell biology research. Comprehensive gene dispensability comparisons with budding yeast, the first time such studies have been possible between two eukaryotes, revealed that 83% of single copy orthologues in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than non-essential genes to be single copy, broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth. PMID:20473289

  13. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses face difficulty to incorporate correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point view of systems biology, iBVS enables user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilitic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  14. Epigenomic elements analyses for promoters identify ESRRG as a new susceptibility gene for obesity-related traits.

    PubMed

    Dong, S-S; Guo, Y; Zhu, D-L; Chen, X-F; Wu, X-M; Shen, H; Chen, X-D; Tan, L-J; Tian, Q; Deng, H-W; Yang, T-L

    2016-07-01

    With ENCODE epigenomic data and results from published genome-wide association studies (GWASs), we aimed to find regulatory signatures of obesity genes and discover novel susceptibility genes. Obesity genes were obtained from public GWAS databases and their promoters were annotated based on the regulatory element information. Significantly enriched or depleted epigenomic elements in the promoters of obesity genes were evaluated and all human genes were then prioritized according to the existence of the selected elements to predict new candidate genes. Top-ranked genes were subsequently applied to validate their associations with obesity-related traits in three independent in-house GWAS samples. We identified RAD21 and EZH2 as over-represented, and STAT2 (signal transducer and activator of transcription 2) and IRF3 (interferon regulatory transcription factor 3) as depleted transcription factors. Histone modification of H3K9me3 and chromatin state segmentation of 'poised promoter' and 'repressed' were over-represented. All genes were prioritized and we selected the top five genes for validation at the population level. Combining results from the three GWAS samples, rs7522101 in ESRRG (estrogen-related receptor-γ) remained significantly associated with body mass index after multiple testing corrections (P=7.25 × 10(-5)). It was also associated with β-cell function (P=1.99 × 10(-3)) and fasting glucose level (P<0.05) in the meta-analyses of glucose and insulin-related traits consortium (MAGIC) data set.Cnoclusions:In summary, we identified epigenomic characteristics for obesity genes and suggested ESRRG as a novel obesity-susceptibility gene.

  15. Identifying and applying psychological theory to setting and achieving rehabilitation goals.

    PubMed

    Scobbie, Lesley; Wyke, Sally; Dixon, Diane

    2009-04-01

    Goal setting is considered to be a fundamental part of rehabilitation; however, theories of behaviour change relevant to goal-setting practice have not been comprehensively reviewed. (i) To identify and discuss specific theories of behaviour change relevant to goal-setting practice in the rehabilitation setting. (ii) To identify 'candidate' theories that that offer most potential to inform clinical practice. The rehabilitation and self-management literature was systematically searched to identify review papers or empirical studies that proposed a specific theory of behaviour change relevant to setting and/or achieving goals in a clinical context. Data from included papers were extracted under the headings of: key constructs, clinical application and empirical support. Twenty-four papers were included in the review which proposed a total of five theories: (i) social cognitive theory, (ii) goal setting theory, (iii) health action process approach, (iv) proactive coping theory, and (v) the self-regulatory model of illness behaviour. The first three of these theories demonstrated most potential to inform clinical practice, on the basis of their capacity to inform interventions that resulted in improved patient outcomes. Social cognitive theory, goal setting theory and the health action process approach are theories of behaviour change that can inform clinicians in the process of setting and achieving goals in the rehabilitation setting. Overlapping constructs within these theories have been identified, and can be applied in clinical practice through the development and evaluation of a goal-setting practice framework.

  16. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints

    PubMed Central

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.

    2016-01-01

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710

  17. Gene expression profiling of prostate tissue identifies chromatin regulation as a potential link between obesity and lethal prostate cancer.

    PubMed

    Ebot, Ericka M; Gerke, Travis; Labbé, David P; Sinnott, Jennifer A; Zadra, Giorgia; Rider, Jennifer R; Tyekucheva, Svitlana; Wilson, Kathryn M; Kelly, Rachel S; Shui, Irene M; Loda, Massimo; Kantoff, Philip W; Finn, Stephen; Vander Heiden, Matthew G; Brown, Myles; Giovannucci, Edward L; Mucci, Lorelei A

    2017-11-01

    Obese men are at higher risk of advanced prostate cancer and cancer-specific mortality; however, the biology underlying this association remains unclear. This study examined gene expression profiles of prostate tissue to identify biological processes differentially expressed by obesity status and lethal prostate cancer. Gene expression profiling was performed on tumor (n = 402) and adjacent normal (n = 200) prostate tissue from participants in 2 prospective cohorts who had been diagnosed with prostate cancer from 1982 to 2005. Body mass index (BMI) was calculated from the questionnaire immediately preceding cancer diagnosis. Men were followed for metastases or prostate cancer-specific death (lethal disease) through 2011. Gene Ontology biological processes differentially expressed by BMI were identified using gene set enrichment analysis. Pathway scores were computed by averaging the signal intensities of member genes. Odds ratios (ORs) for lethal prostate cancer were estimated with logistic regression. Among 402 men, 48% were healthy weight, 31% were overweight, and 21% were very overweight/obese. Fifteen gene sets were enriched in tumor tissue, but not normal tissue, of very overweight/obese men versus healthy-weight men; 5 of these were related to chromatin modification and remodeling (false-discovery rate < 0.25). Patients with high tumor expression of chromatin-related genes had worse clinical characteristics (Gleason grade > 7, 41% vs 17%; P = 2 × 10 -4 ) and an increased risk of lethal disease that was independent of grade and stage (OR, 5.26; 95% confidence interval, 2.37-12.25). This study improves our understanding of the biology of aggressive prostate cancer and identifies a potential mechanistic link between obesity and prostate cancer death that warrants further study. Cancer 2017;123:4130-4138. © 2017 American Cancer Society. © 2017 American Cancer Society.

  18. Gene-Trap Mutagenesis Identifies Mammalian Genes Contributing to Intoxication by Clostridium perfringens ε-Toxin

    PubMed Central

    Ivie, Susan E.; Fennessey, Christine M.; Sheng, Jinsong; Rubin, Donald H.; McClain, Mark S.

    2011-01-01

    The Clostridium perfringens ε-toxin is an extremely potent toxin associated with lethal toxemias in domesticated ruminants and may be toxic to humans. Intoxication results in fluid accumulation in various tissues, most notably in the brain and kidneys. Previous studies suggest that the toxin is a pore-forming toxin, leading to dysregulated ion homeostasis and ultimately cell death. However, mammalian host factors that likely contribute to ε-toxin-induced cytotoxicity are poorly understood. A library of insertional mutant Madin Darby canine kidney (MDCK) cells, which are highly susceptible to the lethal affects of ε-toxin, was used to select clones of cells resistant to ε-toxin-induced cytotoxicity. The genes mutated in 9 surviving resistant cell clones were identified. We focused additional experiments on one of the identified genes as a means of validating the experimental approach. Gene expression microarray analysis revealed that one of the identified genes, hepatitis A virus cellular receptor 1 (HAVCR1, KIM-1, TIM1), is more abundantly expressed in human kidney cell lines than it is expressed in human cells known to be resistant to ε-toxin. One human kidney cell line, ACHN, was found to be sensitive to the toxin and expresses a larger isoform of the HAVCR1 protein than the HAVCR1 protein expressed by other, toxin-resistant human kidney cell lines. RNA interference studies in MDCK and in ACHN cells confirmed that HAVCR1 contributes to ε-toxin-induced cytotoxicity. Additionally, ε-toxin was shown to bind to HAVCR1 in vitro. The results of this study indicate that HAVCR1 and the other genes identified through the use of gene-trap mutagenesis and RNA interference strategies represent important targets for investigation of the process by which ε-toxin induces cell death and new targets for potential therapeutic intervention. PMID:21412435

  19. Gene-trap mutagenesis identifies mammalian genes contributing to intoxication by Clostridium perfringens ε-toxin.

    PubMed

    Ivie, Susan E; Fennessey, Christine M; Sheng, Jinsong; Rubin, Donald H; McClain, Mark S

    2011-03-11

    The Clostridium perfringens ε-toxin is an extremely potent toxin associated with lethal toxemias in domesticated ruminants and may be toxic to humans. Intoxication results in fluid accumulation in various tissues, most notably in the brain and kidneys. Previous studies suggest that the toxin is a pore-forming toxin, leading to dysregulated ion homeostasis and ultimately cell death. However, mammalian host factors that likely contribute to ε-toxin-induced cytotoxicity are poorly understood. A library of insertional mutant Madin Darby canine kidney (MDCK) cells, which are highly susceptible to the lethal affects of ε-toxin, was used to select clones of cells resistant to ε-toxin-induced cytotoxicity. The genes mutated in 9 surviving resistant cell clones were identified. We focused additional experiments on one of the identified genes as a means of validating the experimental approach. Gene expression microarray analysis revealed that one of the identified genes, hepatitis A virus cellular receptor 1 (HAVCR1, KIM-1, TIM1), is more abundantly expressed in human kidney cell lines than it is expressed in human cells known to be resistant to ε-toxin. One human kidney cell line, ACHN, was found to be sensitive to the toxin and expresses a larger isoform of the HAVCR1 protein than the HAVCR1 protein expressed by other, toxin-resistant human kidney cell lines. RNA interference studies in MDCK and in ACHN cells confirmed that HAVCR1 contributes to ε-toxin-induced cytotoxicity. Additionally, ε-toxin was shown to bind to HAVCR1 in vitro. The results of this study indicate that HAVCR1 and the other genes identified through the use of gene-trap mutagenesis and RNA interference strategies represent important targets for investigation of the process by which ε-toxin induces cell death and new targets for potential therapeutic intervention.

  20. A Penalized Robust Method for Identifying Gene-Environment Interactions

    PubMed Central

    Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Xie, Yang; Ma, Shuangge

    2015-01-01

    In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt the rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computation feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications. PMID:24616063

  1. Systemic Response to Microgravity: Utilizing GeneLab Datasets to Identify Molecular Targets for Future Hypotheses-Driven Spaceflight Studies

    NASA Technical Reports Server (NTRS)

    Beheshti, Afshin; Ray, Shayoni; Fogle, Homer; Berrios, Daniel C.; Costes, Sylvain V.

    2017-01-01

    Biological risks associated with microgravity are a major concern for long-term space travel. Although determination of risk has been a focus for NASA research, data examining systemic (i.e., multi- or pan-tissue) responses to space flight are sparse. To perform our analysis, we utilized the NASA GeneLab database which is a publicly available repository containing a wide array of omics results from experiments conducted with: i) with different flight conditions (space shuttle (STS) missions vs. International Space Station (ISS); ii) a variety of tissues; and 3) assays that measure epigenetic, transcriptional, and protein expression changes. Meta-analysis of the transcriptomic data from 7 different murine and rat data sets, examining tissues such as liver, kidney, adrenal gland, thymus, mammary gland, skin, and skeletal muscle (soleus, extensor digitorum longus, tibialis anterior, quadriceps, and gastrocnemius) revealed for the first time, the existence of potential master regulators coordinating systemic responses to microgravity in rodents. We identified p53, TGF(beta)1 and immune related pathways as the highly prevalent pan-tissue signaling pathways that are affected by microgravity. Some variability in the degree of change in their expression across species, strain and time of flight was also observed. Interestingly, while certain skeletal muscle (gastrocnemius and soleus) exhibited an overall down-regulation of these genes, some other muscle types such as the extensor digitorum longus, tibialis anterior and quadriceps, showed an up-regulated expression, indicative of potential compensatory mechanisms to prevent microgravity-induced atrophy. Key genes isolated by unbiased systems analyses displayed a major overlap between tissue types and flight conditions and established TGF(beta)1 to be the most connected gene across all data sets. Finally, a set of microgravity responsive miRNA signature was identified and based on their predicted functional state and

  2. Suppression subtractive hybridization and comparative expression analysis to identify developmentally regulated genes in filamentous fungi.

    PubMed

    Gesing, Stefan; Schindler, Daniel; Nowrousian, Minou

    2013-09-01

    Ascomycetes differentiate four major morphological types of fruiting bodies (apothecia, perithecia, pseudothecia and cleistothecia) that are derived from an ancestral fruiting body. Thus, fruiting body differentiation is most likely controlled by a set of common core genes. One way to identify such genes is to search for genes with evolutionary conserved expression patterns. Using suppression subtractive hybridization (SSH), we selected differentially expressed transcripts in Pyronema confluens (Pezizales) by comparing two cDNA libraries specific for sexual and for vegetative development, respectively. The expression patterns of selected genes from both libraries were verified by quantitative real time PCR. Expression of several corresponding homologous genes was found to be conserved in two members of the Sordariales (Sordaria macrospora and Neurospora crassa), a derived group of ascomycetes that is only distantly related to the Pezizales. Knockout studies with N. crassa orthologues of differentially regulated genes revealed a functional role during fruiting body development for the gene NCU05079, encoding a putative MFS peptide transporter. These data indicate conserved gene expression patterns and a functional role of the corresponding genes during fruiting body development; such genes are candidates of choice for further functional analysis. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Predicting hepatocellular carcinoma through cross-talk genes identified by risk pathways

    PubMed Central

    Shao, Zhuo; Huo, Diwei; Zhang, Denan; Xie, Hongbo; Yang, Jingbo; Liu, Qiuqi; Chen, Xiujie

    2018-01-01

    Hepatocellular carcinoma (HCC) is the most frequent type of liver cancer with poor survival rate and high mortality. Despite efforts on the mechanism of HCC, new molecular markers are needed for exact diagnosis, evaluation and treatment. Here, we combined transcriptome of HCC with networks and pathways to identify reliable molecular markers. Through integrating 249 differentially expressed genes with syncretic protein interaction networks, we constructed a HCC-specific network, from which we further extracted 480 pivotal genes. Based on the cross-talk between the enriched pathways of the pivotal genes, we finally identified a HCC signature of 45 genes, which could accurately distinguish HCC patients with normal individuals and reveal the prognosis of HCC patients. Among these 45 genes, 15 showed dysregulated expression patterns and a part have been reported to be associated with HCC and/or other cancers. These findings suggested that our identified 45 gene signature could be potential and valuable molecular markers for diagnosis and evaluation of HCC. PMID:29765536

  4. Inferring Gene Family Histories in Yeast Identifies Lineage Specific Expansions

    PubMed Central

    Ames, Ryan M.; Money, Daniel; Lovell, Simon C.

    2014-01-01

    The complement of genes found in the genome is a balance between gene gain and gene loss. Knowledge of the specific genes that are gained and lost over evolutionary time allows an understanding of the evolution of biological functions. Here we use new evolutionary models to infer gene family histories across complete yeast genomes; these models allow us to estimate the relative genome-wide rates of gene birth, death, innovation and extinction (loss of an entire family) for the first time. We show that the rates of gene family evolution vary both between gene families and between species. We are also able to identify those families that have experienced rapid lineage specific expansion/contraction and show that these families are enriched for specific functions. Moreover, we find that families with specific functions are repeatedly expanded in multiple species, suggesting the presence of common adaptations and that these family expansions/contractions are not random. Additionally, we identify potential specialisations, unique to specific species, in the functions of lineage specific expanded families. These results suggest that an important mechanism in the evolution of genome content is the presence of lineage-specific gene family changes. PMID:24921666

  5. Defining the optimal animal model for translational research using gene set enrichment analysis.

    PubMed

    Weidner, Christopher; Steinfath, Matthias; Opitz, Elisa; Oelgeschläger, Michael; Schönfelder, Gilbert

    2016-08-01

    The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.

  6. Investigating the different mechanisms of genotoxic and non-genotoxic carcinogens by a gene set analysis.

    PubMed

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed

  7. Investigating the Different Mechanisms of Genotoxic and Non-Genotoxic Carcinogens by a Gene Set Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed

  8. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    PubMed

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Genomic convergence to identify candidate genes for Alzheimer disease on chromosome 10

    PubMed Central

    Liang, Xueying; Slifer, Michael; Martin, Eden R.; Schnetz-Boutaud, Nathalie; Bartlett, Jackie; Anderson, Brent; Züchner, Stephan; Gwirtsman, Harry; Gilbert, John R.; Pericak-Vance, Margaret A.; Haines, Jonathan L.

    2009-01-01

    A broad region of chromosome 10 (chr10) has engendered continued interest in the etiology of late-onset Alzheimer Disease (LOAD) from both linkage and candidate gene studies. However, there is a very extensive heterogeneity on chr10. We converged linkage analysis and gene expression data using the concept of genomic convergence that suggests that genes showing positive results across multiple different data types are more likely to be involved in AD. We identified and examined 28 genes on chr10 for association with AD in a Caucasian case-control dataset of 506 cases and 558 controls with substantial clinical information. The cases were all LOAD (minimum age at onset ≥ 60 years). Both single marker and haplotypic associations were tested in the overall dataset and 8 subsets defined by age, gender, ApoE and clinical status. PTPLA showed allelic, genotypic and haplotypic association in the overall dataset. SORCS1 was significant in the overall data sets (p=0.0025) and most significant in the female subset (allelic association p=0.00002, a 3-locus haplotype had p=0.0005). Odds Ratio of SORCS1 in the female subset was 1.7 (p<0.0001). SORCS1 is an interesting candidate gene involved in the Aβ pathway. Therefore, genetic variations in PTPLA and SORCS1 may be associated and have modest effect to the risk of AD by affecting Aβ pathway. The replication of the effect of these genes in different study populations and search for susceptible variants and functional studies of these genes are necessary to get a better understanding of the roles of the genes in Alzheimer disease. PMID:19241460

  10. The genetics of alcoholism: identifying specific genes through family studies.

    PubMed

    Edenberg, Howard J; Foroud, Tatiana

    2006-09-01

    Alcoholism is a complex disorder with both genetic and environmental risk factors. Studies in humans have begun to elucidate the genetic underpinnings of the risk for alcoholism. Here we briefly review strategies for identifying individual genes in which variations affect the risk for alcoholism and related phenotypes, in the context of one large study that has successfully identified such genes. The Collaborative Study on the Genetics of Alcoholism (COGA) is a family-based study that has collected detailed phenotypic data on individuals in families with multiple alcoholic members. A genome-wide linkage approach led to the identification of chromosomal regions containing genes that influenced alcoholism risk and related phenotypes. Subsequently, single nucleotide polymorphisms (SNPs) were genotyped in positional candidate genes located within the linked chromosomal regions, and analyzed for association with these phenotypes. Using this sequential approach, COGA has detected association with GABRA2, CHRM2 and ADH4; these associations have all been replicated by other researchers. COGA has detected association to additional genes including GABRG3, TAS2R16, SNCA, OPRK1 and PDYN, results that are awaiting confirmation. These successes demonstrate that genes contributing to the risk for alcoholism can be reliably identified using human subjects.

  11. Co-fuse: a new class discovery analysis tool to identify and prioritize recurrent fusion genes from RNA-sequencing data.

    PubMed

    Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow

    2018-06-07

    Recurrent oncogenic fusion genes play a critical role in the development of various cancers and diseases and provide, in some cases, excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this deficiency, we developed Co-occurrence Fusion (Co-fuse), a new and easy to use software tool that enables biologists to merge RNA-seq information, allowing them to identify recurrent fusion genes, without the need for exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis which enables the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can be used to identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a multiple myeloma (MM) samples-enriched cluster and an acute myeloid leukemia (AML) samples-enriched cluster. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM, when compared to AML samples, indicating the potential of Co-fuse to aid the discovery of yet unknown driver fusion genes through cohort comparisons. Additionally, using a 272 primary glioma sample RNA-seq dataset, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets, and may lead to the discovery of new disease subgroups and potentially new driver genes, for which, targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .

  12. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we

  13. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions.

    PubMed

    Gundersen, Gregory W; Jones, Matthew R; Rouillard, Andrew D; Kou, Yan; Monteiro, Caroline D; Feldmann, Axel S; Hu, Kevin S; Ma'ayan, Avi

    2015-09-15

    Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, to extract and analyze differentially expressed genes from GEO requires significant computational skills. Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and FireFox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. avi.maayan@mssm.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. Predicting essential genes for identifying potential drug targets in Aspergillus fumigatus.

    PubMed

    Lu, Yao; Deng, Jingyuan; Rhodes, Judith C; Lu, Hui; Lu, Long Jason

    2014-06-01

    Aspergillus fumigatus (Af) is a ubiquitous and opportunistic pathogen capable of causing acute, invasive pulmonary disease in susceptible hosts. Despite current therapeutic options, mortality associated with invasive Af infections remains unacceptably high, increasing 357% since 1980. Therefore, there is an urgent need for the development of novel therapeutic strategies, including more efficacious drugs acting on new targets. Thus, as noted in a recent review, "the identification of essential genes in fungi represents a crucial step in the development of new antifungal drugs". Expanding the target space by rapidly identifying new essential genes has thus been described as "the most important task of genomics-based target validation". In previous research, we were the first to show that essential gene annotation can be reliably transferred between distantly related four Prokaryotic species. In this study, we extend our machine learning approach to the much more complex Eukaryotic fungal species. A compendium of essential genes is predicted in Af by transferring known essential gene annotations from another filamentous fungus Neurospora crassa. This approach predicts essential genes by integrating diverse types of intrinsic and context-dependent genomic features encoded in microbial genomes. The predicted essential datasets contained 1674 genes. We validated our results by comparing our predictions with known essential genes in Af, comparing our predictions with those predicted by homology mapping, and conducting conditional expressed alleles. We applied several layers of filters and selected a set of potential drug targets from the predicted essential genes. Finally, we have conducted wet lab knockout experiments to verify our predictions, which further validates the accuracy and wide applicability of the machine learning approach. The approach presented here significantly extended our ability to predict essential genes beyond orthologs and made it possible to

  15. Integrated Analysis of Mutation Data from Various Sources Identifies Key Genes and Signaling Pathways in Hepatocellular Carcinoma

    PubMed Central

    Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Background Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. Principal Findings In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Conclusions Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers. PMID:24988079

  16. Integrated analysis of mutation data from various sources identifies key genes and signaling pathways in hepatocellular carcinoma.

    PubMed

    Zhang, Yuannv; Qiu, Zhaoping; Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers.

  17. Gene Signature in Sessile Serrated Polyps Identifies Colon Cancer Subtype

    PubMed Central

    Kanth, Priyanka; Bronner, Mary P.; Boucher, Kenneth M.; Burt, Randall W.; Neklason, Deborah W.; Hagedorn, Curt H.; Delker, Don A.

    2016-01-01

    Sessile serrated colon adenoma/polyps (SSA/Ps) are found during routine screening colonoscopy and may account for 20–30% of colon cancers. However, differentiating SSA/Ps from hyperplastic polyps (HP) with little risk of cancer is challenging and complementary molecular markers are needed. Additionally, the molecular mechanisms of colon cancer development from SSA/Ps are poorly understood. RNA sequencing was performed on 21 SSA/Ps, 10 HPs, 10 adenomas, 21 uninvolved colon and 20 control colon specimens. Differential expression and leave-one-out cross validation methods were used to define a unique gene signature of SSA/Ps. Our SSA/P gene signature was evaluated in colon cancer RNA-Seq data from The Cancer Genome Atlas (TCGA) to identify a subtype of colon cancers that may develop from SSA/Ps. A total of 1422 differentially expressed genes were found in SSA/Ps relative to controls. Serrated polyposis syndrome (n=12) and sporadic SSA/Ps (n=9) exhibited almost complete (96%) gene overlap. A 51-gene panel in SSA/P showed similar expression in a subset of TCGA colon cancers with high microsatellite instability (MSI-H). A smaller seven-gene panel showed high sensitivity and specificity in identifying BRAF mutant, CpG island methylator phenotype high (CIMP-H) and MLH1 silenced colon cancers. We describe a unique gene signature in SSA/Ps that identifies a subset of colon cancers likely to develop through the serrated pathway. These gene panels may be utilized for improved differentiation of SSA/Ps from HPs and provide insights into novel molecular pathways altered in colon cancer arising from the serrated pathway. PMID:27026680

  18. Network-Based Integration of GWAS and Gene Expression Identifies a HOX-Centric Network Associated with Serous Ovarian Cancer Risk.

    PubMed

    Kar, Siddhartha P; Tyrer, Jonathan P; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S; Cramer, Daniel; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K; Kelemen, Linda E; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Iain A; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B; Narod, Steven A; Nedergaard, Lotte; Ness, Roberta B; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; Runnebaum, Ingo B; Rzepecka, Iwona K; Salvesen, Helga B; Schildkraut, Joellen M; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Sucheston-Campbell, Lara E; Tangen, Ingvild L; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S; van Altena, Anne M; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A; Monteiro, Alvaro N A; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P

    2015-10-01

    Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by coexpression may also be enriched for additional EOC risk associations. We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly coexpressed with each selected TF gene in the unified microarray dataset of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this dataset were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P < 0.05 and FDR < 0.05). These results were replicated (P < 0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Network analysis integrating large, context-specific datasets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. ©2015 American Association for Cancer Research.

  19. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ ( geneanalytics.genecards.org ), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards-the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  20. Identifying prognostic signature in ovarian cancer using DirGenerank

    PubMed Central

    Wang, Jian-Yong; Chen, Ling-Ling; Zhou, Xiong-Hui

    2017-01-01

    Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, it's still a big challenge to select the prognostic genes that can distinguish the risk of cancer patients across various data sets because of tumor heterogeneity. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed DirGenerank (Generank in direct netowrk) algorithm, which concerns both the gene dependency network and genes’ correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using ovarian cancer data set from TCGA (The Cancer Genome Atlas) as training data set, 40 genes with the highest importance were selected as prognostic signature. Survival analysis of these patients divided by the prognostic signature in testing data set and four independent data sets showed the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients. PMID:28615526

  1. Phenoscape: Identifying Candidate Genes for Evolutionary Phenotypes

    PubMed Central

    Edmunds, Richard C.; Su, Baofeng; Balhoff, James P.; Eames, B. Frank; Dahdul, Wasila M.; Lapp, Hilmar; Lundberg, John G.; Vision, Todd J.; Dunham, Rex A.; Mabee, Paula M.; Westerfield, Monte

    2016-01-01

    Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology. PMID:26500251

  2. Common Marker Genes Identified from Various Sample Types for Systemic Lupus Erythematosus.

    PubMed

    Bing, Peng-Fei; Xia, Wei; Wang, Lan; Zhang, Yong-Hong; Lei, Shu-Feng; Deng, Fei-Yan

    2016-01-01

    Systemic lupus erythematosus (SLE) is a complex auto-immune disease. Gene expression studies have been conducted to identify SLE-related genes in various types of samples. It is unknown whether there are common marker genes significant for SLE but independent of sample types, which may have potentials for follow-up translational research. The aim of this study is to identify common marker genes across various sample types for SLE. Based on four public microarray gene expression datasets for SLE covering three representative types of blood-born samples (monocyte; peripheral blood mononuclear cell, PBMC; whole blood), we utilized three statistics (fold-change, FC; t-test p value; false discovery rate adjusted p value) to scrutinize genes simultaneously regulated with SLE across various sample types. For common marker genes, we conducted the Gene Ontology enrichment analysis and Protein-Protein Interaction analysis to gain insights into their functions. We identified 10 common marker genes associated with SLE (IFI6, IFI27, IFI44L, OAS1, OAS2, EIF2AK2, PLSCR1, STAT1, RNASE2, and GSTO1). Significant up-regulation of IFI6, IFI27, and IFI44L with SLE was observed in all the studied sample types, though the FC was most striking in monocyte, compared with PBMC and whole blood (8.82-251.66 vs. 3.73-74.05 vs. 1.19-1.87). Eight of the above 10 genes, except RNASE2 and GSTO1, interact with each other and with known SLE susceptibility genes, participate in immune response, RNA and protein catabolism, and cell death. Our data suggest that there exist common marker genes across various sample types for SLE. The 10 common marker genes, identified herein, deserve follow-up studies to dissert their potentials as diagnostic or therapeutic markers to predict SLE or treatment response.

  3. Transcriptional dissection of melanoma identifies a high-risk subtype underlying TP53 family genes and epigenome deregulation

    PubMed Central

    Badal, Brateil; Solovyov, Alexander; Di Cecilia, Serena; Chan, Joseph Minhow; Chang, Li-Wei; Iqbal, Ramiz; Aydin, Iraz T.; Rajan, Geena S.; Chen, Chen; Abbate, Franco; Arora, Kshitij S.; Tanne, Antoine; Gruber, Stephen B.; Johnson, Timothy M.; Fullen, Douglas R.; Phelps, Robert; Bhardwaj, Nina; Bernstein, Emily; Ting, David T.; Brunner, Georg; Schadt, Eric E.; Greenbaum, Benjamin D.; Celebi, Julide Tok

    2017-01-01

    BACKGROUND. Melanoma is a heterogeneous malignancy. We set out to identify the molecular underpinnings of high-risk melanomas, those that are likely to progress rapidly, metastasize, and result in poor outcomes. METHODS. We examined transcriptome changes from benign states to early-, intermediate-, and late-stage tumors using a set of 78 treatment-naive melanocytic tumors consisting of primary melanomas of the skin and benign melanocytic lesions. We utilized a next-generation sequencing platform that enabled a comprehensive analysis of protein-coding and -noncoding RNA transcripts. RESULTS. Gene expression changes unequivocally discriminated between benign and malignant states, and a dual epigenetic and immune signature emerged defining this transition. To our knowledge, we discovered previously unrecognized melanoma subtypes. A high-risk primary melanoma subset was distinguished by a 122-epigenetic gene signature (“epigenetic” cluster) and TP53 family gene deregulation (TP53, TP63, and TP73). This subtype associated with poor overall survival and showed enrichment of cell cycle genes. Noncoding repetitive element transcripts (LINEs, SINEs, and ERVs) that can result in immunostimulatory signals recapitulating a state of “viral mimicry” were significantly repressed. The high-risk subtype and its poor predictive characteristics were validated in several independent cohorts. Additionally, primary melanomas distinguished by specific immune signatures (“immune” clusters) were identified. CONCLUSION. The TP53 family of genes and genes regulating the epigenetic machinery demonstrate strong prognostic and biological relevance during progression of early disease. Gene expression profiling of protein-coding and -noncoding RNA transcripts may be a better predictor for disease course in melanoma. This study outlines the transcriptional interplay of the cancer cell’s epigenome with the immune milieu with potential for future therapeutic targeting. FUNDING

  4. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells and comparative methods examining adherent and sphere cells are widely used to investigate mechanism underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. Additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. Distinct from other meta-analysis, by using gene set analysis, we selected possible markers which can explain the biological

  5. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes.

    PubMed

    Law, MeiYee; Childs, Kevin L; Campbell, Michael S; Stein, Joshua C; Olson, Andrew J; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M; Lawrence, Carolyn J; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2015-01-01

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. © 2015 American Society of Plant Biologists. All Rights Reserved.

  6. Genetic study of congenital bile-duct dilatation identifies de novo and inherited variants in functionally related genes.

    PubMed

    Wong, John K L; Campbell, Desmond; Ngo, Ngoc Diem; Yeung, Fanny; Cheng, Guo; Tang, Clara S M; Chung, Patrick H Y; Tran, Ngoc Son; So, Man-Ting; Cherny, Stacey S; Sham, Pak C; Tam, Paul K; Garcia-Barcelo, Maria-Mercè

    2016-12-12

    Congenital dilatation of the bile-duct (CDD) is a rare, mostly sporadic, disorder that results in bile retention with severe associated complications. CDD affects mainly Asians. To our knowledge, no genetic study has ever been conducted. We aim to identify genetic risk factors by a "trio-based" exome-sequencing approach, whereby 31 CDD probands and their unaffected parents were exome-sequenced. Seven-hundred controls from the local population were used to detect gene-sets significantly enriched with rare variants in CDD patients. Twenty-one predicted damaging de novo variants (DNVs; 4 protein truncating and 17 missense) were identified in several evolutionarily constrained genes (p < 0.01). Six genes carrying DNVs were associated with human developmental disorders involving epithelial, connective or bone morphologies (PXDN, RTEL1, ANKRD11, MAP2K1, CYLD, ACAN) and four linked with cholangio- and hepatocellular carcinomas (PIK3CA, TLN1 CYLD, MAP2K1). Importantly, CDD patients have an excess of DNVs in cancer-related genes (p < 0.025). Thirteen genes were recurrently mutated at different sites, forming compound heterozygotes or functionally related complexes within patients. Our data supports a strong genetic basis for CDD and show that CDD is not only genetically heterogeneous but also non-monogenic, requiring mutations in more than one genes for the disease to develop. The data is consistent with the rarity and sporadic presentation of CDD.

  7. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    PubMed

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore molecular mechanisms of bladder cancer (BC), network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Mapping autosomal recessive intellectual disability: combined microarray and exome sequencing identifies 26 novel candidate genes in 192 consanguineous families.

    PubMed

    Harripaul, R; Vasli, N; Mikhailov, A; Rafiq, M A; Mittal, K; Windpassinger, C; Sheikh, T I; Noor, A; Mahmood, H; Downey, S; Johnson, M; Vleuten, K; Bell, L; Ilyas, M; Khan, F S; Khan, V; Moradi, M; Ayaz, M; Naeem, F; Heidari, A; Ahmed, I; Ghadami, S; Agha, Z; Zeinali, S; Qamar, R; Mozhdehipanah, H; John, P; Mir, A; Ansar, M; French, L; Ayub, M; Vincent, J B

    2018-04-01

    Approximately 1% of the global population is affected by intellectual disability (ID), and the majority receive no molecular diagnosis. Previous studies have indicated high levels of genetic heterogeneity, with estimates of more than 2500 autosomal ID genes, the majority of which are autosomal recessive (AR). Here, we combined microarray genotyping, homozygosity-by-descent (HBD) mapping, copy number variation (CNV) analysis, and whole exome sequencing (WES) to identify disease genes/mutations in 192 multiplex Pakistani and Iranian consanguineous families with non-syndromic ID. We identified definite or candidate mutations (or CNVs) in 51% of families in 72 different genes, including 26 not previously reported for ARID. The new ARID genes include nine with loss-of-function mutations (ABI2, MAPK8, MPDZ, PIDD1, SLAIN1, TBC1D23, TRAPPC6B, UBA7 and USP44), and missense mutations include the first reports of variants in BDNF or TET1 associated with ID. The genes identified also showed overlap with de novo gene sets for other neuropsychiatric disorders. Transcriptional studies showed prominent expression in the prenatal brain. The high yield of AR mutations for ID indicated that this approach has excellent clinical potential and should inform clinical diagnostics, including clinical whole exome and genome sequencing, for populations in which consanguinity is common. As with other AR disorders, the relevance will also apply to outbred populations.

  9. Three gene expression vector sets for concurrently expressing multiple genes in Saccharomyces cerevisiae.

    PubMed

    Ishii, Jun; Kondo, Takashi; Makino, Harumi; Ogura, Akira; Matsuda, Fumio; Kondo, Akihiko

    2014-05-01

    Yeast has the potential to be used in bulk-scale fermentative production of fuels and chemicals due to its tolerance for low pH and robustness for autolysis. However, expression of multiple external genes in one host yeast strain is considerably labor-intensive due to the lack of polycistronic transcription. To promote the metabolic engineering of yeast, we generated systematic and convenient genetic engineering tools to express multiple genes in Saccharomyces cerevisiae. We constructed a series of multi-copy and integration vector sets for concurrently expressing two or three genes in S. cerevisiae by embedding three classical promoters. The comparative expression capabilities of the constructed vectors were monitored with green fluorescent protein, and the concurrent expression of genes was monitored with three different fluorescent proteins. Our multiple gene expression tool will be helpful to the advanced construction of genetically engineered yeast strains in a variety of research fields other than metabolic engineering. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  10. Gene regulatory network inference using fused LASSO on multiple data sets

    PubMed Central

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-01-01

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687

  11. Transcriptome analysis identifies genes involved in sex determination and development of Xenopus laevis gonads.

    PubMed

    Piprek, Rafal P; Damulewicz, Milena; Kloc, Malgorzata; Kubiak, Jacek Z

    Development of the gonads is a complex process, which starts with a period of undifferentiated, bipotential gonads. During this period the expression of sex-determining genes is initiated. Sex determination is a process triggering differentiation of the gonads into the testis or ovary. Sex determination period is followed by sexual differentiation, i.e. appearance of the first testis- and ovary-specific features. In Xenopus laevis W-linked DM-domain gene (DM-W) had been described as a master determinant of the gonadal female sex. However, the data on the expression and function of other genes participating in gonad development in X. laevis, and in anurans, in general, are very limited. We applied microarray technique to analyze the expression pattern of a subset of X. laevis genes previously identified to be involved in gonad development in several vertebrate species. We also analyzed the localization and the expression level of proteins encoded by these genes in developing X. laevis gonads. These analyses pointed to the set of genes differentially expressed in developing testes and ovaries. Gata4, Sox9, Dmrt1, Amh, Fgf9, Ptgds, Pdgf, Fshr, and Cyp17a1 expression was upregulated in developing testes, while DM-W, Fst, Foxl2, and Cyp19a1 were upregulated in developing ovaries. We discuss the possible roles of these genes in development of X. laevis gonads. Copyright © 2018 International Society of Differentiation. Published by Elsevier B.V. All rights reserved.

  12. ICan: an integrated co-alteration network to identify ovarian cancer-related genes.

    PubMed

    Zhou, Yuanshuai; Liu, Yongjing; Li, Kening; Zhang, Rui; Qiu, Fujun; Zhao, Ning; Xu, Yan

    2015-01-01

    Over the last decade, an increasing number of integrative studies on cancer-related genes have been published. Integrative analyses aim to overcome the limitation of a single data type, and provide a more complete view of carcinogenesis. The vast majority of these studies used sample-matched data of gene expression and copy number to investigate the impact of copy number alteration on gene expression, and to predict and prioritize candidate oncogenes and tumor suppressor genes. However, correlations between genes were neglected in these studies. Our work aimed to evaluate the co-alteration of copy number, methylation and expression, allowing us to identify cancer-related genes and essential functional modules in cancer. We built the Integrated Co-alteration network (ICan) based on multi-omics data, and analyzed the network to uncover cancer-related genes. After comparison with random networks, we identified 155 ovarian cancer-related genes, including well-known (TP53, BRCA1, RB1 and PTEN) and also novel cancer-related genes, such as PDPN and EphA2. We compared the results with a conventional method: CNAmet, and obtained a significantly better area under the curve value (ICan: 0.8179, CNAmet: 0.5183). In this paper, we describe a framework to find cancer-related genes based on an Integrated Co-alteration network. Our results proved that ICan could precisely identify candidate cancer genes and provide increased mechanistic understanding of carcinogenesis. This work suggested a new research direction for biological network analyses involving multi-omics data.

  13. A recellularized human colon model identifies cancer driver genes

    PubMed Central

    Chen, Huanhuan Joyce; Wei, Zhubo; Sun, Jian; Bhattacharya, Asmita; Savage, David J; Serda, Rita; Mackeyev, Yuri; Curley, Steven A.; Bu, Pengcheng; Wang, Lihua; Chen, Shuibing; Cohen-Gould, Leona; Huang, Emina; Shen, Xiling; Lipkin, Steven M.; Copeland, Neal G.; Jenkins, Nancy A.; Shuler, Michael L.

    2016-01-01

    Refined cancer models are needed to bridge the gap between cell-line, animal and clinical research. Here we describe the engineering of an organotypic colon cancer model by recellularization of a native human matrix that contains cell-populated mucosa and an intact muscularis mucosa layer. This ex vivo system recapitulates the pathophysiological progression from APC-mutant neoplasia to submucosal invasive tumor. We used it to perform a Sleeping Beauty transposon mutagenesis screen to identify genes that cooperate with mutant APC in driving invasive neoplasia. 38 candidate invasion driver genes were identified, 17 of which have been previously implicated in colorectal cancer progression, including TCF7L2, TWIST2, MSH2, DCC and EPHB1/2. Six invasion driver genes that to our knowledge have not been previously described were validated in vitro using cell proliferation, migration and invasion assays, and ex vivo using recellularized human colon. These results demonstrate the utility of our organoid model for studying cancer biology. PMID:27398792

  14. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification.

    PubMed

    Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino

    2014-01-30

    The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypothesis in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypothesis over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of microarrays raw data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypothesis, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.

  15. Identifying a set of influential spreaders in complex networks

    NASA Astrophysics Data System (ADS)

    Zhang, Jian-Xiong; Chen, Duan-Bing; Dong, Qiang; Zhao, Zhi-Dan

    2016-06-01

    Identifying a set of influential spreaders in complex networks plays a crucial role in effective information spreading. A simple strategy is to choose top-r ranked nodes as spreaders according to influence ranking method such as PageRank, ClusterRank and k-shell decomposition. Besides, some heuristic methods such as hill-climbing, SPIN, degree discount and independent set based are also proposed. However, these approaches suffer from a possibility that some spreaders are so close together that they overlap sphere of influence or time consuming. In this report, we present a simply yet effectively iterative method named VoteRank to identify a set of decentralized spreaders with the best spreading ability. In this approach, all nodes vote in a spreader in each turn, and the voting ability of neighbors of elected spreader will be decreased in subsequent turn. Experimental results on four real networks show that under Susceptible-Infected-Recovered (SIR) and Susceptible-Infected (SI) models, VoteRank outperforms the traditional benchmark methods on both spreading rate and final affected scale. What’s more, VoteRank has superior computational efficiency.

  16. Gene expression profiling combined with bioinformatics analysis identify biomarkers for Parkinson disease.

    PubMed

    Diao, Hongyu; Li, Xinxing; Hu, Sheng; Liu, Yunhui

    2012-01-01

    Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years old. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. Besides, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes and calculated the regulatory impact factor of each transcription factor. As the results, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found have altered expression levels in PD patients. The expression levels of other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found altered. However, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result.

  17. Gene Expression Profiling Combined with Bioinformatics Analysis Identify Biomarkers for Parkinson Disease

    PubMed Central

    Diao, Hongyu; Li, Xinxing; Hu, Sheng; Liu, Yunhui

    2012-01-01

    Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years old. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. Besides, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes and calculated the regulatory impact factor of each transcription factor. As the results, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found have altered expression levels in PD patients. The expression levels of other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found altered. However, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result. PMID:23284986

  18. Axon Regeneration Genes Identified by RNAi Screening in C. elegans

    PubMed Central

    Nix, Paola; Hammarlund, Marc; Hauth, Linda; Lachnit, Martina; Jorgensen, Erik M.

    2014-01-01

    Axons of the mammalian CNS lose the ability to regenerate soon after development due to both an inhibitory CNS environment and the loss of cell-intrinsic factors necessary for regeneration. The complex molecular events required for robust regeneration of mature neurons are not fully understood, particularly in vivo. To identify genes affecting axon regeneration in Caenorhabditis elegans, we performed both an RNAi-based screen for defective motor axon regeneration in unc-70/β-spectrin mutants and a candidate gene screen. From these screens, we identified at least 50 conserved genes with growth-promoting or growth-inhibiting functions. Through our analysis of mutants, we shed new light on certain aspects of regeneration, including the role of β-spectrin and membrane dynamics, the antagonistic activity of MAP kinase signaling pathways, and the role of stress in promoting axon regeneration. Many gene candidates had not previously been associated with axon regeneration and implicate new pathways of interest for therapeutic intervention. PMID:24403161

  19. Gene expression profiles analysis identifies key genes for acute lung injury in patients with sepsis.

    PubMed

    Guo, Zhiqiang; Zhao, Chuncheng; Wang, Zheng

    2014-09-26

    To identify critical genes and biological pathways in acute lung injury (ALI), a comparative analysis of gene expression profiles of patients with ALI + sepsis compared with patients with sepsis alone were performed with bioinformatic tools. GSE10474 was downloaded from Gene Expression Omnibus, including a collective of 13 whole blood samples with ALI + sepsis and 21 whole blood samples with sepsis alone. After pre-treatment with robust multichip averaging (RMA) method, differential analysis was conducted using simpleaffy package based upon t-test and fold change. Hierarchical clustering was also performed using function hclust from package stats. Beisides, functional enrichment analysis was conducted using iGepros. Moreover, the gene regulatory network was constructed with information from Kyoto Encyclopedia of Genes and Genomes (KEGG) and then visualized by Cytoscape. A total of 128 differentially expressed genes (DEGs) were identified, including 47 up- and 81 down-regulated genes. The significantly enriched functions included negative regulation of cell proliferation, regulation of response to stimulus and cellular component morphogenesis. A total of 27 DEGs were significantly enriched in 16 KEGG pathways, such as protein digestion and absorption, fatty acid metabolism, amoebiasis, etc. Furthermore, the regulatory network of these 27 DEGs was constructed, which involved several key genes, including protein tyrosine kinase 2 (PTK2), v-src avian sarcoma (SRC) and Caveolin 2 (CAV2). PTK2, SRC and CAV2 may be potential markers for diagnosis and treatment of ALI. The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5865162912987143.

  20. Genome-wide histone state profiling of fibroblasts from the opossum, Monodelphis domestica, identifies the first marsupial-specific imprinted gene

    PubMed Central

    2014-01-01

    Background Imprinted genes have been extensively documented in eutherian mammals and found to exhibit significant interspecific variation in the suites of genes that are imprinted and in their regulation between tissues and developmental stages. Much less is known about imprinted loci in metatherian (marsupial) mammals, wherein studies have been limited to a small number of genes previously known to be imprinted in eutherians. We describe the first ab initio search for imprinted marsupial genes, in fibroblasts from the opossum, Monodelphis domestica, based on a genome-wide ChIP-seq strategy to identify promoters that are simultaneously marked by mutually exclusive, transcriptionally opposing histone modifications. Results We identified a novel imprinted gene (Meis1) and two additional monoallelically expressed genes, one of which (Cstb) showed allele-specific, but non-imprinted expression. Imprinted vs. allele-specific expression could not be resolved for the third monoallelically expressed gene (Rpl17). Transcriptionally opposing histone modifications H3K4me3, H3K9Ac, and H3K9me3 were found at the promoters of all three genes, but differential DNA methylation was not detected at CpG islands at any of these promoters. Conclusions In generating the first genome-wide histone modification profiles for a marsupial, we identified the first gene that is imprinted in a marsupial but not in eutherian mammals. This outcome demonstrates the practicality of an ab initio discovery strategy and implicates histone modification, but not differential DNA methylation, as a conserved mechanism for marking imprinted genes in all therian mammals. Our findings suggest that marsupials use multiple epigenetic mechanisms for imprinting and support the concept that lineage-specific selective forces can produce sets of imprinted genes that differ between metatherian and eutherian lines. PMID:24484454

  1. Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys

    PubMed Central

    Werner, Jeffrey J; Koren, Omry; Hugenholtz, Philip; DeSantis, Todd Z; Walters, William A; Caporaso, J Gregory; Angenent, Largus T; Knight, Rob; Ley, Ruth E

    2012-01-01

    Taxonomic classification of the thousands–millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), due to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms have been explored in detail, the influence of training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets generated (from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, that we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and Candidate phyla TM6, TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, and greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases. PMID:21716311

  2. Meta-analysis identifies five novel loci associated with endometriosis highlighting key genes involved in hormone metabolism

    PubMed Central

    Sapkota, Yadav; Steinthorsdottir, Valgerdur; Morris, Andrew P.; Fassbender, Amelie; Rahmioglu, Nilufer; De Vivo, Immaculata; Buring, Julie E.; Zhang, Futao; Edwards, Todd L.; Jones, Sarah; O, Dorien; Peterse, Daniëlle; Rexrode, Kathryn M.; Ridker, Paul M.; Schork, Andrew J.; MacGregor, Stuart; Martin, Nicholas G.; Becker, Christian M.; Adachi, Sosuke; Yoshihara, Kosuke; Enomoto, Takayuki; Takahashi, Atsushi; Kamatani, Yoichiro; Matsuda, Koichi; Kubo, Michiaki; Thorleifsson, Gudmar; Geirsson, Reynir T.; Thorsteinsdottir, Unnur; Wallace, Leanne M.; Werge, Thomas M.; Thompson, Wesley K.; Yang, Jian; Velez Edwards, Digna R.; Nyegaard, Mette; Low, Siew-Kee; Zondervan, Krina T.; Missmer, Stacey A.; D'Hooghe, Thomas; Montgomery, Grant W.; Chasman, Daniel I.; Stefansson, Kari; Tung, Joyce Y.; Nyholt, Dale R.

    2017-01-01

    Endometriosis is a heritable hormone-dependent gynecological disorder, associated with severe pelvic pain and reduced fertility; however, its molecular mechanisms remain largely unknown. Here we perform a meta-analysis of 11 genome-wide association case-control data sets, totalling 17,045 endometriosis cases and 191,596 controls. In addition to replicating previously reported loci, we identify five novel loci significantly associated with endometriosis risk (P<5 × 10−8), implicating genes involved in sex steroid hormone pathways (FN1, CCDC170, ESR1, SYNE1 and FSHB). Conditional analysis identified five secondary association signals, including two at the ESR1 locus, resulting in 19 independent single nucleotide polymorphisms (SNPs) robustly associated with endometriosis, which together explain up to 5.19% of variance in endometriosis. These results highlight novel variants in or near specific genes with important roles in sex steroid hormone signalling and function, and offer unique opportunities for more targeted functional research efforts. PMID:28537267

  3. Combining Genome-Scale Experimental and Computational Methods To Identify Essential Genes in Rhodobacter sphaeroides

    DOE PAGES

    Burger, Brian T.; Imam, Saheed; Scarborough, Matthew J.; ...

    2017-06-06

    Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosyntheticmore » growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species.« less

  4. A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast

    DTIC Science & Technology

    2004-05-01

    AD Award Number: DAMD17-03-1-0232 TITLE: A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast PRINCIPAL INVESTIGATOR...Approach to Identify Novel Breast DAMD17-03-1-0232 Cancer Gene Targets in Yeast 6. A UTHOR(S) Craig Bennett, Ph.D. 7. PERFORMING ORGANIZA TION NAME(S...Unlimited 13. ABSTRACT (Maximum 200 Words) We are using the yeast Saccharomyces cerevisiae to identify new cancer gene targets that interact with the

  5. ICan: An Integrated Co-Alteration Network to Identify Ovarian Cancer-Related Genes

    PubMed Central

    Zhou, Yuanshuai; Liu, Yongjing; Li, Kening; Zhang, Rui; Qiu, Fujun; Zhao, Ning; Xu, Yan

    2015-01-01

    Background Over the last decade, an increasing number of integrative studies on cancer-related genes have been published. Integrative analyses aim to overcome the limitation of a single data type, and provide a more complete view of carcinogenesis. The vast majority of these studies used sample-matched data of gene expression and copy number to investigate the impact of copy number alteration on gene expression, and to predict and prioritize candidate oncogenes and tumor suppressor genes. However, correlations between genes were neglected in these studies. Our work aimed to evaluate the co-alteration of copy number, methylation and expression, allowing us to identify cancer-related genes and essential functional modules in cancer. Results We built the Integrated Co-alteration network (ICan) based on multi-omics data, and analyzed the network to uncover cancer-related genes. After comparison with random networks, we identified 155 ovarian cancer-related genes, including well-known (TP53, BRCA1, RB1 and PTEN) and also novel cancer-related genes, such as PDPN and EphA2. We compared the results with a conventional method: CNAmet, and obtained a significantly better area under the curve value (ICan: 0.8179, CNAmet: 0.5183). Conclusion In this paper, we describe a framework to find cancer-related genes based on an Integrated Co-alteration network. Our results proved that ICan could precisely identify candidate cancer genes and provide increased mechanistic understanding of carcinogenesis. This work suggested a new research direction for biological network analyses involving multi-omics data. PMID:25803614

  6. MMTV insertional mutagenesis identifies genes, gene families and pathways involved in mammary cancer.

    PubMed

    Theodorou, Vassiliki; Kimm, Melanie A; Boer, Mandy; Wessels, Lodewyk; Theelen, Wendy; Jonkers, Jos; Hilkens, John

    2007-06-01

    We performed a high-throughput retroviral insertional mutagenesis screen in mouse mammary tumor virus (MMTV)-induced mammary tumors and identified 33 common insertion sites, of which 17 genes were previously not known to be associated with mammary cancer and 13 had not previously been linked to cancer in general. Although members of the Wnt and fibroblast growth factors (Fgf) families were frequently tagged, our exhaustive screening for MMTV insertion sites uncovered a new repertoire of candidate breast cancer oncogenes. We validated one of these genes, Rspo3, as an oncogene by overexpression in a p53-deficient mammary epithelial cell line. The human orthologs of the candidate oncogenes were frequently deregulated in human breast cancers and associated with several tumor parameters. Computational analysis of all MMTV-tagged genes uncovered specific gene families not previously associated with cancer and showed a significant overrepresentation of protein domains and signaling pathways mainly associated with development and growth factor signaling. Comparison of all tagged genes in MMTV and Moloney murine leukemia virus-induced malignancies showed that both viruses target mostly different genes that act predominantly in distinct pathways.

  7. Systematic analysis of microarray datasets to identify Parkinson's disease‑associated pathways and genes.

    PubMed

    Feng, Yinling; Wang, Xuefeng

    2017-03-01

    In order to investigate commonly disturbed genes and pathways in various brain regions of patients with Parkinson's disease (PD), microarray datasets from previous studies were collected and systematically analyzed. Different normalization methods were applied to microarray datasets from different platforms. A strategy combining gene co‑expression networks and clinical information was adopted, using weighted gene co‑expression network analysis (WGCNA) to screen for commonly disturbed genes in different brain regions of patients with PD. Functional enrichment analysis of commonly disturbed genes was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Co‑pathway relationships were identified with Pearson's correlation coefficient tests and a hypergeometric distribution‑based test. Common genes in pathway pairs were selected out and regarded as risk genes. A total of 17 microarray datasets from 7 platforms were retained for further analysis. Five gene coexpression modules were identified, containing 9,745, 736, 233, 101 and 93 genes, respectively. One module was significantly correlated with PD samples and thus the 736 genes it contained were considered to be candidate PD‑associated genes. Functional enrichment analysis demonstrated that these genes were implicated in oxidative phosphorylation and PD. A total of 44 pathway pairs and 52 risk genes were revealed, and a risk gene pathway relationship network was constructed. Eight modules were identified and were revealed to be associated with PD, cancers and metabolism. A number of disturbed pathways and risk genes were unveiled in PD, and these findings may help advance understanding of PD pathogenesis.

  8. Deep sequencing analysis of the transcriptomes of peanut aerial and subterranean young pods identifies candidate genes related to early embryo abortion.

    PubMed

    Chen, Xiaoping; Zhu, Wei; Azam, Sarwar; Li, Heying; Zhu, Fanghe; Li, Haifen; Hong, Yanbin; Liu, Haiyan; Zhang, Erhua; Wu, Hong; Yu, Shanlin; Zhou, Guiyuan; Li, Shaoxiong; Zhong, Ni; Wen, Shijie; Li, Xingyu; Knapp, Steve J; Ozias-Akins, Peggy; Varshney, Rajeev K; Liang, Xuanqiang

    2013-01-01

    The failure of peg penetration into the soil leads to seed abortion in peanut. Knowledge of genes involved in these processes is comparatively deficient. Here, we used RNA-seq to gain insights into transcriptomes of aerial and subterranean pods. More than 2 million transcript reads with an average length of 396 bp were generated from one aerial (AP) and two subterranean (SP1 and SP2) pod libraries using pyrosequencing technology. After assembly, sets of 49 632, 49 952 and 50 494 from a total of 74 974 transcript assembly contigs (TACs) were identified in AP, SP1 and SP2, respectively. A clear linear relationship in the gene expression level was observed between these data sets. In brief, 2194 differentially expressed TACs with a 99.0% true-positive rate were identified, among which 859 and 1068 TACs were up-regulated in aerial and subterranean pods, respectively. Functional analysis showed that putative function based on similarity with proteins catalogued in UniProt and gene ontology term classification could be determined for 59 342 (79.2%) and 42 955 (57.3%) TACs, respectively. A total of 2968 TACs were mapped to 174 KEGG pathways, of which 168 were shared by aerial and subterranean transcriptomes. TACs involved in photosynthesis were significantly up-regulated and enriched in the aerial pod. In addition, two senescence-associated genes were identified as significantly up-regulated in the aerial pod, which potentially contribute to embryo abortion in aerial pods, and in turn, to cessation of swelling. The data set generated in this study provides evidence for some functional genes as robust candidates underlying aerial and subterranean pod development and contributes to an elucidation of the evolutionary implications resulting from fruit development under light and dark conditions. © 2012 The Authors Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  9. Distinct ontogenic and regional expressions of newly identified Cajal-Retzius cell-specific genes during neocorticogenesis.

    PubMed

    Yamazaki, Hiroshi; Sekiguchi, Mariko; Takamatsu, Masako; Tanabe, Yasuto; Nakanishi, Shigetada

    2004-10-05

    Cajal-Retzius (CR) cells are early-generated transient neurons and are important in the regulation of cortical neuronal migration and cortical laminar formation. Molecular entities characterizing the CR cell identity, however, remain largely elusive. We purified mouse cortical CR cells expressing GFP to homogeneity by fluorescence-activated cell sorting and examined a genome-wide expression profile of cortical CR cells at embryonic and postnatal periods. We identified 49 genes that exceeded hybridization signals by >10-fold in CR cells compared with non-CR cells at embryonic day 13.5, postnatal day 2, or both. Among these CR cell-specific genes, 25 genes, including the CR cell marker genes such as the reelin and calretinin genes, are selectively and highly expressed in both embryonic and postnatal CR cells. These genes, which encode generic properties of CR cell specificity, are eminently characterized as modulatory composites of voltage-dependent calcium channels and sets of functionally related cellular components involved in cell migration, adhesion, and neurite extension. Five genes are highly expressed in CR cells at the early embryonic period and are rapidly down-regulated thereafter. Furthermore, some of these genes have been shown to mark two distinctly different focal regions corresponding to the CR cell origins. At the late prenatal and postnatal periods, 19 genes are selectively up-regulated in CR cells. These genes include functional molecules implicated in synaptic transmission and modulation. CR cells thus strikingly change their cellular phenotypes during cortical development and play a pivotal role in both corticogenesis and cortical circuit maturation.

  10. Identification and Construction of Combinatory Cancer Hallmark-Based Gene Signature Sets to Predict Recurrence and Chemotherapy Benefit in Stage II Colorectal Cancer.

    PubMed

    Gao, Shanwu; Tibiche, Chabane; Zou, Jinfeng; Zaman, Naif; Trifiro, Mark; O'Connor-McCourt, Maureen; Wang, Edwin

    2016-01-01

    Decisions regarding adjuvant therapy in patients with stage II colorectal cancer (CRC) have been among the most challenging and controversial in oncology over the past 20 years. To develop robust combinatory cancer hallmark-based gene signature sets (CSS sets) that more accurately predict prognosis and identify a subset of patients with stage II CRC who could gain survival benefits from adjuvant chemotherapy. Thirteen retrospective studies of patients with stage II CRC who had clinical follow-up and adjuvant chemotherapy were analyzed. Respective totals of 162 and 843 patients from 2 and 11 independent cohorts were used as the discovery and validation cohorts, respectively. A total of 1005 patients with stage II CRC were included in the 13 cohorts. Among them, 84 of 416 patients in 3 independent cohorts received fluorouracil-based adjuvant chemotherapy. Identification of CSS sets to predict relapse-free survival and identify a subset of patients with stage II CRC who could gain substantial survival benefits from fluorouracil-based adjuvant chemotherapy. Eight cancer hallmark-based gene signatures (30 genes each) were identified and used to construct CSS sets for determining prognosis. The CSS sets were validated in 11 independent cohorts of 767 patients with stage II CRC who did not receive adjuvant chemotherapy. The CSS sets accurately stratified patients into low-, intermediate-, and high-risk groups. Five-year relapse-free survival rates were 94%, 78%, and 45%, respectively, representing 60%, 28%, and 12% of patients with stage II disease. The 416 patients with CSS set-defined high-risk stage II CRC who received fluorouracil-based adjuvant chemotherapy showed a substantial gain in survival benefits from the treatment (ie, recurrence reduced by 30%-40% in 5 years). The CSS sets substantially outperformed other prognostic predictors of stage 2 CRC. They are more accurate and robust for prognostic predictions and facilitate the identification of patients with stage

  11. Identifying arbitrary parameter zonation using multiple level set functions

    NASA Astrophysics Data System (ADS)

    Lu, Zhiming; Vesselinov, Velimir V.; Lei, Hongzhuan

    2018-07-01

    In this paper, we extended the analytical level set method [1,2] for identifying a piece-wisely heterogeneous (zonation) binary system to the case with an arbitrary number of materials with unknown material properties. In the developed level set approach, starting from an initial guess, the material interfaces are propagated through iterations such that the residuals between the simulated and observed state variables (hydraulic head) is minimized. We derived an expression for the propagation velocity of the interface between any two materials, which is related to the permeability contrast between the materials on two sides of the interface, the sensitivity of the head to permeability, and the head residual. We also formulated an expression for updating the permeability of all materials, which is consistent with the steepest descent of the objective function. The developed approach has been demonstrated through many examples, ranging from totally synthetic cases to a case where the flow conditions are representative of a groundwater contaminant site at the Los Alamos National Laboratory. These examples indicate that the level set method can successfully identify zonation structures, even if the number of materials in the model domain is not exactly known in advance. Although the evolution of the material zonation depends on the initial guess field, inverse modeling runs starting with different initial guesses fields may converge to the similar final zonation structure. These examples also suggest that identifying interfaces of spatially distributed heterogeneities is more important than estimating their permeability values.

  12. Gene-centric Meta-analysis in 87,736 Individuals of European Ancestry Identifies Multiple Blood-Pressure-Related Loci

    PubMed Central

    Tragante, Vinicius; Barnes, Michael R.; Ganesh, Santhi K.; Lanktree, Matthew B.; Guo, Wei; Franceschini, Nora; Smith, Erin N.; Johnson, Toby; Holmes, Michael V.; Padmanabhan, Sandosh; Karczewski, Konrad J.; Almoguera, Berta; Barnard, John; Baumert, Jens; Chang, Yen-Pei Christy; Elbers, Clara C.; Farrall, Martin; Fischer, Mary E.; Gaunt, Tom R.; Gho, Johannes M.I.H.; Gieger, Christian; Goel, Anuj; Gong, Yan; Isaacs, Aaron; Kleber, Marcus E.; Leach, Irene Mateo; McDonough, Caitrin W.; Meijs, Matthijs F.L.; Melander, Olle; Nelson, Christopher P.; Nolte, Ilja M.; Pankratz, Nathan; Price, Tom S.; Shaffer, Jonathan; Shah, Sonia; Tomaszewski, Maciej; van der Most, Peter J.; Van Iperen, Erik P.A.; Vonk, Judith M.; Witkowska, Kate; Wong, Caroline O.L.; Zhang, Li; Beitelshees, Amber L.; Berenson, Gerald S.; Bhatt, Deepak L.; Brown, Morris; Burt, Amber; Cooper-DeHoff, Rhonda M.; Connell, John M.; Cruickshanks, Karen J.; Curtis, Sean P.; Davey-Smith, George; Delles, Christian; Gansevoort, Ron T.; Guo, Xiuqing; Haiqing, Shen; Hastie, Claire E.; Hofker, Marten H.; Hovingh, G. Kees; Kim, Daniel S.; Kirkland, Susan A.; Klein, Barbara E.; Klein, Ronald; Li, Yun R.; Maiwald, Steffi; Newton-Cheh, Christopher; O’Brien, Eoin T.; Onland-Moret, N. Charlotte; Palmas, Walter; Parsa, Afshin; Penninx, Brenda W.; Pettinger, Mary; Vasan, Ramachandran S.; Ranchalis, Jane E.; M Ridker, Paul; Rose, Lynda M.; Sever, Peter; Shimbo, Daichi; Steele, Laura; Stolk, Ronald P.; Thorand, Barbara; Trip, Mieke D.; van Duijn, Cornelia M.; Verschuren, W. Monique; Wijmenga, Cisca; Wyatt, Sharon; Young, J. Hunter; Zwinderman, Aeilko H.; Bezzina, Connie R.; Boerwinkle, Eric; Casas, Juan P.; Caulfield, Mark J.; Chakravarti, Aravinda; Chasman, Daniel I.; Davidson, Karina W.; Doevendans, Pieter A.; Dominiczak, Anna F.; FitzGerald, Garret A.; Gums, John G.; Fornage, Myriam; Hakonarson, Hakon; Halder, Indrani; Hillege, Hans L.; Illig, Thomas; Jarvik, Gail P.; Johnson, Julie A.; Kastelein, John J.P.; Koenig, Wolfgang; Kumari, Meena; März, Winfried; Murray, Sarah S.; O’Connell, Jeffery R.; Oldehinkel, Albertine J.; Pankow, James S.; Rader, Daniel J.; Redline, Susan; Reilly, Muredach P.; Schadt, Eric E.; Kottke-Marchant, Kandice; Snieder, Harold; Snyder, Michael; Stanton, Alice V.; Tobin, Martin D.; Uitterlinden, André G.; van der Harst, Pim; van der Schouw, Yvonne T.; Samani, Nilesh J.; Watkins, Hugh; Johnson, Andrew D.; Reiner, Alex P.; Zhu, Xiaofeng; de Bakker, Paul I.W.; Levy, Daniel; Asselbergs, Folkert W.; Munroe, Patricia B.; Keating, Brendan J.

    2014-01-01

    Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ∼50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10−7) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification. PMID:24560520

  13. Exome Sequencing Identifies Three Novel Candidate Genes Implicated in Intellectual Disability

    PubMed Central

    Azam, Maleeha; Ayub, Humaira; Vissers, Lisenka E. L. M.; Gilissen, Christian; Ali, Syeda Hafiza Benish; Riaz, Moeen; Veltman, Joris A.; Pfundt, Rolph; van Bokhoven, Hans; Qamar, Raheel

    2014-01-01

    Intellectual disability (ID) is a major health problem mostly with an unknown etiology. Recently exome sequencing of individuals with ID identified novel genes implicated in the disease. Therefore the purpose of the present study was to identify the genetic cause of ID in one syndromic and two non-syndromic Pakistani families. Whole exome of three ID probands was sequenced. Missense variations in two plausible novel genes implicated in autosomal recessive ID were identified: lysine (K)-specific methyltransferase 2B (KMT2B), zinc finger protein 589 (ZNF589), as well as hedgehog acyltransferase (HHAT) with a de novo mutation with autosomal dominant mode of inheritance. The KMT2B recessive variant is the first report of recessive Kleefstra syndrome-like phenotype. Identification of plausible causative mutations for two recessive and a dominant type of ID, in genes not previously implicated in disease, underscores the large genetic heterogeneity of ID. These results also support the viewpoint that large number of ID genes converge on limited number of common networks i.e. ZNF589 belongs to KRAB-domain zinc-finger proteins previously implicated in ID, HHAT is predicted to affect sonic hedgehog, which is involved in several disorders with ID, KMT2B associated with syndromic ID fits the epigenetic module underlying the Kleefstra syndromic spectrum. The association of these novel genes in three different Pakistani ID families highlights the importance of screening these genes in more families with similar phenotypes from different populations to confirm the involvement of these genes in pathogenesis of ID. PMID:25405613

  14. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits.

    PubMed

    Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé; Kichaev, Gleb; Gusev, Alexander; Pasaniuc, Bogdan

    2017-03-02

    Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Genes associated with thermosensitive genic male sterility in rice identified by comparative expression profiling.

    PubMed

    Pan, Yufang; Li, Qiaofeng; Wang, Zhizheng; Wang, Yang; Ma, Rui; Zhu, Lili; He, Guangcun; Chen, Rongzhi

    2014-12-16

    development, low temperature responses or TGMS was validated by quantitative RT-PCR (qRT-PCR). Temperature strongly affects global gene expression and may be the common regulator of fertility in PGMS/TGMS rice lines. The identified expression changes reflect perturbations in the transcriptomic regulation of pollen development networks in TGMS-Co27. Findings from this and previous studies indicate that sets of genes involved in post-transcriptional and translation processes are involved in thermosensitive male sterility transitions in TGMS-Co27.

  16. Positive-unlabeled learning for disease gene identification

    PubMed Central

    Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong

    2012-01-01

    Background: Identifying disease genes from human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such kind of classifiers is actually built from a noisy negative set N as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could be. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. The weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. Availability and implementation: The executable program and data are available at http://www1.i2r

  17. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.

    PubMed

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. [Key effect genes responding to nerve injury identified by gene ontology and computer pattern recognition].

    PubMed

    Pan, Qian; Peng, Jin; Zhou, Xue; Yang, Hao; Zhang, Wei

    2012-07-01

    In order to screen out important genes from large gene data of gene microarray after nerve injury, we combine gene ontology (GO) method and computer pattern recognition technology to find key genes responding to nerve injury, and then verify one of these screened-out genes. Data mining and gene ontology analysis of gene chip data GSE26350 was carried out through MATLAB software. Cd44 was selected from screened-out key gene molecular spectrum by comparing genes' different GO terms and positions on score map of principal component. Function interferences were employed to influence the normal binding of Cd44 and one of its ligands, chondroitin sulfate C (CSC), to observe neurite extension. Gene ontology analysis showed that the first genes on score map (marked by red *) mainly distributed in molecular transducer activity, receptor activity, protein binding et al molecular function GO terms. Cd44 is one of six effector protein genes, and attracted us with its function diversity. After adding different reagents into the medium to interfere the normal binding of CSC and Cd44, varying-degree remissions of CSC's inhibition on neurite extension were observed. CSC can inhibit neurite extension through binding Cd44 on the neuron membrane. This verifies that important genes in given physiological processes can be identified by gene ontology analysis of gene chip data.

  19. High-Throughput Genetic Screens Identify a Large and Diverse Collection of New Sporulation Genes in Bacillus subtilis.

    PubMed

    Meeske, Alexander J; Rodrigues, Christopher D A; Brady, Jacqueline; Lim, Hoong Chuin; Bernhardt, Thomas G; Rudner, David Z

    2016-01-01

    The differentiation of the bacterium Bacillus subtilis into a dormant spore is among the most well-characterized developmental pathways in biology. Classical genetic screens performed over the past half century identified scores of factors involved in every step of this morphological process. More recently, transcriptional profiling uncovered additional sporulation-induced genes required for successful spore development. Here, we used transposon-sequencing (Tn-seq) to assess whether there were any sporulation genes left to be discovered. Our screen identified 133 out of the 148 genes with known sporulation defects. Surprisingly, we discovered 24 additional genes that had not been previously implicated in spore formation. To investigate their functions, we used fluorescence microscopy to survey early, middle, and late stages of differentiation of null mutants from the B. subtilis ordered knockout collection. This analysis identified mutants that are delayed in the initiation of sporulation, defective in membrane remodeling, and impaired in spore maturation. Several mutants had novel sporulation phenotypes. We performed in-depth characterization of two new factors that participate in cell-cell signaling pathways during sporulation. One (SpoIIT) functions in the activation of σE in the mother cell; the other (SpoIIIL) is required for σG activity in the forespore. Our analysis also revealed that as many as 36 sporulation-induced genes with no previously reported mutant phenotypes are required for timely spore maturation. Finally, we discovered a large set of transposon insertions that trigger premature initiation of sporulation. Our results highlight the power of Tn-seq for the discovery of new genes and novel pathways in sporulation and, combined with the recently completed null mutant collection, open the door for similar screens in other, less well-characterized processes.

  20. High-Throughput Genetic Screens Identify a Large and Diverse Collection of New Sporulation Genes in Bacillus subtilis

    PubMed Central

    Brady, Jacqueline; Lim, Hoong Chuin; Bernhardt, Thomas G.; Rudner, David Z.

    2016-01-01

    The differentiation of the bacterium Bacillus subtilis into a dormant spore is among the most well-characterized developmental pathways in biology. Classical genetic screens performed over the past half century identified scores of factors involved in every step of this morphological process. More recently, transcriptional profiling uncovered additional sporulation-induced genes required for successful spore development. Here, we used transposon-sequencing (Tn-seq) to assess whether there were any sporulation genes left to be discovered. Our screen identified 133 out of the 148 genes with known sporulation defects. Surprisingly, we discovered 24 additional genes that had not been previously implicated in spore formation. To investigate their functions, we used fluorescence microscopy to survey early, middle, and late stages of differentiation of null mutants from the B. subtilis ordered knockout collection. This analysis identified mutants that are delayed in the initiation of sporulation, defective in membrane remodeling, and impaired in spore maturation. Several mutants had novel sporulation phenotypes. We performed in-depth characterization of two new factors that participate in cell–cell signaling pathways during sporulation. One (SpoIIT) functions in the activation of σE in the mother cell; the other (SpoIIIL) is required for σG activity in the forespore. Our analysis also revealed that as many as 36 sporulation-induced genes with no previously reported mutant phenotypes are required for timely spore maturation. Finally, we discovered a large set of transposon insertions that trigger premature initiation of sporulation. Our results highlight the power of Tn-seq for the discovery of new genes and novel pathways in sporulation and, combined with the recently completed null mutant collection, open the door for similar screens in other, less well-characterized processes. PMID:26735940

  1. Sleeping Beauty transposon mutagenesis identifies genes that cooperate with mutant Smad4 in gastric cancer development

    PubMed Central

    Takeda, Haruna; Rust, Alistair G.; Ward, Jerrold M.; Yew, Christopher Chin Kuan; Jenkins, Nancy A.; Copeland, Neal G.

    2016-01-01

    Mutations in SMAD4 predispose to the development of gastrointestinal cancer, which is the third leading cause of cancer-related deaths. To identify genes driving gastric cancer (GC) development, we performed a Sleeping Beauty (SB) transposon mutagenesis screen in the stomach of Smad4+/− mutant mice. This screen identified 59 candidate GC trunk drivers and a much larger number of candidate GC progression genes. Strikingly, 22 SB-identified trunk drivers are known or candidate cancer genes, whereas four SB-identified trunk drivers, including PTEN, SMAD4, RNF43, and NF1, are known human GC trunk drivers. Similar to human GC, pathway analyses identified WNT, TGF-β, and PI3K-PTEN signaling, ubiquitin-mediated proteolysis, adherens junctions, and RNA degradation in addition to genes involved in chromatin modification and organization as highly deregulated pathways in GC. Comparative oncogenomic filtering of the complete list of SB-identified genes showed that they are highly enriched for genes mutated in human GC and identified many candidate human GC genes. Finally, by comparing our complete list of SB-identified genes against the list of mutated genes identified in five large-scale human GC sequencing studies, we identified LDL receptor-related protein 1B (LRP1B) as a previously unidentified human candidate GC tumor suppressor gene. In LRP1B, 129 mutations were found in 462 human GC samples sequenced, and LRP1B is one of the top 10 most deleted genes identified in a panel of 3,312 human cancers. SB mutagenesis has, thus, helped to catalog the cooperative molecular mechanisms driving SMAD4-induced GC growth and discover genes with potential clinical importance in human GC. PMID:27006499

  2. Sleeping Beauty transposon mutagenesis identifies genes that cooperate with mutant Smad4 in gastric cancer development.

    PubMed

    Takeda, Haruna; Rust, Alistair G; Ward, Jerrold M; Yew, Christopher Chin Kuan; Jenkins, Nancy A; Copeland, Neal G

    2016-04-05

    Mutations in SMAD4 predispose to the development of gastrointestinal cancer, which is the third leading cause of cancer-related deaths. To identify genes driving gastric cancer (GC) development, we performed a Sleeping Beauty (SB) transposon mutagenesis screen in the stomach of Smad4(+/-) mutant mice. This screen identified 59 candidate GC trunk drivers and a much larger number of candidate GC progression genes. Strikingly, 22 SB-identified trunk drivers are known or candidate cancer genes, whereas four SB-identified trunk drivers, including PTEN, SMAD4, RNF43, and NF1, are known human GC trunk drivers. Similar to human GC, pathway analyses identified WNT, TGF-β, and PI3K-PTEN signaling, ubiquitin-mediated proteolysis, adherens junctions, and RNA degradation in addition to genes involved in chromatin modification and organization as highly deregulated pathways in GC. Comparative oncogenomic filtering of the complete list of SB-identified genes showed that they are highly enriched for genes mutated in human GC and identified many candidate human GC genes. Finally, by comparing our complete list of SB-identified genes against the list of mutated genes identified in five large-scale human GC sequencing studies, we identified LDL receptor-related protein 1B (LRP1B) as a previously unidentified human candidate GC tumor suppressor gene. In LRP1B, 129 mutations were found in 462 human GC samples sequenced, and LRP1B is one of the top 10 most deleted genes identified in a panel of 3,312 human cancers. SB mutagenesis has, thus, helped to catalog the cooperative molecular mechanisms driving SMAD4-induced GC growth and discover genes with potential clinical importance in human GC.

  3. Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes1[OPEN

    PubMed Central

    Law, MeiYee; Childs, Kevin L.; Campbell, Michael S.; Stein, Joshua C.; Olson, Andrew J.; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M.; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2015-01-01

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563

  4. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  5. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  6. Expression profiling identifies novel Hh/Gli regulated genes in developing zebrafish embryos.

    PubMed Central

    Bergeron, Sadie A.; Milla, Luis A.; Villegas, Rosario; Shen, Meng-Chieh; Burgess, Shawn M.; Allende, Miguel L.; Karlstrom, Rolf O.; Palma, Verónica

    2008-01-01

    The Hedgehog (Hh) signaling pathway plays critical instructional roles during embryonic development. Mis-regulation of Hh/Gli signaling is a major causative factor in human congenital disorders and in a variety of cancers. The zebrafish is a powerful genetic model for the study of Hh signaling during embryogenesis, as a large number of mutants have been identified affecting different components of the Hh/Gli signaling system. By performing global profiling of gene expression in different Hh/Gli gain- and loss-of-function scenarios we identified several known (e.g. ptc1 and nkx2.2a) as well as a large number of novel Hh regulated genes that are differentially expressed in embryos with altered Hh/Gli signaling function. By uncovering changes in tissue specific gene expression, we revealed new embryological processes that are influenced by Hh signaling. We thus provide a comprehensive survey of Hh/Gli regulated genes during embryogenesis and we identify new Hh-regulated genes that may be targets of mis-regulation during tumorogenesis. PMID:18055165

  7. Identifying Stress Transcription Factors Using Gene Expression and TF-Gene Association Data

    PubMed Central

    Wu, Wei-Sheng; Chen, Bor-Sen

    2007-01-01

    Unicellular organisms such as yeasts have evolved to survive environmental stresses by rapidly reorganizing the genomic expression program to meet the challenges of harsh environments. The complex adaptation mechanisms to stress remain to be elucidated. In this study, we developed Stress Transcription Factor Identification Algorithm (STFIA), which integrates gene expression and TF-gene association data to identify the stress transcription factors (TFs) of six kinds of stresses. We identified some general stress TFs that are in response to various stresses, and some specific stress TFs that are in response to one specific stress. The biological significance of our findings is validated by the literature. We found that a small number of TFs may be sufficient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the adaptation mechanisms to different stresses may have a bow-tie structure. Second, there may exist extensive regulatory cross-talk among different stress responses. In conclusion, this study proposes a network of the regulators of stress responses and their mechanism of action. PMID:20066130

  8. GeneCOST: a novel scoring-based prioritization framework for identifying disease causing genes.

    PubMed

    Ozer, Bugra; Sağıroğlu, Mahmut; Demirci, Hüseyin

    2015-11-15

    Due to the big data produced by next-generation sequencing studies, there is an evident need for methods to extract the valuable information gathered from these experiments. In this work, we propose GeneCOST, a novel scoring-based method to evaluate every gene for their disease association. Without any prior filtering and any prior knowledge, we assign a disease likelihood score to each gene in correspondence with their variations. Then, we rank all genes based on frequency, conservation, pedigree and detailed variation information to find out the causative reason of the disease state. We demonstrate the usage of GeneCOST with public and real life Mendelian disease cases including recessive, dominant, compound heterozygous and sporadic models. As a result, we were able to identify causative reason behind the disease state in top rankings of our list, proving that this novel prioritization framework provides a powerful environment for the analysis in genetic disease studies alternative to filtering-based approaches. GeneCOST software is freely available at www.igbam.bilgem.tubitak.gov.tr/en/softwares/genecost-en/index.html. buozer@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways.

    PubMed

    Eleftherohorinou, Hariklia; Hoggart, Clive J; Wright, Victoria J; Levin, Michael; Coin, Lachlan J M

    2011-09-01

    Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder affecting ∼1% of the world population. It has a strong genetic component and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS, by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ∼5000 and ∼2000 subjects, respectively. We examined 700 pathways comprising ∼8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ∼6% laid within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap)< 10(-21)), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P< 10(-29)) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.

  10. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.

  11. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  12. Integrated network analysis identifies fight-club nodes as a class of hubs encompassing key putative switch genes that induce major transcriptome reprogramming during grapevine development.

    PubMed

    Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola

    2014-12-01

    We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named "fight-club hubs" characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named "switch genes" was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. © 2014 American Society of Plant Biologists. All rights reserved.

  13. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    PubMed

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  14. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes

    PubMed Central

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-01-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404

  15. Identifying arbitrary parameter zonation using multiple level set functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Zhiming; Vesselinov, Velimir Valentinov; Lei, Hongzhuan

    In this paper, we extended the analytical level set method [1, 2] for identifying a piece-wisely heterogeneous (zonation) binary system to the case with an arbitrary number of materials with unknown material properties. In the developed level set approach, starting from an initial guess, the material interfaces are propagated through iterations such that the residuals between the simulated and observed state variables (hydraulic head) is minimized. We derived an expression for the propagation velocity of the interface between any two materials, which is related to the permeability contrast between the materials on two sides of the interface, the sensitivity ofmore » the head to permeability, and the head residual. We also formulated an expression for updating the permeability of all materials, which is consistent with the steepest descent of the objective function. The developed approach has been demonstrated through many examples, ranging from totally synthetic cases to a case where the flow conditions are representative of a groundwater contaminant site at the Los Alamos National Laboratory. These examples indicate that the level set method can successfully identify zonation structures, even if the number of materials in the model domain is not exactly known in advance. Although the evolution of the material zonation depends on the initial guess field, inverse modeling runs starting with different initial guesses fields may converge to the similar final zonation structure. These examples also suggest that identifying interfaces of spatially distributed heterogeneities is more important than estimating their permeability values.« less

  16. Identifying arbitrary parameter zonation using multiple level set functions

    DOE PAGES

    Lu, Zhiming; Vesselinov, Velimir Valentinov; Lei, Hongzhuan

    2018-03-14

    In this paper, we extended the analytical level set method [1, 2] for identifying a piece-wisely heterogeneous (zonation) binary system to the case with an arbitrary number of materials with unknown material properties. In the developed level set approach, starting from an initial guess, the material interfaces are propagated through iterations such that the residuals between the simulated and observed state variables (hydraulic head) is minimized. We derived an expression for the propagation velocity of the interface between any two materials, which is related to the permeability contrast between the materials on two sides of the interface, the sensitivity ofmore » the head to permeability, and the head residual. We also formulated an expression for updating the permeability of all materials, which is consistent with the steepest descent of the objective function. The developed approach has been demonstrated through many examples, ranging from totally synthetic cases to a case where the flow conditions are representative of a groundwater contaminant site at the Los Alamos National Laboratory. These examples indicate that the level set method can successfully identify zonation structures, even if the number of materials in the model domain is not exactly known in advance. Although the evolution of the material zonation depends on the initial guess field, inverse modeling runs starting with different initial guesses fields may converge to the similar final zonation structure. These examples also suggest that identifying interfaces of spatially distributed heterogeneities is more important than estimating their permeability values.« less

  17. A general method for identifying major hybrid male sterility genes in Drosophila.

    PubMed

    Zeng, L W; Singh, R S

    1995-10-01

    The genes responsible for hybrid male sterility in species crosses are usually identified by introgressing chromosome segments, monitored by visible markers, between closely related species by continuous backcrosses. This commonly used method, however, suffers from two problems. First, it relies on the availability of markers to monitor the introgressed regions and so the portion of the genome examined is limited to the marked regions. Secondly, the introgressed regions are usually large and it is impossible to tell if the effects of the introgressed regions are the result of single (or few) major genes or many minor genes (polygenes). Here we introduce a simple and general method for identifying putative major hybrid male sterility genes which is free of these problems. In this method, the actual hybrid male sterility genes (rather than markers), or tightly linked gene complexes with large effects, are selectively introgressed from one species into the background of another species by repeated backcrosses. This is performed by selectively backcrossing heterozygous (for hybrid male sterility gene or genes) females producing fertile and sterile sons in roughly equal proportions to males of either parental species. As no marker gene is required for this procedure, this method can be used with any species pairs that produce unisexual sterility. With the application of this method, a small X chromosome region of Drosophila mauritiana which produces complete hybrid male sterility (aspermic testes) in the background of D. simulans was identified. Recombination analysis reveals that this region contains a second major hybrid male sterility gene linked to the forked locus located at either 62.7 +/- 0.66 map units or at the centromere region of the X chromosome of D. mauritiana.

  18. Epidermal growth factor gene is a newly identified candidate gene for gout.

    PubMed

    Han, Lin; Cao, Chunwei; Jia, Zhaotong; Liu, Shiguo; Liu, Zhen; Xin, Ruosai; Wang, Can; Li, Xinde; Ren, Wei; Wang, Xuefeng; Li, Changgui

    2016-08-10

    Chromosome 4q25 has been identified as a genomic region associated with gout. However, the associations of gout with the genes in this region have not yet been confirmed. Here, we performed two-stage analysis to determine whether variations in candidate genes in the 4q25 region are associated with gout in a male Chinese Han population. We first evaluated 96 tag single nucleotide polymorphisms (SNPs) in eight inflammatory/immune pathway- or glucose/lipid metabolism-related genes in the 4q25 region in 480 male gout patients and 480 controls. The SNP rs12504538, located in the elongation of very-long-chain-fatty-acid-like family member 6 gene (Elovl6), was found to be associated with gout susceptibility (Padjusted = 0.00595). In the second stage of analysis, we performed fine mapping analysis of 93 tag SNPs in Elovl6 and in the epidermal growth factor gene (EGF) and its flanking regions in 1017 male patients gout and 1897 healthy male controls. We observed a significant association between the T allele of EGF rs2298999 and gout (odds ratio = 0.77, 95% confidence interval = 0.67-0.88, Padjusted = 6.42 × 10(-3)). These results provide the first evidence for an association between the EGF rs2298999 C/T polymorphism and gout. Our findings should be validated in additional populations.

  19. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.).

    PubMed

    Raju, Nikku L; Gnanesh, Belaghihalli N; Lekha, Pazhamala; Jayashree, Balaji; Pande, Suresh; Hiremath, Pavana J; Byregowda, Munishamappa; Singh, Nagendra K; Varshney, Rajeev K

    2010-03-11

    . Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with >or= 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.

  20. The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

    PubMed Central

    2010-01-01

    molecular function. Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding. PMID:20222972

  1. Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility.

    PubMed

    Bruse, Shannon; Moreau, Michael; Bromberg, Yana; Jang, Jun-Ho; Wang, Nan; Ha, Hongseok; Picchi, Maria; Lin, Yong; Langley, Raymond J; Qualls, Clifford; Klensney-Tait, Julia; Zabner, Joseph; Leng, Shuguang; Mao, Jenny; Belinsky, Steven A; Xing, Jinchuan; Nyunoya, Toru

    2016-01-07

    Chronic obstructive pulmonary disease (COPD) is characterized by an irreversible airflow limitation in response to inhalation of noxious stimuli, such as cigarette smoke. However, only 15-20 % smokers manifest COPD, suggesting a role for genetic predisposition. Although genome-wide association studies have identified common genetic variants that are associated with susceptibility to COPD, effect sizes of the identified variants are modest, as is the total heritability accounted for by these variants. In this study, an extreme phenotype exome sequencing study was combined with in vitro modeling to identify COPD candidate genes. We performed whole exome sequencing of 62 highly susceptible smokers and 30 exceptionally resistant smokers to identify rare variants that may contribute to disease risk or resistance to COPD. This was a cross-sectional case-control study without therapeutic intervention or longitudinal follow-up information. We identified candidate genes based on rare variant analyses and evaluated exonic variants to pinpoint individual genes whose function was computationally established to be significantly different between susceptible and resistant smokers. Top scoring candidate genes from these analyses were further filtered by requiring that each gene be expressed in human bronchial epithelial cells (HBECs). A total of 81 candidate genes were thus selected for in vitro functional testing in cigarette smoke extract (CSE)-exposed HBECs. Using small interfering RNA (siRNA)-mediated gene silencing experiments, we showed that silencing of several candidate genes augmented CSE-induced cytotoxicity in vitro. Our integrative analysis through both genetic and functional approaches identified two candidate genes (TACC2 and MYO1E) that augment cigarette smoke (CS)-induced cytotoxicity and, potentially, COPD susceptibility.

  2. Clustering approaches to identifying gene expression patterns from DNA microarray data.

    PubMed

    Do, Jin Hwan; Choi, Dong-Kug

    2008-04-30

    The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weakness and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.

  3. Genes Important for Schizosaccharomyces pombe Meiosis Identified Through a Functional Genomics Screen

    PubMed Central

    Blyth, Julie; Makrantoni, Vasso; Barton, Rachael E.; Spanos, Christos; Rappsilber, Juri; Marston, Adele L.

    2018-01-01

    Meiosis is a specialized cell division that generates gametes, such as eggs and sperm. Errors in meiosis result in miscarriages and are the leading cause of birth defects; however, the molecular origins of these defects remain unknown. Studies in model organisms are beginning to identify the genes and pathways important for meiosis, but the parts list is still poorly defined. Here we present a comprehensive catalog of genes important for meiosis in the fission yeast, Schizosaccharomyces pombe. Our genome-wide functional screen surveyed all nonessential genes for roles in chromosome segregation and spore formation. Novel genes important at distinct stages of the meiotic chromosome segregation and differentiation program were identified. Preliminary characterization implicated three of these genes in centrosome/spindle pole body, centromere, and cohesion function. Our findings represent a near-complete parts list of genes important for meiosis in fission yeast, providing a valuable resource to advance our molecular understanding of meiosis. PMID:29259000

  4. Microarray and differential display identify genes involved in jasmonate-dependent anther development.

    PubMed

    Mandaokar, Ajin; Kumar, V Dinesh; Amway, Matt; Browse, John

    2003-07-01

    Jasmonate (JA) is a signaling compound essential for anther development and pollen fertility in Arabidopsis. Mutations that block the pathway of JA synthesis result into male sterility. To understand the processes of anther and pollen maturation, we used microarray and differential display approaches to compare gene expression pattern in anthers of wild-type Arabidopsis and the male-sterile mutant, opr3. Microarray experiment revealed 25 genes that were up-regulated more than 1.8-fold in wild-type anthers as compared to mutant anthers. Experiments based on differential display identified 13 additional genes up-regulated in wild-type anthers compared to opr3 for a total of 38 differentially expressed genes. Searches of the Arabidopsis and non-redundant databases disclosed known or likely functions for 28 of the 38 genes identified, while 10 genes encode proteins of unknown function. Northern blot analysis of eight representative clones as probes confirmed low expression in opr3 anthers compared with wild-type anthers. JA responsiveness of these same genes was also investigated by northern blot analysis of anther RNA isolated from wild-type and opr3 plants, In these experiments, four genes were induced in opr3 anthers within 0.5-1 h of JA treatment while the remaining genes were up-regulated only 1-8 h after JA application. None of these genes was induced by JA in anthers of the coil mutant that is deficient in JA responsiveness. The four early-induced genes in opr3 encode lipoxygenase, a putative bHLH transcription factor, epithiospecifier protein and an unknown protein. We propose that these and other early components may be involved in JA signaling and in the initiation of developmental processes. The four late genes encode an extensin-like protein, a peptide transporter and two unknown proteins, which may represent components required later in anther and pollen maturation. Transcript profiling has provided a successful approach to identify genes involved in

  5. Identification of a novel set of genes reflecting different in vivo invasive patterns of human GBM cells.

    PubMed

    Monticone, Massimiliano; Daga, Antonio; Candiani, Simona; Romeo, Francesco; Mirisola, Valentina; Viaggi, Silvia; Melloni, Ilaria; Pedemonte, Simona; Zona, Gianluigi; Giaretti, Walter; Pfeffer, Ulrich; Castagnola, Patrizio

    2012-08-17

    Most patients affected by Glioblastoma multiforme (GBM, grade IV glioma) experience a recurrence of the disease because of the spreading of tumor cells beyond surgical boundaries. Unveiling mechanisms causing this process is a logic goal to impair the killing capacity of GBM cells by molecular targeting.We noticed that our long-term GBM cultures, established from different patients, may display two categories/types of growth behavior in an orthotopic xenograft model: expansion of the tumor mass and formation of tumor branches/nodules (nodular like, NL-type) or highly diffuse single tumor cell infiltration (HD-type). We determined by DNA microarrays the gene expression profiles of three NL-type and three HD-type long-term GBM cultures. Subsequently, individual genes with different expression levels between the two groups were identified using Significance Analysis of Microarrays (SAM). Real time RT-PCR, immunofluorescence and immunoblot analyses, were performed for a selected subgroup of regulated gene products to confirm the results obtained by the expression analysis. Here, we report the identification of a set of 34 differentially expressed genes in the two types of GBM cultures. Twenty-three of these genes encode for proteins localized to the plasma membrane and 9 of these for proteins are involved in the process of cell adhesion. This study suggests the participation in the diffuse infiltrative/invasive process of GBM cells within the CNS of a novel set of genes coding for membrane-associated proteins, which should be thus susceptible to an inhibition strategy by specific targeting.Massimiliano Monticone and Antonio Daga contributed equally to this work.

  6. Lentiviral vector-based insertional mutagenesis identifies genes associated with liver cancer

    PubMed Central

    Ranzani, Marco; Cesana, Daniela; Bartholomae, Cynthia C.; Sanvito, Francesca; Pala, Mauro; Benedicenti, Fabrizio; Gallina, Pierangela; Sergi, Lucia Sergi; Merella, Stefania; Bulfone, Alessandro; Doglioni, Claudio; von Kalle, Christof; Kim, Yoon Jun; Schmidt, Manfred; Tonon, Giovanni; Naldini, Luigi; Montini, Eugenio

    2013-01-01

    Transposons and γ-retroviruses have been efficiently used as insertional mutagens in different tissues to identify molecular culprits of cancer. However, these systems are characterized by recurring integrations that accumulate in tumor cells, hampering the identification of early cancer-driving events amongst bystander and progression-related events. We developed an insertional mutagenesis platform based on lentiviral vectors (LVV) by which we could efficiently induce hepatocellular carcinoma (HCC) in 3 different mouse models. By virtue of LVV’s replication-deficient nature and broad genome-wide integration pattern, LVV-based insertional mutagenesis allowed identification of 4 new liver cancer genes from a limited number of integrations. We validated the oncogenic potential of all the identified genes in vivo, with different levels of penetrance. Our newly identified cancer genes are likely to play a role in human disease, since they are upregulated and/or amplified/deleted in human HCCs and can predict clinical outcome of patients. PMID:23314173

  7. htsint: a Python library for sequencing pipelines that combines data through gene set generation.

    PubMed

    Richards, Adam J; Herrel, Anthony; Bonneaud, Camille

    2015-09-24

    Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.

  8. Meta-analysis identifies a MECOM gene as a novel predisposing factor of osteoporotic fracture

    PubMed Central

    Hwang, Joo-Yeon; Lee, Seung Hun; Go, Min Jin; Kim, Beom-Jun; Kou, Ikuyo; Ikegawa, Shiro; Guo, Yan; Deng, Hong-Wen; Raychaudhuri, Soumya; Kim, Young Jin; Oh, Ji Hee; Kim, Youngdoe; Moon, Sanghoon; Kim, Dong-Joon; Koo, Heejo; Cha, My-Jung; Lee, Min Hye; Yun, Ji Young; Yoo, Hye-Sook; Kang, Young-Ah; Cho, Eun-Hee; Kim, Sang-Wook; Oh, Ki Won; Kang, Moo II; Son, Ho Young; Kim, Shin-Yoon; Kim, Ghi Su; Han, Bok-Ghee; Cho, Yoon Shin; Cho, Myeong-Chan; Lee, Jong-Young; Koh, Jung-Min

    2014-01-01

    Background Osteoporotic fracture (OF) as a clinical endpoint is a major complication of osteoporosis. To screen for OF susceptibility genes, we performed a genome-wide association study and carried out de novo replication analysis of an East Asian population. Methods Association was tested using a logistic regression analysis. A meta-analysis was performed on the combined results using effect size and standard errors estimated for each study. Results In a combined meta-analysis of a discovery cohort (288 cases and 1139 controls), three hospital based sets in replication stage I (462 cases and 1745 controls), and an independent ethnic group in replication stage II (369 cases and 560 for controls), we identified a new locus associated with OF (rs784288 in the MECOM gene) that showed genome-wide significance (p=3.59×10−8; OR 1.39). RNA interference revealed that a MECOM knockdown suppresses osteoclastogenesis. Conclusions Our findings provide new insights into the genetic architecture underlying OF in East Asians. PMID:23349225

  9. TGMI: an efficient algorithm for identifying pathway regulators through evaluation of triple-gene mutual interaction

    PubMed Central

    Gunasekara, Chathura; Zhang, Kui; Deng, Wenping; Brown, Laura

    2018-01-01

    Abstract Despite their important roles, the regulators for most metabolic pathways and biological processes remain elusive. Presently, the methods for identifying metabolic pathway and biological process regulators are intensively sought after. We developed a novel algorithm called triple-gene mutual interaction (TGMI) for identifying these regulators using high-throughput gene expression data. It first calculated the regulatory interactions among triple gene blocks (two pathway genes and one transcription factor (TF)), using conditional mutual information, and then identifies significantly interacted triple genes using a newly identified novel mutual interaction measure (MIM), which was substantiated to reflect strengths of regulatory interactions within each triple gene block. The TGMI calculated the MIM for each triple gene block and then examined its statistical significance using bootstrap. Finally, the frequencies of all TFs present in all significantly interacted triple gene blocks were calculated and ranked. We showed that the TFs with higher frequencies were usually genuine pathway regulators upon evaluating multiple pathways in plants, animals and yeast. Comparison of TGMI with several other algorithms demonstrated its higher accuracy. Therefore, TGMI will be a valuable tool that can help biologists to identify regulators of metabolic pathways and biological processes from the exploded high-throughput gene expression data in public repositories. PMID:29579312

  10. Gene-centric meta-analysis in 87,736 individuals of European ancestry identifies multiple blood-pressure-related loci.

    PubMed

    Tragante, Vinicius; Barnes, Michael R; Ganesh, Santhi K; Lanktree, Matthew B; Guo, Wei; Franceschini, Nora; Smith, Erin N; Johnson, Toby; Holmes, Michael V; Padmanabhan, Sandosh; Karczewski, Konrad J; Almoguera, Berta; Barnard, John; Baumert, Jens; Chang, Yen-Pei Christy; Elbers, Clara C; Farrall, Martin; Fischer, Mary E; Gaunt, Tom R; Gho, Johannes M I H; Gieger, Christian; Goel, Anuj; Gong, Yan; Isaacs, Aaron; Kleber, Marcus E; Mateo Leach, Irene; McDonough, Caitrin W; Meijs, Matthijs F L; Melander, Olle; Nelson, Christopher P; Nolte, Ilja M; Pankratz, Nathan; Price, Tom S; Shaffer, Jonathan; Shah, Sonia; Tomaszewski, Maciej; van der Most, Peter J; Van Iperen, Erik P A; Vonk, Judith M; Witkowska, Kate; Wong, Caroline O L; Zhang, Li; Beitelshees, Amber L; Berenson, Gerald S; Bhatt, Deepak L; Brown, Morris; Burt, Amber; Cooper-DeHoff, Rhonda M; Connell, John M; Cruickshanks, Karen J; Curtis, Sean P; Davey-Smith, George; Delles, Christian; Gansevoort, Ron T; Guo, Xiuqing; Haiqing, Shen; Hastie, Claire E; Hofker, Marten H; Hovingh, G Kees; Kim, Daniel S; Kirkland, Susan A; Klein, Barbara E; Klein, Ronald; Li, Yun R; Maiwald, Steffi; Newton-Cheh, Christopher; O'Brien, Eoin T; Onland-Moret, N Charlotte; Palmas, Walter; Parsa, Afshin; Penninx, Brenda W; Pettinger, Mary; Vasan, Ramachandran S; Ranchalis, Jane E; M Ridker, Paul; Rose, Lynda M; Sever, Peter; Shimbo, Daichi; Steele, Laura; Stolk, Ronald P; Thorand, Barbara; Trip, Mieke D; van Duijn, Cornelia M; Verschuren, W Monique; Wijmenga, Cisca; Wyatt, Sharon; Young, J Hunter; Zwinderman, Aeilko H; Bezzina, Connie R; Boerwinkle, Eric; Casas, Juan P; Caulfield, Mark J; Chakravarti, Aravinda; Chasman, Daniel I; Davidson, Karina W; Doevendans, Pieter A; Dominiczak, Anna F; FitzGerald, Garret A; Gums, John G; Fornage, Myriam; Hakonarson, Hakon; Halder, Indrani; Hillege, Hans L; Illig, Thomas; Jarvik, Gail P; Johnson, Julie A; Kastelein, John J P; Koenig, Wolfgang; Kumari, Meena; März, Winfried; Murray, Sarah S; O'Connell, Jeffery R; Oldehinkel, Albertine J; Pankow, James S; Rader, Daniel J; Redline, Susan; Reilly, Muredach P; Schadt, Eric E; Kottke-Marchant, Kandice; Snieder, Harold; Snyder, Michael; Stanton, Alice V; Tobin, Martin D; Uitterlinden, André G; van der Harst, Pim; van der Schouw, Yvonne T; Samani, Nilesh J; Watkins, Hugh; Johnson, Andrew D; Reiner, Alex P; Zhu, Xiaofeng; de Bakker, Paul I W; Levy, Daniel; Asselbergs, Folkert W; Munroe, Patricia B; Keating, Brendan J

    2014-03-06

    Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10(-7)) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  11. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  12. Epidermal growth factor gene is a newly identified candidate gene for gout

    PubMed Central

    Han, Lin; Cao, Chunwei; Jia, Zhaotong; Liu, Shiguo; Liu, Zhen; Xin, Ruosai; Wang, Can; Li, Xinde; Ren, Wei; Wang, Xuefeng; Li, Changgui

    2016-01-01

    Chromosome 4q25 has been identified as a genomic region associated with gout. However, the associations of gout with the genes in this region have not yet been confirmed. Here, we performed two-stage analysis to determine whether variations in candidate genes in the 4q25 region are associated with gout in a male Chinese Han population. We first evaluated 96 tag single nucleotide polymorphisms (SNPs) in eight inflammatory/immune pathway- or glucose/lipid metabolism-related genes in the 4q25 region in 480 male gout patients and 480 controls. The SNP rs12504538, located in the elongation of very-long-chain-fatty-acid-like family member 6 gene (Elovl6), was found to be associated with gout susceptibility (Padjusted = 0.00595). In the second stage of analysis, we performed fine mapping analysis of 93 tag SNPs in Elovl6 and in the epidermal growth factor gene (EGF) and its flanking regions in 1017 male patients gout and 1897 healthy male controls. We observed a significant association between the T allele of EGF rs2298999 and gout (odds ratio = 0.77, 95% confidence interval = 0.67–0.88, Padjusted = 6.42 × 10−3). These results provide the first evidence for an association between the EGF rs2298999 C/T polymorphism and gout. Our findings should be validated in additional populations. PMID:27506295

  13. Identifying the genes of unconventional high temperature superconductors.

    PubMed

    Hu, Jiangping

    We elucidate a recently emergent framework in unifying the two families of high temperature (high [Formula: see text]) superconductors, cuprates and iron-based superconductors. The unification suggests that the latter is simply the counterpart of the former to realize robust extended s-wave pairing symmetries in a square lattice. The unification identifies that the key ingredients (gene) of high [Formula: see text] superconductors is a quasi two dimensional electronic environment in which the d -orbitals of cations that participate in strong in-plane couplings to the p -orbitals of anions are isolated near Fermi energy. With this gene, the superexchange magnetic interactions mediated by anions could maximize their contributions to superconductivity. Creating the gene requires special arrangements between local electronic structures and crystal lattice structures. The speciality explains why high [Formula: see text] superconductors are so rare. An explicit prediction is made to realize high [Formula: see text] superconductivity in Co/Ni-based materials with a quasi two dimensional hexagonal lattice structure formed by trigonal bipyramidal complexes.

  14. Combining gene expression and genetic analyses to identify candidate genes involved in cold responses in pea.

    PubMed

    Legrand, Sylvain; Marque, Gilles; Blassiau, Christelle; Bluteau, Aurélie; Canoy, Anne-Sophie; Fontaine, Véronique; Jaminon, Odile; Bahrman, Nasser; Mautord, Julie; Morin, Julie; Petit, Aurélie; Baranger, Alain; Rivière, Nathalie; Wilmer, Jeroen; Delbreil, Bruno; Lejeune-Hénaut, Isabelle

    2013-09-01

    Cold stress affects plant growth and development. In order to better understand the responses to cold (chilling or freezing tolerance), we used two contrasted pea lines. Following a chilling period, the Champagne line becomes tolerant to frost whereas the Terese line remains sensitive. Four suppression subtractive hybridisation libraries were obtained using mRNAs isolated from pea genotypes Champagne and Terese. Using quantitative polymerase chain reaction (qPCR) performed on 159 genes, 43 and 54 genes were identified as differentially expressed at the initial time point and during the time course study, respectively. Molecular markers were developed from the differentially expressed genes and were genotyped on a population of 164 RILs derived from a cross between Champagne and Terese. We identified 5 candidate genes colocalizing with 3 different frost damage quantitative trait loci (QTL) intervals and a protein quantity locus (PQL) rich region previously reported. This investigation revealed the role of constitutive differences between both genotypes in the cold responses, in particular with genes related to glycine degradation pathway that could confer to Champagne a better frost tolerance. We showed that freezing tolerance involves a decrease of expression of genes related to photosynthesis and the expression of a gene involved in the production of cysteine and methionine that could act as cryoprotectant molecules. Although it remains to be confirmed, this study could also reveal the involvement of the jasmonate pathway in the cold responses, since we observed that two genes related to this pathway were mapped in a frost damage QTL interval and in a PQL rich region interval, respectively. Copyright © 2013 Elsevier GmbH. All rights reserved.

  15. Joint genetic analysis of hippocampal size in mouse and human identifies a novel gene linked to neurodegenerative disease.

    PubMed

    Ashbrook, David G; Williams, Robert W; Lu, Lu; Stein, Jason L; Hibar, Derrek P; Nichols, Thomas E; Medland, Sarah E; Thompson, Paul M; Hager, Reinmar

    2014-10-03

    Variation in hippocampal volume has been linked to significant differences in memory, behavior, and cognition among individuals. To identify genetic variants underlying such differences and associated disease phenotypes, multinational consortia such as ENIGMA have used large magnetic resonance imaging (MRI) data sets in human GWAS studies. In addition, mapping studies in mouse model systems have identified genetic variants for brain structure variation with great power. A key challenge is to understand how genetically based differences in brain structure lead to the propensity to develop specific neurological disorders. We combine the largest human GWAS of brain structure with the largest mammalian model system, the BXD recombinant inbred mouse population, to identify novel genetic targets influencing brain structure variation that are linked to increased risk for neurological disorders. We first use a novel cross-species, comparative analysis using mouse and human genetic data to identify a candidate gene, MGST3, associated with adult hippocampus size in both systems. We then establish the coregulation and function of this gene in a comprehensive systems-analysis. We find that MGST3 is associated with hippocampus size and is linked to a group of neurodegenerative disorders, such as Alzheimer's.

  16. PCR and restriction fragment length polymorphism of a pel gene as a tool to identify Erwinia carotovora in relation to potato diseases.

    PubMed Central

    Darrasse, A; Priou, S; Kotoujansky, A; Bertheau, Y

    1994-01-01

    Using a sequenced pectate lyase-encoding gene (pel gene), we developed a PCR test for Erwinia carotovora. A set of primers allowed the amplification of a 434-bp fragment in E. carotovora strains. Among the 89 E. carotovora strains tested, only the Erwinia carotovora subsp. betavasculorum strains were not detected. A restriction fragment length polymorphism (RFLP) study was undertaken on the amplified fragment with seven endonucleases. The Sau3AI digestion pattern specifically identified the Erwinia carotovora subsp. atroseptica strains, and the whole set of data identified the Erwinia carotovora subsp. wasabiae strains. However, Erwinia carotovora subsp. carotovora and Erwinia carotovora subsp. odorifera could not be separated. Phenetic and phylogenic analyses of RFLP results showed E. carotovora subsp. atroseptica as a homogeneous group while E. carotovora subsp. carotovora and E. carotovora subsp. odorifera strains exhibited a genetic diversity that may result from a nonmonophyletic origin. The use of RFLP on amplified fragments in epidemiology and for diagnosis is discussed. Images PMID:7912502

  17. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses at various tissues, respectively. We tested differential gene expression by empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also presented that 12.8% and 59.0% pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% pairwise studies had independent expression association for genes, but no studies were observed significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to the gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes at different tissues.

  18. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Callow, Matthew J.; Dudoit, Sandrine; Gong, Elaine L.

    2000-05-05

    Based on the assumption that severe alterations in the expression of genes known to be involved in HDL metabolism may affect the expression of other genes we screened an array of over 5000 mouse expressed sequence tags (ESTs) for altered gene expression in the livers of two lines of mice with dramatic decreases in HDL plasma concentrations. Labeled cDNA from livers of apolipoprotein AI (apo AI) knockout mice, Scavenger Receptor BI (SR-BI) transgenic mice and control mice were co-hybridized to microarrays. Two-sample t-statistics were used to identify genes with altered expression levels in the knockout or transgenic mice compared withmore » the control mice. In the SR-BI group we found 9 array elements representing at least 5 genes to be significantly altered on the basis of an adjusted p value of less than 0.05. In the apo AI knockout group 8 array elements representing 4 genes were altered compared with the control group (p < 0.05). Several of the genes identified in the SR-BI transgenic suggest altered sterol metabolism and oxidative processes. These studies illustrate the use of multiple-testing methods for the identification of genes with altered expression in replicated microarray experiments of apo AI knockout and SR-BI transgenic mice.« less

  19. High-Throughput Screening Using iPSC-Derived Neuronal Progenitors to Identify Compounds Counteracting Epigenetic Gene Silencing in Fragile X Syndrome.

    PubMed

    Kaufmann, Markus; Schuffenhauer, Ansgar; Fruh, Isabelle; Klein, Jessica; Thiemeyer, Anke; Rigo, Pierre; Gomez-Mancilla, Baltazar; Heidinger-Millot, Valerie; Bouwmeester, Tewis; Schopfer, Ulrich; Mueller, Matthias; Fodor, Barna D; Cobos-Correa, Amanda

    2015-10-01

    Fragile X syndrome (FXS) is the most common form of inherited mental retardation, and it is caused in most of cases by epigenetic silencing of the Fmr1 gene. Today, no specific therapy exists for FXS, and current treatments are only directed to improve behavioral symptoms. Neuronal progenitors derived from FXS patient induced pluripotent stem cells (iPSCs) represent a unique model to study the disease and develop assays for large-scale drug discovery screens since they conserve the Fmr1 gene silenced within the disease context. We have established a high-content imaging assay to run a large-scale phenotypic screen aimed to identify compounds that reactivate the silenced Fmr1 gene. A set of 50,000 compounds was tested, including modulators of several epigenetic targets. We describe an integrated drug discovery model comprising iPSC generation, culture scale-up, and quality control and screening with a very sensitive high-content imaging assay assisted by single-cell image analysis and multiparametric data analysis based on machine learning algorithms. The screening identified several compounds that induced a weak expression of fragile X mental retardation protein (FMRP) and thus sets the basis for further large-scale screens to find candidate drugs or targets tackling the underlying mechanism of FXS with potential for therapeutic intervention. © 2015 Society for Laboratory Automation and Screening.

  20. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines

    PubMed Central

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours’ biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription–quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes. PMID:29263807

  1. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines.

    PubMed

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours' biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription-quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.

  2. An 80-gene set to predict response to preoperative chemoradiotherapy for rectal cancer by principle component analysis.

    PubMed

    Empuku, Shinichiro; Nakajima, Kentaro; Akagi, Tomonori; Kaneko, Kunihiko; Hijiya, Naoki; Etoh, Tsuyoshi; Shiraishi, Norio; Moriyama, Masatsugu; Inomata, Masafumi

    2016-05-01

    Preoperative chemoradiotherapy (CRT) for locally advanced rectal cancer not only improves the postoperative local control rate, but also induces downstaging. However, it has not been established how to individually select patients who receive effective preoperative CRT. The aim of this study was to identify a predictor of response to preoperative CRT for locally advanced rectal cancer. This study is additional to our multicenter phase II study evaluating the safety and efficacy of preoperative CRT using oral fluorouracil (UMIN ID: 03396). From April, 2009 to August, 2011, 26 biopsy specimens obtained prior to CRT were analyzed by cyclopedic microarray analysis. Response to CRT was evaluated according to a histological grading system using surgically resected specimens. To decide on the number of genes for dividing into responder and non-responder groups, we statistically analyzed the data using a dimension reduction method, a principle component analysis. Of the 26 cases, 11 were responders and 15 non-responders. No significant difference was found in clinical background data between the two groups. We determined that the optimal number of genes for the prediction of response was 80 of 40,000 and the functions of these genes were analyzed. When comparing non-responders with responders, genes expressed at a high level functioned in alternative splicing, whereas those expressed at a low level functioned in the septin complex. Thus, an 80-gene expression set that predicts response to preoperative CRT for locally advanced rectal cancer was identified using a novel statistical method.

  3. Integrated Network Analysis Identifies Fight-Club Nodes as a Class of Hubs Encompassing Key Putative Switch Genes That Induce Major Transcriptome Reprogramming during Grapevine Development[W][OPEN

    PubMed Central

    Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola

    2014-01-01

    We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named “fight-club hubs” characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named “switch genes” was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. PMID:25490918

  4. Haplotype Analysis in Multiple Crosses to Identify a QTL Gene

    PubMed Central

    Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

    2004-01-01

    Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P ≤ 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene. PMID:15310659

  5. Haplotype analysis in multiple crosses to identify a QTL gene.

    PubMed

    Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

    2004-09-01

    Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P < or = 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene.

  6. RAMONA: a Web application for gene set analysis on multilevel omics data.

    PubMed

    Sass, Steffen; Buettner, Florian; Mueller, Nikola S; Theis, Fabian J

    2015-01-01

    Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data, which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds. RAMONA is implemented as ASP.NET Web application and publicly available at http://icb.helmholtz-muenchen.de/ramona. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. A transposon-based genetic screen in mice identifies genes altered in colorectal cancer.

    PubMed

    Starr, Timothy K; Allaei, Raha; Silverstein, Kevin A T; Staggs, Rodney A; Sarver, Aaron L; Bergemann, Tracy L; Gupta, Mihir; O'Sullivan, M Gerard; Matise, Ilze; Dupuy, Adam J; Collier, Lara S; Powers, Scott; Oberg, Ann L; Asmann, Yan W; Thibodeau, Stephen N; Tessarollo, Lino; Copeland, Neal G; Jenkins, Nancy A; Cormier, Robert T; Largaespada, David A

    2009-03-27

    Human colorectal cancers (CRCs) display a large number of genetic and epigenetic alterations, some of which are causally involved in tumorigenesis (drivers) and others that have little functional impact (passengers). To help distinguish between these two classes of alterations, we used a transposon-based genetic screen in mice to identify candidate genes for CRC. Mice harboring mutagenic Sleeping Beauty (SB) transposons were crossed with mice expressing SB transposase in gastrointestinal tract epithelium. Most of the offspring developed intestinal lesions, including intraepithelial neoplasia, adenomas, and adenocarcinomas. Analysis of over 16,000 transposon insertions identified 77 candidate CRC genes, 60 of which are mutated and/or dysregulated in human CRC and thus are most likely to drive tumorigenesis. These genes include APC, PTEN, and SMAD4. The screen also identified 17 candidate genes that had not previously been implicated in CRC, including POLI, PTPRK, and RSPO2.

  8. An extended set of yeast-based functional assays accurately identifies human disease mutations

    PubMed Central

    Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.

    2016-01-01

    We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778

  9. Comparison of gene expression in segregating families identifies genes and genomic regions involved in a novel adaptation, zinc hyperaccumulation.

    PubMed

    Filatov, Victor; Dowdle, John; Smirnoff, Nicholas; Ford-Lloyd, Brian; Newbury, H John; Macnair, Mark R

    2006-09-01

    One of the challenges of comparative genomics is to identify specific genetic changes associated with the evolution of a novel adaptation or trait. We need to be able to disassociate the genes involved with a particular character from all the other genetic changes that take place as lineages diverge. Here we show that by comparing the transcriptional profile of segregating families with that of parent species differing in a novel trait, it is possible to narrow down substantially the list of potential target genes. In addition, by assuming synteny with a related model organism for which the complete genome sequence is available, it is possible to use the cosegregation of markers differing in transcription level to identify regions of the genome which probably contain quantitative trait loci (QTLs) for the character. This novel combination of genomics and classical genetics provides a very powerful tool to identify candidate genes. We use this methodology to investigate zinc hyperaccumulation in Arabidopsis halleri, the sister species to the model plant, Arabidopsis thaliana. We compare the transcriptional profile of A. halleri with that of its sister nonaccumulator species, Arabidopsis petraea, and between accumulator and nonaccumulator F(3)s derived from the cross between the two species. We identify eight genes which consistently show greater expression in accumulator phenotypes in both roots and shoots, including two metal transporter genes (NRAMP3 and ZIP6), and cytoplasmic aconitase, a gene involved in iron homeostasis in mammals. We also show that there appear to be two QTLs for zinc accumulation, on chromosomes 3 and 7.

  10. Loci influencing blood pressure identified using a cardiovascular gene-centric array.

    PubMed

    Ganesh, Santhi K; Tragante, Vinicius; Guo, Wei; Guo, Yiran; Lanktree, Matthew B; Smith, Erin N; Johnson, Toby; Castillo, Berta Almoguera; Barnard, John; Baumert, Jens; Chang, Yen-Pei Christy; Elbers, Clara C; Farrall, Martin; Fischer, Mary E; Franceschini, Nora; Gaunt, Tom R; Gho, Johannes M I H; Gieger, Christian; Gong, Yan; Isaacs, Aaron; Kleber, Marcus E; Mateo Leach, Irene; McDonough, Caitrin W; Meijs, Matthijs F L; Mellander, Olle; Molony, Cliona M; Nolte, Ilja M; Padmanabhan, Sandosh; Price, Tom S; Rajagopalan, Ramakrishnan; Shaffer, Jonathan; Shah, Sonia; Shen, Haiqing; Soranzo, Nicole; van der Most, Peter J; Van Iperen, Erik P A; Van Setten, Jessica; Van Setten, Jessic A; Vonk, Judith M; Zhang, Li; Beitelshees, Amber L; Berenson, Gerald S; Bhatt, Deepak L; Boer, Jolanda M A; Boerwinkle, Eric; Burkley, Ben; Burt, Amber; Chakravarti, Aravinda; Chen, Wei; Cooper-Dehoff, Rhonda M; Curtis, Sean P; Dreisbach, Albert; Duggan, David; Ehret, Georg B; Fabsitz, Richard R; Fornage, Myriam; Fox, Ervin; Furlong, Clement E; Gansevoort, Ron T; Hofker, Marten H; Hovingh, G Kees; Kirkland, Susan A; Kottke-Marchant, Kandice; Kutlar, Abdullah; Lacroix, Andrea Z; Langaee, Taimour Y; Li, Yun R; Lin, Honghuang; Liu, Kiang; Maiwald, Steffi; Malik, Rainer; Murugesan, Gurunathan; Newton-Cheh, Christopher; O'Connell, Jeffery R; Onland-Moret, N Charlotte; Ouwehand, Willem H; Palmas, Walter; Penninx, Brenda W; Pepine, Carl J; Pettinger, Mary; Polak, Joseph F; Ramachandran, Vasan S; Ranchalis, Jane; Redline, Susan; Ridker, Paul M; Rose, Lynda M; Scharnag, Hubert; Schork, Nicholas J; Shimbo, Daichi; Shuldiner, Alan R; Srinivasan, Sathanur R; Stolk, Ronald P; Taylor, Herman A; Thorand, Barbara; Trip, Mieke D; van Duijn, Cornelia M; Verschuren, W Monique; Wijmenga, Cisca; Winkelmann, Bernhard R; Wyatt, Sharon; Young, J Hunter; Boehm, Bernhard O; Caulfield, Mark J; Chasman, Daniel I; Davidson, Karina W; Doevendans, Pieter A; Fitzgerald, Garret A; Gums, John G; Hakonarson, Hakon; Hillege, Hans L; Illig, Thomas; Jarvik, Gail P; Johnson, Julie A; Kastelein, John J P; Koenig, Wolfgang; März, Winfried; Mitchell, Braxton D; Murray, Sarah S; Oldehinkel, Albertine J; Rader, Daniel J; Reilly, Muredach P; Reiner, Alex P; Schadt, Eric E; Silverstein, Roy L; Snieder, Harold; Stanton, Alice V; Uitterlinden, André G; van der Harst, Pim; van der Schouw, Yvonne T; Samani, Nilesh J; Johnson, Andrew D; Munroe, Patricia B; de Bakker, Paul I W; Zhu, Xiaofeng; Levy, Daniel; Keating, Brendan J; Asselbergs, Folkert W

    2013-04-15

    Blood pressure (BP) is a heritable determinant of risk for cardiovascular disease (CVD). To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP) and pulse pressure (PP), we genotyped ∼50 000 single-nucleotide polymorphisms (SNPs) that capture variation in ∼2100 candidate genes for cardiovascular phenotypes in 61 619 individuals of European ancestry from cohort studies in the USA and Europe. We identified novel associations between rs347591 and SBP (chromosome 3p25.3, in an intron of HRH1) and between rs2169137 and DBP (chromosome1q32.1 in an intron of MDM4) and between rs2014408 and SBP (chromosome 11p15 in an intron of SOX6), previously reported to be associated with MAP. We also confirmed 10 previously known loci associated with SBP, DBP, MAP or PP (ADRB1, ATP2B1, SH2B3/ATXN2, CSK, CYP17A1, FURIN, HFE, LSP1, MTHFR, SOX6) at array-wide significance (P < 2.4 × 10(-6)). We then replicated these associations in an independent set of 65 886 individuals of European ancestry. The findings from expression QTL (eQTL) analysis showed associations of SNPs in the MDM4 region with MDM4 expression. We did not find any evidence of association of the two novel SNPs in MDM4 and HRH1 with sequelae of high BP including coronary artery disease (CAD), left ventricular hypertrophy (LVH) or stroke. In summary, we identified two novel loci associated with BP and confirmed multiple previously reported associations. Our findings extend our understanding of genes involved in BP regulation, some of which may eventually provide new targets for therapeutic intervention.

  11. Identifying anti-cancer drug response related genes using an integrative analysis of transcriptomic and genomic variations with cell line-based drug perturbations.

    PubMed

    Sun, Yi; Zhang, Wei; Chen, Yunqin; Ma, Qin; Wei, Jia; Liu, Qi

    2016-02-23

    Clinical responses to anti-cancer therapies often only benefit a defined subset of patients. Predicting the best treatment strategy hinges on our ability to effectively translate genomic data into actionable information on drug responses. To achieve this goal, we compiled a comprehensive collection of baseline cancer genome data and drug response information derived from a large panel of cancer cell lines. This data set was applied to identify the signature genes relevant to drug sensitivity and their resistance by integrating CNVs and the gene expression of cell lines with in vitro drug responses. We presented an efficient in-silico pipeline for integrating heterogeneous cell line data sources with the simultaneous modeling of drug response values across all the drugs and cell lines. Potential signature genes correlated with drug response (sensitive or resistant) in different cancer types were identified. Using signature genes, our collaborative filtering-based drug response prediction model outperformed the 44 algorithms submitted to the DREAM competition on breast cancer cells. The functions of the identified drug response related signature genes were carefully analyzed at the pathway level and the synthetic lethality level. Furthermore, we validated these signature genes by applying them to the classification of the different subtypes of the TCGA tumor samples, and further uncovered their in vivo implications using clinical patient data. Our work may have promise in translating genomic data into customized marker genes relevant to the response of specific drugs for a specific cancer type of individual patients.

  12. Expression map of a complete set of gustatory receptor genes in chemosensory organs of Bombyx mori.

    PubMed

    Guo, Huizhen; Cheng, Tingcai; Chen, Zhiwei; Jiang, Liang; Guo, Youbing; Liu, Jianqiu; Li, Shenglong; Taniai, Kiyoko; Asaoka, Kiyoshi; Kadono-Okuda, Keiko; Arunkumar, Kallare P; Wu, Jiaqi; Kishino, Hirohisa; Zhang, Huijie; Seth, Rakesh K; Gopinathan, Karumathil P; Montagné, Nicolas; Jacquin-Joly, Emmanuelle; Goldsmith, Marian R; Xia, Qingyou; Mita, Kazuei

    2017-03-01

    Most lepidopteran species are herbivores, and interaction with host plants affects their gene expression and behavior as well as their genome evolution. Gustatory receptors (Grs) are expected to mediate host plant selection, feeding, oviposition and courtship behavior. However, due to their high diversity, sequence divergence and extremely low level of expression it has been difficult to identify precisely a complete set of Grs in Lepidoptera. By manual annotation and BAC sequencing, we improved annotation of 43 gene sequences compared with previously reported Grs in the most studied lepidopteran model, the silkworm, Bombyx mori, and identified 7 new tandem copies of BmGr30 on chromosome 7, bringing the total number of BmGrs to 76. Among these, we mapped 68 genes to chromosomes in a newly constructed chromosome distribution map and 8 genes to scaffolds; we also found new evidence for large clusters of BmGrs, especially from the bitter receptor family. RNA-seq analysis of diverse BmGr expression patterns in chemosensory organs of larvae and adults enabled us to draw a precise organ specific map of BmGr expression. Interestingly, most of the clustered genes were expressed in the same tissues and more than half of the genes were expressed in larval maxillae, larval thoracic legs and adult legs. For example, BmGr63 showed high expression levels in all organs in both larval and adult stages. By contrast, some genes showed expression limited to specific developmental stages or organs and tissues. BmGr19 was highly expressed in larval chemosensory organs (especially antennae and thoracic legs), the single exon genes BmGr53 and BmGr67 were expressed exclusively in larval tissues, the BmGr27-BmGr31 gene cluster on chr7 displayed a high expression level limited to adult legs and the candidate CO 2 receptor BmGr2 was highly expressed in adult antennae, where few other Grs were expressed. Transcriptional analysis of the Grs in B. mori provides a valuable new reference for

  13. A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity.

    PubMed

    Forno, Erick; Wang, Ting; Yan, Qi; Brehm, John; Acosta-Perez, Edna; Colon-Semidey, Angel; Alvarez, Maria; Boutaoui, Nadia; Cloutier, Michelle M; Alcorn, John F; Canino, Glorisa; Chen, Wei; Celedón, Juan C

    2017-10-01

    Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.

  14. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    NASA Astrophysics Data System (ADS)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brnedan

    2012-07-01

    unloading in spaceflight, we conducted genome wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16 week old mice were subjected to 15 days spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of the mice was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelvii were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality was measured using a Nanodrop spectrometer, and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip\\S Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the BioConductor Limma Library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally we used GSEApreRanked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21, and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1, and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations. These

  15. A large-scale RNA interference screen identifies genes that regulate autophagy at different stages.

    PubMed

    Guo, Sujuan; Pridham, Kevin J; Virbasius, Ching-Man; He, Bin; Zhang, Liqing; Varmark, Hanne; Green, Michael R; Sheng, Zhi

    2018-02-12

    Dysregulated autophagy is central to the pathogenesis and therapeutic development of cancer. However, how autophagy is regulated in cancer is not well understood and genes that modulate cancer autophagy are not fully defined. To gain more insights into autophagy regulation in cancer, we performed a large-scale RNA interference screen in K562 human chronic myeloid leukemia cells using monodansylcadaverine staining, an autophagy-detecting approach equivalent to immunoblotting of the autophagy marker LC3B or fluorescence microscopy of GFP-LC3B. By coupling monodansylcadaverine staining with fluorescence-activated cell sorting, we successfully isolated autophagic K562 cells where we identified 336 short hairpin RNAs. After candidate validation using Cyto-ID fluorescence spectrophotometry, LC3B immunoblotting, and quantitative RT-PCR, 82 genes were identified as autophagy-regulating genes. 20 genes have been reported previously and the remaining 62 candidates are novel autophagy mediators. Bioinformatic analyses revealed that most candidate genes were involved in molecular pathways regulating autophagy, rather than directly participating in the autophagy process. Further autophagy flux assays revealed that 57 autophagy-regulating genes suppressed autophagy initiation, whereas 21 candidates promoted autophagy maturation. Our RNA interference screen identifies identified genes that regulate autophagy at different stages, which helps decode autophagy regulation in cancer and offers novel avenues to develop autophagy-related therapies for cancer.

  16. G20210A prothrombin gene mutation identified in patients with venous leg ulcers.

    PubMed

    Jebeleanu, G; Procopciuc, L

    2001-01-01

    The G20210A mutation variant of prothrombin gene is the second most frequent mutation identified in patients with deep venous thrombosis, after factor V Leiden. The risk for developing deep venous thrombosis is high in patients identified as heterozygous for G20210A mutation. In order to identify this polymorphism in the gene coding prothrombin, the 345bp fragment in the 3'- untranslated region of the prothrombin gene was amplified using amplification by polymerase chain reaction and enzymatic digestion by HindIII (restriction endonuclease enzyme). The products of amplification and enzymatic's digestion were analized using agarose gel electrophoresis. We investigated 20 patients with venous leg ulcers and we found 2 heterozygous (10%) for G20210A mutation. None of the patients in the control group had G20210A mutation. Our study confirms the presence of G20210A mutation in the Romanian population. Our study also shows the link between venous leg ulcers and this polymorphism in the prothrombin gene.

  17. Epidemiology and healthcare costs of incident Clostridium difficile infections identified in the outpatient healthcare setting.

    PubMed

    Kuntz, Jennifer L; Johnson, Eric S; Raebel, Marsha A; Petrik, Amanda F; Yang, Xiuhai; Thorp, Micah L; Spindel, Steven J; Neil, Nancy; Smith, David H

    2012-10-01

    To describe the epidemiology and healthcare costs of Clostridium difficile infection (CDI) identified in the outpatient setting. Population-based, retrospective cohort study. Kaiser Permanente Colorado and Kaiser Permanente Northwest members between June 1, 2005, and September 30, 2008. We identified persons with incident CDI and classified CDI by whether it was identified in the outpatient or inpatient healthcare setting. We collected information about baseline variables and follow-up healthcare utilization, costs, and outcomes among patients with CDI. We compared characteristics of patients with CDI identified in the outpatient versus inpatient setting. We identified 3,067 incident CDIs; 56% were identified in the outpatient setting. Few strong, independent predictors of diagnostic setting were identified, although a previous stay in a nonacute healthcare institution (odds ratio [OR], 1.45 [95% confidence interval (CI), 1.13-1.86]) was statistically associated with outpatient-identified CDI, as was age from 50 to 59 years (OR, 1.64 [95% CI, 1.18-2.29]), 60 to 69 years (OR, 1.37 [95% CI, 1.03-1.82]), and 70 to 79 years (OR, 1.36 [95% CI, 1.06-1.74]), when compared with persons aged 80-89 years. We found that more than one-half of incident CDIs in this population were identified in the outpatient setting. Patients with outpatient-identified CDI were younger with fewer comorbidities, although they frequently had previous exposure to healthcare. These data suggest that practitioners should be aware of CDI and obtain appropriate diagnostic testing on outpatients with CDI symptoms.

  18. A Novel Yeast Genomics Method for Identifying New Breast Cancer Susceptibility Genes

    DTIC Science & Technology

    2007-05-01

    find new candidate genes for breast cancer susceptibility in women and identifying these human genes can further improve monitoring and treatment...breast cancer susceptibility genes in humans that are currently unknown and not deducible from current methodologies. It is a fundamental...template to faithfully repair the broken strand. In human cancer it is loss of HR, rather than NHEJ, that is more important in increasing cancer

  19. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    PubMed Central

    Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean

    2006-01-01

    Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810

  20. Next-generation sequencing to identify candidate genes and develop diagnostic markers for a novel Phytophthora resistance gene, RpsHC18, in soybean.

    PubMed

    Zhong, Chao; Sun, Suli; Li, Yinping; Duan, Canxing; Zhu, Zhendong

    2018-03-01

    A novel Phytophthora sojae resistance gene RpsHC18 was identified and finely mapped on soybean chromosome 3. Two NBS-LRR candidate genes were identified and two diagnostic markers of RpsHC18 were developed. Phytophthora root rot caused by Phytophthora sojae is a destructive disease of soybean. The most effective disease-control strategy is to deploy resistant cultivars carrying Phytophthora-resistant Rps genes. The soybean cultivar Huachun 18 has a broad and distinct resistance spectrum to 12 P. sojae isolates. Quantitative trait loci sequencing (QTL-seq), based on the whole-genome resequencing (WGRS) of two extreme resistant and susceptible phenotype bulks from an F 2:3 population, was performed, and one 767-kb genomic region with ΔSNP-index ≥ 0.9 on chromosome 3 was identified as the RpsHC18 candidate region in Huachun 18. The candidate region was reduced to a 146-kb region by fine mapping. Nonsynonymous SNP and haplotype analyses were carried out in the 146-kb region among ten soybean genotypes using WGRS. Four specific nonsynonymous SNPs were identified in two nucleotide-binding sites-leucine-rich repeat (NBS-LRR) genes, RpsHC18-NBL1 and RpsHC18-NBL2, which were considered to be the candidate genes. Finally, one specific SNP marker in each candidate gene was successfully developed using a tetra-primer ARMS-PCR assay, and the two markers were verified to be specific for RpsHC18 and to effectively distinguish other known Rps genes. In this study, we applied an integrated genomic-based strategy combining WGRS with traditional genetic mapping to identify RpsHC18 candidate genes and develop diagnostic markers. These results suggest that next-generation sequencing is a precise, rapid and cost-effective way to identify candidate genes and develop diagnostic markers, and it can accelerate Rps gene cloning and marker-assisted selection for breeding of P. sojae-resistant soybean cultivars.

  1. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting.

    PubMed

    Zhao, Wei; Ware, Erin B; He, Zihuai; Kardia, Sharon L R; Faul, Jessica D; Smith, Jennifer A

    2017-09-29

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) ( p = 0.07).

  2. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting

    PubMed Central

    Zhao, Wei; He, Zihuai; Kardia, Sharon L. R.; Faul, Jessica D.

    2017-01-01

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) (p = 0.07). PMID:28961216

  3. Triplication of a four-gene set during evolution of the goat beta-globin locus produced three genes now expressed differentially during development.

    PubMed Central

    Townes, T M; Fitzgerald, M C; Lingrel, J B

    1984-01-01

    Distinct hemoglobins are synthesized in goats at different stages of development, similar to humans. Embryonic hemoglobins (zeta 2 epsilon 2 and alpha 2 epsilon 2) are synthesized initially and are followed sequentially by fetal (alpha 2 beta F2), preadult (alpha 2 beta C2), and adult (alpha 2 beta A2) hemoglobins. To help understand the basis of these switches, the genes of the beta-globin locus have been cloned and their linkage arrangement has been determined by the isolation of lambda phage carrying overlapping inserts of genomic goat DNA. The locus extends over 120 kilobase pairs and consists of 12 genes arranged in the following order: epsilon I-epsilon II-psi beta X-beta C-epsilon III-epsilon IV-psi beta Z-beta A-epsilon V-epsilon VI-psi beta Y-beta F. Comparison of the nucleotide sequence of the 12 genes shows that the locus is organized into three homologous four-gene sets that presumably evolved by the triplication of an ancestral set of four genes (epsilon-epsilon-psi beta-beta). Interestingly, the three genes (beta C, beta A, and beta F) located at the ends of the four-gene sets are expressed at different stages of development. Therefore, the goat beta F-, beta C-, and beta A-globin genes appear to have evolved by a mechanism that includes the triplication of 40-50 kilobase pairs of DNA and the recruitment of newly formed genes for expression in fetal, preadult, and adult life. PMID:6593719

  4. EviNet: a web platform for network enrichment analysis with flexible definition of gene sets.

    PubMed

    Jeggari, Ashwini; Alekseenko, Zhanna; Petrov, Iurii; Dias, José M; Ericson, Johan; Alexeyenko, Andrey

    2018-06-09

    The new web resource EviNet provides an easily run interface to network enrichment analysis for exploration of novel, experimentally defined gene sets. The major advantages of this analysis are (i) applicability to any genes found in the global network rather than only to those with pathway/ontology term annotations, (ii) ability to connect genes via different molecular mechanisms rather than within one high-throughput platform, and (iii) statistical power sufficient to detect enrichment of very small sets, down to individual genes. The users' gene sets are either defined prior to upload or derived interactively from an uploaded file by differential expression criteria. The pathways and networks used in the analysis can be chosen from the collection menu. The calculation is typically done within seconds or minutes and the stable URL is provided immediately. The results are presented in both visual (network graphs) and tabular formats using jQuery libraries. Uploaded data and analysis results are kept in separated project directories not accessible by other users. EviNet is available at https://www.evinet.org/.

  5. Exome sequencing of a large family identifies potential candidate genes contributing risk to bipolar disorder.

    PubMed

    Zhang, Tianxiao; Hou, Liping; Chen, David T; McMahon, Francis J; Wang, Jen-Chyong; Rice, John P

    2018-03-01

    Bipolar disorder is a mental illness with lifetime prevalence of about 1%. Previous genetic studies have identified multiple chromosomal linkage regions and candidate genes that might be associated with bipolar disorder. The present study aimed to identify potential susceptibility variants for bipolar disorder using 6 related case samples from a four-generation family. A combination of exome sequencing and linkage analysis was performed to identify potential susceptibility variants for bipolar disorder. Our study identified a list of five potential candidate genes for bipolar disorder. Among these five genes, GRID1(Glutamate Receptor Delta-1 Subunit), which was previously reported to be associated with several psychiatric disorders and brain related traits, is particularly interesting. Variants with functional significance in this gene were identified from two cousins in our bipolar disorder pedigree. Our findings suggest a potential role for these genes and the related rare variants in the onset and development of bipolar disorder in this one family. Additional research is needed to replicate these findings and evaluate their patho-biological significance. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

    PubMed Central

    Newton, Richard; Wernisch, Lorenz

    2014-01-01

    Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247

  7. Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1.

    PubMed

    Gutiérrez, Rodrigo A; Stokes, Trevor L; Thum, Karen; Xu, Xiaodong; Obertello, Mariana; Katari, Manpreet S; Tanurdzic, Milos; Dean, Alexis; Nero, Damion C; McClung, C Robertson; Coruzzi, Gloria M

    2008-03-25

    Understanding how nutrients affect gene expression will help us to understand the mechanisms controlling plant growth and development as a function of nutrient availability. Nitrate has been shown to serve as a signal for the control of gene expression in Arabidopsis. There is also evidence, on a gene-by-gene basis, that downstream products of nitrogen (N) assimilation such as glutamate (Glu) or glutamine (Gln) might serve as signals of organic N status that in turn regulate gene expression. To identify genome-wide responses to such organic N signals, Arabidopsis seedlings were transiently treated with ammonium nitrate in the presence or absence of MSX, an inhibitor of glutamine synthetase, resulting in a block of Glu/Gln synthesis. Genes that responded to organic N were identified as those whose response to ammonium nitrate treatment was blocked in the presence of MSX. We showed that some genes previously identified to be regulated by nitrate are under the control of an organic N-metabolite. Using an integrated network model of molecular interactions, we uncovered a subnetwork regulated by organic N that included CCA1 and target genes involved in N-assimilation. We validated some of the predicted interactions and showed that regulation of the master clock control gene CCA1 by Glu or a Glu-derived metabolite in turn regulates the expression of key N-assimilatory genes. Phase response curve analysis shows that distinct N-metabolites can advance or delay the CCA1 phase. Regulation of CCA1 by organic N signals may represent a novel input mechanism for N-nutrients to affect plant circadian clock function.

  8. Weighted gene co-expression network analysis of expression data of monozygotic twins identifies specific modules and hub genes related to BMI.

    PubMed

    Wang, Weijing; Jiang, Wenjie; Hou, Lin; Duan, Haiping; Wu, Yili; Xu, Chunsheng; Tan, Qihua; Li, Shuxia; Zhang, Dongfeng

    2017-11-13

    The therapeutic management of obesity is challenging, hence further elucidating the underlying mechanisms of obesity development and identifying new diagnostic biomarkers and therapeutic targets are urgent and necessary. Here, we performed differential gene expression analysis and weighted gene co-expression network analysis (WGCNA) to identify significant genes and specific modules related to BMI based on gene expression profile data of 7 discordant monozygotic twins. In the differential gene expression analysis, it appeared that 32 differentially expressed genes (DEGs) were with a trend of up-regulation in twins with higher BMI when compared to their siblings. Categories of positive regulation of nitric-oxide synthase biosynthetic process, positive regulation of NF-kappa B import into nucleus, and peroxidase activity were significantly enriched within GO database and NF-kappa B signaling pathway within KEGG database. DEGs of NAMPT, TLR9, PTGS2, HBD, and PCSK1N might be associated with obesity. In the WGCNA, among the total 20 distinct co-expression modules identified, coral1 module (68 genes) had the strongest positive correlation with BMI (r = 0.56, P = 0.04) and disease status (r = 0.56, P = 0.04). Categories of positive regulation of phospholipase activity, high-density lipoprotein particle clearance, chylomicron remnant clearance, reverse cholesterol transport, intermediate-density lipoprotein particle, chylomicron, low-density lipoprotein particle, very-low-density lipoprotein particle, voltage-gated potassium channel complex, cholesterol transporter activity, and neuropeptide hormone activity were significantly enriched within GO database for this module. And alcoholism and cell adhesion molecules pathways were significantly enriched within KEGG database. Several hub genes, such as GAL, ASB9, NPPB, TBX2, IL17C, APOE, ABCG4, and APOC2 were also identified. The module eigengene of saddlebrown module (212 genes) was also significantly

  9. Analysis of global gene expression profiles to identify differentially expressed genes critical for embryo development in Brassica rapa.

    PubMed

    Zhang, Yu; Peng, Lifang; Wu, Ya; Shen, Yanyue; Wu, Xiaoming; Wang, Jianbo

    2014-11-01

    Embryo development represents a crucial developmental period in the life cycle of flowering plants. To gain insights into the genetic programs that control embryo development in Brassica rapa L., RNA sequencing technology was used to perform transcriptome profiling analysis of B. rapa developing embryos. The results generated 42,906,229 sequence reads aligned with 32,941 genes. In total, 27,760, 28,871, 28,384, and 25,653 genes were identified from embryos at globular, heart, early cotyledon, and mature developmental stages, respectively, and analysis between stages revealed a subset of stage-specific genes. We next investigated 9,884 differentially expressed genes with more than fivefold changes in expression and false discovery rate ≤ 0.001 from three adjacent-stage comparisons; 1,514, 3,831, and 6,633 genes were detected between globular and heart stage embryo libraries, heart stage and early cotyledon stage, and early cotyledon and mature stage, respectively. Large numbers of genes related to cellular process, metabolism process, response to stimulus, and biological process were expressed during the early and middle stages of embryo development. Fatty acid biosynthesis, biosynthesis of secondary metabolites, and photosynthesis-related genes were expressed predominantly in embryos at the middle stage. Genes for lipid metabolism and storage proteins were highly expressed in the middle and late stages of embryo development. We also identified 911 transcription factor genes that show differential expression across embryo developmental stages. These results increase our understanding of the complex molecular and cellular events during embryo development in B. rapa and provide a foundation for future studies on other oilseed crops.

  10. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene.

    PubMed

    Dicks, Ed; Song, Honglin; Ramus, Susan J; Oudenhove, Elke Van; Tyrer, Jonathan P; Intermaggio, Maria P; Kar, Siddhartha; Harrington, Patricia; Bowtell, David D; Group, Aocs Study; Cicek, Mine S; Cunningham, Julie M; Fridley, Brooke L; Alsop, Jennifer; Jimenez-Linan, Mercedes; Piskorz, Anna; Goranova, Teodora; Kent, Emma; Siddiqui, Nadeem; Paul, James; Crawford, Robin; Poblete, Samantha; Lele, Shashi; Sucheston-Campbell, Lara; Moysich, Kirsten B; Sieh, Weiva; McGuire, Valerie; Lester, Jenny; Odunsi, Kunle; Whittemore, Alice S; Bogdanova, Natalia; Dürst, Matthias; Hillemanns, Peter; Karlan, Beth Y; Gentry-Maharaj, Aleksandra; Menon, Usha; Tischkowitz, Marc; Levine, Douglas; Brenton, James D; Dörk, Thilo; Goode, Ellen L; Gayther, Simon A; Pharoah, D P Paul

    2017-08-01

    We analyzed whole exome sequencing data in germline DNA from 412 high grade serous ovarian cancer (HGSOC) cases from The Cancer Genome Atlas Project and identified 5,517 genes harboring a predicted deleterious germline coding mutation in at least one HGSOC case. Gene-set enrichment analysis showed enrichment for genes involved in DNA repair (p = 1.8×10 -3 ). Twelve DNA repair genes - APEX1, APLF, ATX, EME1, FANCL, FANCM, MAD2L2, PARP2, PARP3, POLN, RAD54L and SMUG1 - were prioritized for targeted sequencing in up to 3,107 HGSOC cases, 1,491 cases of other epithelial ovarian cancer (EOC) subtypes and 3,368 unaffected controls of European origin. We estimated mutation prevalence for each gene and tested for associations with disease risk. Mutations were identified in both cases and controls in all genes except MAD2L2 , where we found no evidence of mutations in controls. In FANCM we observed a higher mutation frequency in HGSOC cases compared to controls (29/3,107 cases, 0.96 percent; 13/3,368 controls, 0.38 percent; P=0.008) with little evidence for association with other subtypes (6/1,491, 0.40 percent; P=0.82). The relative risk of HGSOC associated with deleterious FANCM mutations was estimated to be 2.5 (95% CI 1.3 - 5.0; P=0.006). In summary, whole exome sequencing of EOC cases with large-scale replication in case-control studies has identified FANCM as a likely novel susceptibility gene for HGSOC, with mutations associated with a moderate increase in risk. These data may have clinical implications for risk prediction and prevention approaches for high-grade serous ovarian cancer in the future and a significant impact on reducing disease mortality.

  11. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

    PubMed

    Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

    2013-06-03

    Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.

  12. Gene expression profiling to identify the toxicities and potentially relevant human disease outcomes associated with environmental heavy metal exposure.

    PubMed

    Korashy, Hesham M; Attafi, Ibraheem M; Famulski, Konrad S; Bakheet, Saleh A; Hafez, Mohammed M; Alsaad, Abdulaziz M S; Al-Ghadeer, Abdul Rahman M

    2017-02-01

    Heavy metals are the most commonly encountered toxic substances that increase susceptibility to various diseases after prolonged exposure. We have previously shown that healthy volunteers living near a mining area had significant contamination with heavy metals associated with significant changes in the expression of some detoxifying genes, xenobiotic metabolizing enzymes, and DNA repair genes. However, alterations of most of the molecular target genes associated with diseases are still unknown. Thus, the aims of this study were to (a) evaluate the gene expression profile and (b) identify the toxicities and potentially relevant human disease outcomes associated with long-term human exposure to environmental heavy metals in mining area using microarray analysis. For this purpose, 40 healthy male volunteers who were residents of a heavy metal-polluted area (Mahd Al-Dhahab city, Saudi Arabia) and 20 healthy male volunteers who were residents of a non-heavy metal-polluted area were included in the study. Total RNA was isolated from whole blood using PAXgene Blood RNA tubes and then reversed transcribed and hybridized to the gene array using the Affymetrix U219 GeneChip. Microarray analysis showed about 2129 genes were identified and differentially altered, among which a shared set of 425 genes was differentially expressed in the heavy metal-exposed groups. Ingenuity pathway analysis revealed that the most altered gene-regulated diseases in heavy metal-exposed groups included hematological and developmental disorders and mostly renal and urological diseases. Quantitative real-time polymerase chain reaction closely matched the microarray data for some genes tested. Importantly, changes in gene-related diseases were attributed to alterations in the genes encoded for protein synthesis. Renal and urological diseases were the diseases that were most frequently associated with the heavy metal-exposed group. Therefore, there is a need for further studies to validate these

  13. De novo transcriptome sequencing in Frankliniella occidentalis to identify genes involved in plant virus transmission and insecticide resistance.

    PubMed

    Zhang, Zhijun; Zhang, Pengjun; Li, Weidi; Zhang, Jinming; Huang, Fang; Yang, Jian; Bei, Yawei; Lu, Yaobin

    2013-05-01

    The western flower thrips (WFT), Frankliniella occidentalis, a world-wide invasive insect, causes agricultural damage by directly feeding and by indirectly vectoring Tospoviruses, such as Tomato spotted wilt virus (TSWV). We characterized the transcriptome of WFT and analyzed global gene expression of WFT response to TSWV infection using Illumina sequencing platform. We compiled 59,932 unigenes, and identified 36,339 unigenes by similarity analysis against public databases, most of which were annotated using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Within these annotated transcripts, we collected 278 sequences related to insecticide resistance. GO and KEGG analysis of different expression genes between TSWV-infected and non-infected WFT population revealed that TSWV can regulate cellular process and immune response, which might lead to low virus titers in thrips cells and no detrimental effects on F. occidentalis. This data-set not only enriches genomic resource for WFT, but also benefits research into its molecular genetics and functional genomics. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention.

    PubMed

    Johnston, Iain G; Williams, Ben P

    2016-02-24

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Abstract Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association

  16. Determining Semantically Related Significant Genes.

    PubMed

    Taha, Kamal

    2014-01-01

    GO relation embodies some aspects of existence dependency. If GO term xis existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene sets enrichment. However, most of these methods overlook the structural dependencies between GO terms in GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term xcannot be existence-dependent on GO term y, if x- and y- have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.

  17. Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery.

    PubMed

    Jia, Zhilong; Liu, Ying; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang; Barnes, Michael R

    2016-05-27

    Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the "black box" nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena) for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway driven, disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome, as an exemplar, and recover two widely used Psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or https://github.com/zhilongjia/cogena . In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and

  18. Coalitional game theory as a promising approach to identify candidate autism genes.

    PubMed

    Gupta, Anika; Sun, Min Woo; Paskov, Kelley Marie; Stockham, Nate Tyler; Jung, Jae-Yoon; Wall, Dennis Paul

    2018-01-01

    Despite mounting evidence for the strong role of genetics in the phenotypic manifestation of Autism Spectrum Disorder (ASD), the specific genes responsible for the variable forms of ASD remain undefined. ASD may be best explained by a combinatorial genetic model with varying epistatic interactions across many small effect mutations. Coalitional or cooperative game theory is a technique that studies the combined effects of groups of players, known as coalitions, seeking to identify players who tend to improve the performance--the relationship to a specific disease phenotype--of any coalition they join. This method has been previously shown to boost biologically informative signal in gene expression data but to-date has not been applied to the search for cooperative mutations among putative ASD genes. We describe our approach to highlight genes relevant to ASD using coalitional game theory on alteration data of 1,965 fully sequenced genomes from 756 multiplex families. Alterations were encoded into binary matrices for ASD (case) and unaffected (control) samples, indicating likely gene-disrupting, inherited mutations in altered genes. To determine individual gene contributions given an ASD phenotype, a "player" metric, referred to as the Shapley value, was calculated for each gene in the case and control cohorts. Sixty seven genes were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Using network and cross-study analysis, we found that these genes are involved in biological pathways known to be affected in the autism cases and that a subset directly interact with several genes known to have strong associations to autism. These findings suggest that coalitional game theory can be applied to large-scale genomic data to identify hidden yet influential players in complex polygenic disorders such as autism.

  19. An efficient graph theory based method to identify every minimal reaction set in a metabolic network

    PubMed Central

    2014-01-01

    Background Development of cells with minimal metabolic functionality is gaining importance due to their efficiency in producing chemicals and fuels. Existing computational methods to identify minimal reaction sets in metabolic networks are computationally expensive. Further, they identify only one of the several possible minimal reaction sets. Results In this paper, we propose an efficient graph theory based recursive optimization approach to identify all minimal reaction sets. Graph theoretical insights offer systematic methods to not only reduce the number of variables in math programming and increase its computational efficiency, but also provide efficient ways to find multiple optimal solutions. The efficacy of the proposed approach is demonstrated using case studies from Escherichia coli and Saccharomyces cerevisiae. In case study 1, the proposed method identified three minimal reaction sets each containing 38 reactions in Escherichia coli central metabolic network with 77 reactions. Analysis of these three minimal reaction sets revealed that one of them is more suitable for developing minimal metabolism cell compared to other two due to practically achievable internal flux distribution. In case study 2, the proposed method identified 256 minimal reaction sets from the Saccharomyces cerevisiae genome scale metabolic network with 620 reactions. The proposed method required only 4.5 hours to identify all the 256 minimal reaction sets and has shown a significant reduction (approximately 80%) in the solution time when compared to the existing methods for finding minimal reaction set. Conclusions Identification of all minimal reactions sets in metabolic networks is essential since different minimal reaction sets have different properties that effect the bioprocess development. The proposed method correctly identified all minimal reaction sets in a both the case studies. The proposed method is computationally efficient compared to other methods for finding minimal

  20. Transposon mutagenesis identifies genes that cooperate with mutant Pten in breast cancer progression

    PubMed Central

    Rangel, Roberto; Lee, Song-Choon; Hon-Kim Ban, Kenneth; Guzman-Rojas, Liliana; Mann, Michael B.; Newberg, Justin Y.; McNoe, Leslie A.; Selvanesan, Luxmanan; Ward, Jerrold M.; Rust, Alistair G.; Chin, Kuan-Yew; Black, Michael A.; Jenkins, Nancy A.; Copeland, Neal G.

    2016-01-01

    Triple-negative breast cancer (TNBC) has the worst prognosis of any breast cancer subtype. To better understand the genetic forces driving TNBC, we performed a transposon mutagenesis screen in a phosphatase and tensin homolog (Pten) mutant mice and identified 12 candidate trunk drivers and a much larger number of progression genes. Validation studies identified eight TNBC tumor suppressor genes, including the GATA-like transcriptional repressor TRPS1. Down-regulation of TRPS1 in TNBC cells promoted epithelial-to-mesenchymal transition (EMT) by deregulating multiple EMT pathway genes, in addition to increasing the expression of SERPINE1 and SERPINB2 and the subsequent migration, invasion, and metastasis of tumor cells. Transposon mutagenesis has thus provided a better understanding of the genetic forces driving TNBC and discovered genes with potential clinical importance in TNBC. PMID:27849608

  1. Identifying Novel Transcriptional and Epigenetic Features of Nuclear Lamina-associated Genes.

    PubMed

    Wu, Feinan; Yao, Jie

    2017-03-07

    Because a large portion of the mammalian genome is associated with the nuclear lamina (NL), it is interesting to study how native genes resided there are transcribed and regulated. In this study, we report unique transcriptional and epigenetic features of nearly 3,500 NL-associated genes (NL genes). Promoter regions of active NL genes are often excluded from NL-association, suggesting that NL-promoter interactions may repress transcription. Active NL genes with higher RNA polymerase II (Pol II) recruitment levels tend to display Pol II promoter-proximal pausing, while Pol II recruitment and Pol II pausing are not correlated among non-NL genes. At the genome-wide scale, NL-association and H3K27me3 distinguishes two large gene classes with low transcriptional activities. Notably, NL-association is anti-correlated with both transcription and active histone mark levels among genes not significantly enriched with H3K9me3 or H3K27me3, suggesting that NL-association may represent a novel gene repression pathway. Interestingly, an NL gene subgroup is not significantly enriched with H3K9me3 or H3K27me3 and is transcribed at higher levels than the rest of NL genes. Furthermore, we identified distal enhancers associated with active NL genes and reported their epigenetic features.

  2. Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees.

    PubMed

    Chabot, Adrien; Shrit, Ralla A; Blekhman, Ran; Gilad, Yoav

    2007-08-01

    Most phenotypic differences between human and chimpanzee are likely to result from differences in gene regulation, rather than changes to protein-coding regions. To date, however, only a handful of human-chimpanzee nucleotide differences leading to changes in gene regulation have been identified. To hone in on differences in regulatory elements between human and chimpanzee, we focused on 10 genes that were previously found to be differentially expressed between the two species. We then designed reporter gene assays for the putative human and chimpanzee promoters of the 10 genes. Of seven promoters that we found to be active in human liver cell lines, human and chimpanzee promoters had significantly different activity in four cases, three of which recapitulated the gene expression difference seen in the microarray experiment. For these three genes, we were therefore able to demonstrate that a change in cis influences expression differences between humans and chimpanzees. Moreover, using site-directed mutagenesis on one construct, the promoter for the DDA3 gene, we were able to identify three nucleotides that together lead to a cis regulatory difference between the species. High-throughput application of this approach can provide a map of regulatory element differences between humans and our close evolutionary relatives.

  3. Genome-Wide Association Study in African Americans with Acute Respiratory Distress Syndrome Identifies the Selectin P Ligand Gene as a Risk Factor.

    PubMed

    Bime, Christian; Pouladi, Nima; Sammani, Saad; Batai, Ken; Casanova, Nancy; Zhou, Tong; Kempf, Carrie L; Sun, Xiaoguang; Camp, Sara M; Wang, Ting; Kittles, Rick A; Lussier, Yves A; Jones, Tiffanie K; Reilly, John P; Meyer, Nuala J; Christie, Jason D; Karnes, Jason H; Gonzalez-Garay, Manuel; Christiani, David C; Yates, Charles R; Wurfel, Mark M; Meduri, Gianfranco U; Garcia, Joe G N

    2018-06-01

    Genetic factors are involved in acute respiratory distress syndrome (ARDS) susceptibility. Identification of novel candidate genes associated with increased risk and severity will improve our understanding of ARDS pathophysiology and enhance efforts to develop novel preventive and therapeutic approaches. To identify genetic susceptibility targets for ARDS. A genome-wide association study was performed on 232 African American patients with ARDS and 162 at-risk control subjects. The Identify Candidate Causal SNPs and Pathways platform was used to infer the association of known gene sets with the top prioritized intragenic SNPs. Preclinical validation of SELPLG (selectin P ligand gene) was performed using mouse models of LPS- and ventilator-induced lung injury. Exonic variation within SELPLG distinguishing patients with ARDS from sepsis control subjects was confirmed in an independent cohort. Pathway prioritization analysis identified a nonsynonymous coding SNP (rs2228315) within SELPLG, encoding P-selectin glycoprotein ligand 1, to be associated with increased susceptibility. In an independent cohort, two exonic SELPLG SNPs were significantly associated with ARDS susceptibility. Additional support for SELPLG as an ARDS candidate gene was derived from preclinical ARDS models where SELPLG gene expression in lung tissues was significantly increased in both ventilator-induced (twofold increase) and LPS-induced (5.7-fold increase) murine lung injury models compared with controls. Furthermore, Selplg -/- mice exhibited significantly reduced LPS-induced inflammatory lung injury compared with wild-type C57/B6 mice. Finally, an antibody that neutralizes P-selectin glycoprotein ligand 1 significantly attenuated LPS-induced lung inflammation. These findings identify SELPLG as a novel ARDS susceptibility gene among individuals of European and African descent.

  4. Potential Susceptibility Loci Identified for Renal Cell Carcinoma by Targeting Obesity-Related Genes.

    PubMed

    Shu, Xiang; Purdue, Mark P; Ye, Yuanqing; Tu, Huakang; Wood, Christopher G; Tannir, Nizar M; Wang, Zhaoming; Albanes, Demetrius; Gapstur, Susan M; Stevens, Victoria L; Rothman, Nathaniel; Chanock, Stephen J; Wu, Xifeng

    2017-09-01

    Background: Obesity is an established risk factor for renal cell carcinoma (RCC). Although genome-wide association studies (GWAS) of RCC have identified several susceptibility loci, additional variants might be missed due to the highly conservative selection. Methods: We conducted a multiphase study utilizing three independent genome-wide scans at MD Anderson Cancer Center (MDA RCC GWAS and MDA RCC OncoArray) and National Cancer Institute (NCI RCC GWAS), which consisted of a total of 3,530 cases and 5,714 controls, to investigate genetic variations in obesity-related genes and RCC risk. Results: In the discovery phase, 32,946 SNPs located at ±10 kb of 2,001 obesity-related genes were extracted from MDA RCC GWAS and analyzed using multivariable logistic regression. Proxies ( R 2 > 0.8) were searched or imputation was performed if SNPs were not directly genotyped in the validation sets. Twenty-one SNPs with P < 0.05 in both MDA RCC GWAS and NCI RCC GWAS were subsequently evaluated in MDA RCC OncoArray. In the overall meta-analysis, significant ( P < 0.05) associations with RCC risk were observed for SNP mapping to IL1RAPL2 [rs10521506-G: OR meta = 0.87 (0.81-0.93), P meta = 2.33 × 10 -5 ], PLIN2 [rs2229536-A: OR meta = 0.87 (0.81-0.93), P meta = 2.33 × 10 -5 ], SMAD3 [rs4601989-A: OR meta = 0.86 (0.80-0.93), P meta = 2.71 × 10 -4 ], MED13L [rs10850596-A: OR meta = 1.14 (1.07-1.23), P meta = 1.50 × 10 -4 ], and TSC1 [rs3761840-G: OR meta = 0.90 (0.85-0.97), P meta = 2.47 × 10 -3 ]. We did not observe any significant cis-expression quantitative trait loci effect for these SNPs in the TCGA KIRC data. Conclusions: Taken together, we found that genetic variation of obesity-related genes could influence RCC susceptibility. Impact: The five identified loci may provide new insights into disease etiology that reveal importance of obesity-related genes in RCC development. Cancer Epidemiol Biomarkers Prev; 26(9); 1436-42. ©2017 AACR . ©2017 American Association for

  5. Comprehensive Analysis of Gene Expression Profiles of Sepsis-Induced Multiorgan Failure Identified Its Valuable Biomarkers.

    PubMed

    Wang, Yumei; Yin, Xiaoling; Yang, Fang

    2018-02-01

    Sepsis is an inflammatory-related disease, and severe sepsis would induce multiorgan dysfunction, which is the most common cause of death of patients in noncoronary intensive care units. Progression of novel therapeutic strategies has proven to be of little impact on the mortality of severe sepsis, and unfortunately, its mechanisms still remain poorly understood. In this study, we analyzed gene expression profiles of severe sepsis with failure of lung, kidney, and liver for the identification of potential biomarkers. We first downloaded the gene expression profiles from the Gene Expression Omnibus and performed preprocessing of raw microarray data sets and identification of differential expression genes (DEGs) through the R programming software; then, significantly enriched functions of DEGs in lung, kidney, and liver failure sepsis samples were obtained from the Database for Annotation, Visualization, and Integrated Discovery; finally, protein-protein interaction network was constructed for DEGs based on the STRING database, and network modules were also obtained through the MCODE cluster method. As a result, lung failure sepsis has the highest number of DEGs of 859, whereas the number of DEGs in kidney and liver failure sepsis samples is 178 and 175, respectively. In addition, 17 overlaps were obtained among the three lists of DEGs. Biological processes related to immune and inflammatory response were found to be significantly enriched in DEGs. Network and module analysis identified four gene clusters in which all or most of genes were upregulated. The expression changes of Icam1 and Socs3 were further validated through quantitative PCR analysis. This study should shed light on the development of sepsis and provide potential therapeutic targets for sepsis-induced multiorgan failure.

  6. GENE EXPRESSION PROFILING TO IDENTIFY MECHANISMS OF MALE REPRODUCTIVE TOXICITY

    EPA Science Inventory

    Gene Expression Profiling to Identify Mechanisms of Male Reproductive Toxicity
    David J. Dix
    National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA.
    Ab...

  7. Gene expression profiling following NRF2 and KEAP1 siRNA knockdown in human lung fibroblasts identifies CCL11/Eotaxin-1 as a novel NRF2 regulated gene.

    PubMed

    Fourtounis, Jimmy; Wang, I-Ming; Mathieu, Marie-Claude; Claveau, David; Loo, Tenneille; Jackson, Aimee L; Peters, Mette A; Therien, Alex G; Boie, Yves; Crackower, Michael A

    2012-10-12

    Oxidative Stress contributes to the pathogenesis of many diseases. The NRF2/KEAP1 axis is a key transcriptional regulator of the anti-oxidant response in cells. Nrf2 knockout mice have implicated this pathway in regulating inflammatory airway diseases such as asthma and COPD. To better understand the role the NRF2 pathway has on respiratory disease we have taken a novel approach to define NRF2 dependent gene expression in a relevant lung system. Normal human lung fibroblasts were transfected with siRNA specific for NRF2 or KEAP1. Gene expression changes were measured at 30 and 48 hours using a custom Affymetrix Gene array. Changes in Eotaxin-1 gene expression and protein secretion were further measured under various inflammatory conditions with siRNAs and pharmacological tools. An anti-correlated gene set (inversely regulated by NRF2 and KEAP1 RNAi) that reflects specific NRF2 regulated genes was identified. Gene annotations show that NRF2-mediated oxidative stress response is the most significantly regulated pathway, followed by heme metabolism, metabolism of xenobiotics by Cytochrome P450 and O-glycan biosynthesis. Unexpectedly the key eosinophil chemokine Eotaxin-1/CCL11 was found to be up-regulated when NRF2 was inhibited and down-regulated when KEAP1 was inhibited. This transcriptional regulation leads to modulation of Eotaxin-1 secretion from human lung fibroblasts under basal and inflammatory conditions, and is specific to Eotaxin-1 as NRF2 or KEAP1 knockdown had no effect on the secretion of a set of other chemokines and cytokines. Furthermore, the known NRF2 small molecule activators CDDO and Sulphoraphane can also dose dependently inhibit Eotaxin-1 release from human lung fibroblasts. These data uncover a previously unknown role for NRF2 in regulating Eotaxin-1 expression and further the mechanistic understanding of this pathway in modulating inflammatory lung disease.

  8. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population

    PubMed Central

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-01-01

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. PMID:27189991

  9. Identifying genes affectng stress response in rainbow trout

    USDA-ARS?s Scientific Manuscript database

    Genomic analyses have the potential to impact aquaculture production traits by identifying markers as proxies for traits which are expensive or difficult to measure and characterizing genetic variation and biochemical mechanisms underlying phenotypic variation. One such set of traits are the respon...

  10. DiRE: identifying distant regulatory elements of co-expressed genes

    PubMed Central

    Gotea, Valer; Ovcharenko, Ivan

    2008-01-01

    Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org. PMID:18487623

  11. A systems approach to identifying correlated gene targets for the loss of colour pigmentation in plants

    PubMed Central

    2011-01-01

    Background The numerous diverse metabolic pathways by which plant compounds can be produced make it difficult to predict how colour pigmentation is lost for different tissues and plants. This study employs mathematical and in silico methods to identify correlated gene targets for the loss of colour pigmentation in plants from a whole cell perspective based on the full metabolic network of Arabidopsis. This involves extracting a self-contained flavonoid subnetwork from the AraCyc database and calculating feasible metabolic routes or elementary modes (EMs) for it. Those EMs leading to anthocyanin compounds are taken to constitute the anthocyanin biosynthetic pathway (ABP) and their interplay with the rest of the EMs is used to study the minimal cut sets (MCSs), which are different combinations of reactions to block for eliminating colour pigmentation. By relating the reactions to their corresponding genes, the MCSs are used to explore the phenotypic roles of the ABP genes, their relevance to the ABP and the impact their eliminations would have on other processes in the cell. Results Simulation and prediction results of the effect of different MCSs for eliminating colour pigmentation correspond with existing experimental observations. Two examples are: i) two MCSs which require the simultaneous suppression of genes DFR and ANS to eliminate colour pigmentation, correspond to observational results of the same genes being co-regulated for eliminating floral pigmentation in Aquilegia and; ii) the impact of another MCS requiring CHS suppression, corresponds to findings where the suppression of the early gene CHS eliminated nearly all flavonoids but did not affect the production of volatile benzenoids responsible for floral scent. Conclusions From the various MCSs identified for eliminating colour pigmentation, several correlate to existing experimental observations, indicating that different MCSs are suitable for different plants, different cells, and different conditions

  12. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    PubMed Central

    2013-01-01

    Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509

  13. The SET1 Complex Selects Actively Transcribed Target Genes via Multivalent Interaction with CpG Island Chromatin.

    PubMed

    Brown, David A; Di Cerbo, Vincenzo; Feldmann, Angelika; Ahn, Jaewoo; Ito, Shinsuke; Blackledge, Neil P; Nakayama, Manabu; McClellan, Michael; Dimitrova, Emilia; Turberfield, Anne H; Long, Hannah K; King, Hamish W; Kriaucionis, Skirmantas; Schermelleh, Lothar; Kutateladze, Tatiana G; Koseki, Haruhiko; Klose, Robert J

    2017-09-05

    Chromatin modifications and the promoter-associated epigenome are important for the regulation of gene expression. However, the mechanisms by which chromatin-modifying complexes are targeted to the appropriate gene promoters in vertebrates and how they influence gene expression have remained poorly defined. Here, using a combination of live-cell imaging and functional genomics, we discover that the vertebrate SET1 complex is targeted to actively transcribed gene promoters through CFP1, which engages in a form of multivalent chromatin reading that involves recognition of non-methylated DNA and histone H3 lysine 4 trimethylation (H3K4me3). CFP1 defines SET1 complex occupancy on chromatin, and its multivalent interactions are required for the SET1 complex to place H3K4me3. In the absence of CFP1, gene expression is perturbed, suggesting that normal targeting and function of the SET1 complex are central to creating an appropriately functioning vertebrate promoter-associated epigenome. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  14. A data mining paradigm for identifying key factors in biological processes using gene expression data.

    PubMed

    Li, Jin; Zheng, Le; Uchiyama, Akihiko; Bin, Lianghua; Mauro, Theodora M; Elias, Peter M; Pawelczyk, Tadeusz; Sakowicz-Burkiewicz, Monika; Trzeciak, Magdalena; Leung, Donald Y M; Morasso, Maria I; Yu, Peng

    2018-06-13

    A large volume of biological data is being generated for studying mechanisms of various biological processes. These precious data enable large-scale computational analyses to gain biological insights. However, it remains a challenge to mine the data efficiently for knowledge discovery. The heterogeneity of these data makes it difficult to consistently integrate them, slowing down the process of biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. To demonstrate its effectiveness, our paradigm was applied to epidermal development and identified many genes that play a potential role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are still not supported by gain- or loss-of-function studies, yielding many novel genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, proving the ability of this paradigm to identify new factors in biological processes. In addition, this paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.

  15. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  16. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    PubMed Central

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381

  17. A conserved BDNF, glutamate- and GABA-enriched gene module related to human depression identified by coexpression meta-analysis and DNA variant genome-wide association studies.

    PubMed

    Chang, Lun-Ching; Jamain, Stephane; Lin, Chien-Wei; Rujescu, Dan; Tseng, George C; Sibille, Etienne

    2014-01-01

    Large scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies from postmortem brains of human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next sought enrichment in the top 50 meta-analyzed coexpression modules for genes otherwise identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules using gene expression data from non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including glutamate metabotropic receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses. Collectively, this study demonstrates the importance of integrating transcriptome data, gene coexpression modules

  18. GeneChip Resequencing of the Smallpox Virus Genome Can Identify Novel Strains: a Biodefense Application▿

    PubMed Central

    Sulaiman, Irshad M.; Tang, Kevin; Osborne, John; Sammons, Scott; Wohlhueter, Robert M.

    2007-01-01

    We developed a set of seven resequencing GeneChips, based on the complete genome sequences of 24 strains of smallpox virus (variola virus), for rapid characterization of this human-pathogenic virus. Each GeneChip was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. This study includes the hybridization results of 14 smallpox virus strains. Of the 14 smallpox virus strains hybridized, only 7 had sequence information included in the design of the smallpox virus resequencing GeneChips; similar information for the remaining strains was not tiled as a reference in these GeneChips. By use of variola virus-specific primers and long-range PCR, 22 overlapping amplicons were amplified to cover nearly the complete genome and hybridized with the smallpox virus resequencing GeneChip set. These GeneChips were successful in generating nucleotide sequences for all 14 of the smallpox virus strains hybridized. Analysis of the data indicated that the GeneChip resequencing by hybridization was fast and reproducible and that the smallpox virus resequencing GeneChips could differentiate the 14 smallpox virus strains characterized. This study also suggests that high-density resequencing GeneChips have potential biodefense applications and may be used as an alternate tool for rapid identification of smallpox virus in the future. PMID:17182757

  19. A Genome-Wide Knockout Screen to Identify Genes Involved in Acquired Carboplatin Resistance

    DTIC Science & Technology

    2016-07-01

    library screen to identify genes that when knocked out render human ovarian cells > 2.5-fold resistant to CBDCA; 2) Validate the ability of...a GeCKOv2 library screen to identify genes that when knocked out render human ovarian cells > 2.5-fold resistant to CBDCA; 2) validate the ability of...resistance in either cell lines or clinical samples. The CRIPSR-cas9 technology now provides us with a major new tool to introduce knock out mutations

  20. Identifying candidate driver genes by integrative ovarian cancer genomics data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Lu, Jibo

    2017-08-01

    Integrative analysis of molecular mechanics underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, Copy Number Variations CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumor from the heterogeneous data. The final list of modulators identified are well known in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1, MED1 and so on, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.

  1. NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

    PubMed

    Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun

    2017-09-21

    High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of the high-throughput experimental results, such as differential expression gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in the functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make users difficult to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform the functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved a superior performance than the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when accessed to the curated literatures. NetGen has been implemented in the R package CopTea publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.

  2. Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network.

    PubMed

    Li, Zhandong; An, Lifeng; Li, Hao; Wang, ShaoPeng; Zhou, You; Yuan, Fei; Li, Lin

    2016-05-05

    Nasopharyngeal cancer or nasopharyngeal carcinoma (NPC) is the most common cancer originating in the nasopharynx. The factors that induce nasopharyngeal cancer are still not clear. Additional information about the chemicals or genes related to nasopharyngeal cancer will promote a better understanding of the pathogenesis of this cancer and the factors that induce it. Thus, a computational method NPC-RGCP was proposed in this study to identify the possible relevant chemicals and genes based on the presently known chemicals and genes related to nasopharyngeal cancer. To extensively utilize the functional associations between proteins and chemicals, a heterogeneous network was constructed based on interactions of proteins and chemicals. The NPC-RGCP included two stages: the searching stage and the screening stage. The former stage is for finding new possible genes and chemicals in the heterogeneous network, while the latter stage is for screening and removing false discoveries and selecting the core genes and chemicals. As a result, five putative genes, CXCR3, IRF1, CDK1, GSTP1, and CDH2, and seven putative chemicals, iron, propionic acid, dimethyl sulfoxide, isopropanol, erythrose 4-phosphate, β-D-Fructose 6-phosphate, and flavin adenine dinucleotide, were identified by NPC-RGCP. Extensive analyses provided confirmation that the putative genes and chemicals have significant associations with nasopharyngeal cancer.

  3. Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network

    PubMed Central

    Li, Zhandong; An, Lifeng; Li, Hao; Wang, ShaoPeng; Zhou, You; Yuan, Fei; Li, Lin

    2016-01-01

    Nasopharyngeal cancer or nasopharyngeal carcinoma (NPC) is the most common cancer originating in the nasopharynx. The factors that induce nasopharyngeal cancer are still not clear. Additional information about the chemicals or genes related to nasopharyngeal cancer will promote a better understanding of the pathogenesis of this cancer and the factors that induce it. Thus, a computational method NPC-RGCP was proposed in this study to identify the possible relevant chemicals and genes based on the presently known chemicals and genes related to nasopharyngeal cancer. To extensively utilize the functional associations between proteins and chemicals, a heterogeneous network was constructed based on interactions of proteins and chemicals. The NPC-RGCP included two stages: the searching stage and the screening stage. The former stage is for finding new possible genes and chemicals in the heterogeneous network, while the latter stage is for screening and removing false discoveries and selecting the core genes and chemicals. As a result, five putative genes, CXCR3, IRF1, CDK1, GSTP1, and CDH2, and seven putative chemicals, iron, propionic acid, dimethyl sulfoxide, isopropanol, erythrose 4-phosphate, β-D-Fructose 6-phosphate, and flavin adenine dinucleotide, were identified by NPC-RGCP. Extensive analyses provided confirmation that the putative genes and chemicals have significant associations with nasopharyngeal cancer. PMID:27149165

  4. A whole-blood transcriptome meta-analysis identifies gene expression signatures of cigarette smoking

    PubMed Central

    Huan, Tianxiao; Joehanes, Roby; Schurmann, Claudia; Schramm, Katharina; Pilling, Luke C.; Peters, Marjolein J.; Mägi, Reedik; DeMeo, Dawn; O'Connor, George T.; Ferrucci, Luigi; Teumer, Alexander; Homuth, Georg; Biffar, Reiner; Völker, Uwe; Herder, Christian; Waldenberger, Melanie; Peters, Annette; Zeilinger, Sonja; Metspalu, Andres; Hofman, Albert; Uitterlinden, André G.; Hernandez, Dena G.; Singleton, Andrew B.; Bandinelli, Stefania; Munson, Peter J.; Lin, Honghuang; Benjamin, Emelia J.; Esko, Tõnu; Grabe, Hans J.; Prokisch, Holger; van Meurs, Joyce B.J.; Melzer, David; Levy, Daniel

    2016-01-01

    Abstract Cigarette smoking is a leading modifiable cause of death worldwide. We hypothesized that cigarette smoking induces extensive transcriptomic changes that lead to target-organ damage and smoking-related diseases. We performed a meta-analysis of transcriptome-wide gene expression using whole blood-derived RNA from 10,233 participants of European ancestry in six cohorts (including 1421 current and 3955 former smokers) to identify associations between smoking and altered gene expression levels. At a false discovery rate (FDR) <0.1, we identified 1270 differentially expressed genes in current vs. never smokers, and 39 genes in former vs. never smokers. Expression levels of 12 genes remained elevated up to 30 years after smoking cessation, suggesting that the molecular consequence of smoking may persist for decades. Gene ontology analysis revealed enrichment of smoking-related genes for activation of platelets and lymphocytes, immune response, and apoptosis. Many of the top smoking-related differentially expressed genes, including LRRN3 and GPR15, have DNA methylation loci in promoter regions that were recently reported to be hypomethylated among smokers. By linking differential gene expression with smoking-related disease phenotypes, we demonstrated that stroke and pulmonary function show enrichment for smoking-related gene expression signatures. Mediation analysis revealed the expression of several genes (e.g. ALAS2) to be putative mediators of the associations between smoking and inflammatory biomarkers (IL6 and C-reactive protein levels). Our transcriptomic study provides potential insights into the effects of cigarette smoking on gene expression in whole blood and their relations to smoking-related diseases. The results of such analyses may highlight attractive targets for treating or preventing smoking-related health effects. PMID:28158590

  5. Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

    PubMed

    Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

    1998-08-01

    The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.

  6. Gene-environment interaction involving recently identified colorectal cancer susceptibility loci

    PubMed Central

    Kantor, Elizabeth D.; Hutter, Carolyn M.; Minnier, Jessica; Berndt, Sonja I.; Brenner, Hermann; Caan, Bette J.; Campbell, Peter T.; Carlson, Christopher S.; Casey, Graham; Chan, Andrew T.; Chang-Claude, Jenny; Chanock, Stephen J.; Cotterchio, Michelle; Du, Mengmeng; Duggan, David; Fuchs, Charles S.; Giovannucci, Edward L.; Gong, Jian; Harrison, Tabitha A.; Hayes, Richard B.; Henderson, Brian E.; Hoffmeister, Michael; Hopper, John L.; Jenkins, Mark A.; Jiao, Shuo; Kolonel, Laurence N.; Le Marchand, Loic; Lemire, Mathieu; Ma, Jing; Newcomb, Polly A.; Ochs-Balcom, Heather M.; Pflugeisen, Bethann M.; Potter, John D.; Rudolph, Anja; Schoen, Robert E.; Seminara, Daniela; Slattery, Martha L.; Stelling, Deanna L.; Thomas, Fridtjof; Thornquist, Mark; Ulrich, Cornelia M.; Warnick, Greg S.; Zanke, Brent W.; Peters, Ulrike; Hsu, Li; White, Emily

    2014-01-01

    BACKGROUND Genome-wide association studies have identified several single nucleotide polymorphisms (SNPs) that are associated with risk of colorectal cancer (CRC). Prior research has evaluated the presence of gene-environment interaction involving the first 10 identified susceptibility loci, but little work has been conducted on interaction involving SNPs at recently identified susceptibility loci, including: rs10911251, rs6691170, rs6687758, rs11903757, rs10936599, rs647161, rs1321311, rs719725, rs1665650, rs3824999, rs7136702, rs11169552, rs59336, rs3217810, rs4925386, and rs2423279. METHODS Data on 9160 cases and 9280 controls from the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and Colon Cancer Family Registry (CCFR) were used to evaluate the presence of interaction involving the above-listed SNPs and sex, body mass index (BMI), alcohol consumption, smoking, aspirin use, post-menopausal hormone (PMH) use, as well as intake of dietary calcium, dietary fiber, dietary folate, red meat, processed meat, fruit, and vegetables. Interaction was evaluated using a fixed-effects meta-analysis of an efficient Empirical Bayes estimator, and permutation was used to account for multiple comparisons. RESULTS None of the permutation-adjusted p-values reached statistical significance. CONCLUSIONS The associations between recently identified genetic susceptibility loci and CRC are not strongly modified by sex, BMI, alcohol, smoking, aspirin, PMH use, and various dietary factors. IMPACT Results suggest no evidence of strong gene-environment interactions involving the recently identified 16 susceptibility loci for CRC taken one at a time. PMID:24994789

  7. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder: association to overlapping traits in ADHD and autism

    PubMed Central

    Naaijen, J; Bralten, J; Poelmans, G; Faraone, Stephen; Asherson, Philip; Banaschewski, Tobias; Buitelaar, Jan; Franke, Barbara; P Ebstein, Richard; Gill, Michael; Miranda, Ana; D Oades, Robert; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph; Sonuga-Barke, Edmund; Anney, Richard; Mulas, Fernando; Steinhausen, Hans-Christoph; Glennon, J C; Franke, B; Buitelaar, J K

    2017-01-01

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of single gene or single variant associations was significant on their own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD. PMID:28072412

  8. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder: association to overlapping traits in ADHD and autism.

    PubMed

    Naaijen, J; Bralten, J; Poelmans, G; Glennon, J C; Franke, B; Buitelaar, J K

    2017-01-10

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analysis for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of single gene or single variant associations was significant on their own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD.

  9. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression.

    PubMed

    Catto, James W F; Abbod, Maysam F; Wild, Peter J; Linkens, Derek A; Pilarsky, Christian; Rehman, Ishtiaq; Rosario, Derek J; Denzinger, Stefan; Burger, Maximilian; Stoehr, Robert; Knuechel, Ruth; Hartmann, Arndt; Hamdy, Freddie C

    2010-03-01

    New methods for identifying bladder cancer (BCa) progression are required. Gene expression microarrays can reveal insights into disease biology and identify novel biomarkers. However, these experiments produce large datasets that are difficult to interpret. To develop a novel method of microarray analysis combining two forms of artificial intelligence (AI): neurofuzzy modelling (NFM) and artificial neural networks (ANN) and validate it in a BCa cohort. We used AI and statistical analyses to identify progression-related genes in a microarray dataset (n=66 tumours, n=2800 genes). The AI-selected genes were then investigated in a second cohort (n=262 tumours) using immunohistochemistry. We compared the accuracy of AI and statistical approaches to identify tumour progression. AI identified 11 progression-associated genes (odds ratio [OR]: 0.70; 95% confidence interval [CI], 0.56-0.87; p=0.0004), and these were more discriminate than genes chosen using statistical analyses (OR: 1.24; 95% CI, 0.96-1.60; p=0.09). The expression of six AI-selected genes (LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2) was determined using commercial antibodies and successfully identified tumour progression (concordance index: 0.66; log-rank test: p=0.01). AI-selected genes were more discriminate than pathologic criteria at determining progression (Cox multivariate analysis: p=0.01). Limitations include the use of statistical correlation to identify 200 genes for AI analysis and that we did not compare regression identified genes with immunohistochemistry. AI and statistical analyses use different techniques of inference to determine gene-phenotype associations and identify distinct prognostic gene signatures that are equally valid. We have identified a prognostic gene signature whose members reflect a variety of carcinogenic pathways that could identify progression in non-muscle-invasive BCa. 2009 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  10. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray involves of placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has made a novelty discovery since its development and obtained an increasing attention among researchers. The widespread of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and become singular, thus it cannot be inverted. In this research, the Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimation to estimate the covariance matrix to detect significant gene sets. The use of shrinkage covariance matrix overcomes the singularity problem by converting an unbiased to an improved biased estimator of covariance matrix. Robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increases its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.

  11. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing.

    PubMed

    de Moraes, Marcos H; Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2017-03-01

    Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli , are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes being

  12. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing

    PubMed Central

    Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R.; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2016-01-01

    ABSTRACT Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli, are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes

  13. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

    PubMed

    Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2016-01-01

    In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular basis underlying this process are still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members in actin and actin depolymerization factor and fibrin gene family and member of the Ca(2+) binding gene family related to the development and polarization of pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.

  14. Selection on plant male function genes identifies candidates for reproductive isolation of yellow monkeyflowers.

    PubMed

    Aagaard, Jan E; George, Renee D; Fishman, Lila; Maccoss, Michael J; Swanson, Willie J

    2013-01-01

    Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the

  15. Genome-wide gene by lead exposure interaction analysis identifies UNC5D as a candidate gene for neurodevelopment.

    PubMed

    Wang, Zhaoxi; Claus Henn, Birgit; Wang, Chaolong; Wei, Yongyue; Su, Li; Sun, Ryan; Chen, Han; Wagner, Peter J; Lu, Quan; Lin, Xihong; Wright, Robert; Bellinger, David; Kile, Molly; Mazumdar, Maitreyi; Tellez-Rojo, Martha Maria; Schnaas, Lourdes; Christiani, David C

    2017-07-28

    Neurodevelopment is a complex process involving both genetic and environmental factors. Prenatal exposure to lead (Pb) has been associated with lower performance on neurodevelopmental tests. Adverse neurodevelopmental outcomes are more frequent and/or more severe when toxic exposures interact with genetic susceptibility. To explore possible loci associated with increased susceptibility to prenatal Pb exposure, we performed a genome-wide gene-environment interaction study (GWIS) in young children from Mexico (n = 390) and Bangladesh (n = 497). Prenatal Pb exposure was estimated by cord blood Pb concentration. Neurodevelopment was assessed using the Bayley Scales of Infant Development. We identified a locus on chromosome 8, containing UNC5D, and demonstrated evidence of its genome-wide significance with mental composite scores (rs9642758, p meta  = 4.35 × 10 -6 ). Within this locus, the joint effects of two independent single nucleotide polymorphisms (SNPs, rs9642758 and rs10503970) had a p-value of 4.38 × 10 -9 for mental composite scores. Correlating GWIS results with in vitro transcriptomic profiles identified one common gene, SLC1A5, which is involved in synaptic function, neuronal development, and excitotoxicity. Further analysis revealed interconnected interactions that formed a large network of 52 genes enriched with oxidative stress genes and neurodevelopmental genes. Our findings suggest that certain genetic polymorphisms within/near genes relevant to neurodevelopment might modify the toxic effects of Pb exposure via oxidative stress.

  16. Sorghum Expressed Sequence Tags Identify Signature Genes for Drought, Pathogenesis, and Skotomorphogenesis from a Milestone Set of 16,801 Unique Transcripts1[w

    PubMed Central

    Pratt, Lee H.; Liang, Chun; Shah, Manish; Sun, Feng; Wang, Haiming; Reid, St. Patrick; Gingle, Alan R.; Paterson, Andrew H.; Wing, Rod; Dean, Ralph; Klein, Robert; Nguyen, Henry T.; Ma, Hong-mei; Zhao, Xin; Morishige, Daryl T.; Mullet, John E.; Cordonnier-Pratt, Marie-Michèle

    2005-01-01

    Improved knowledge of the sorghum transcriptome will enhance basic understanding of how plants respond to stresses and serve as a source of genes of value to agriculture. Toward this goal, Sorghum bicolor L. Moench cDNA libraries were prepared from light- and dark-grown seedlings, drought-stressed plants, Colletotrichum-infected seedlings and plants, ovaries, embryos, and immature panicles. Other libraries were prepared with meristems from Sorghum propinquum (Kunth) Hitchc. that had been photoperiodically induced to flower, and with rhizomes from S. propinquum and johnsongrass (Sorghum halepense L. Pers.). A total of 117,682 expressed sequence tags (ESTs) were obtained representing both 3′ and 5′ sequences from about half that number of cDNA clones. A total of 16,801 unique transcripts, representing tentative UniScripts (TUs), were identified from 55,783 3′ ESTs. Of these TUs, 9,032 are represented by two or more ESTs. Collectively, these libraries were predicted to contain a total of approximately 31,000 TUs. Individual libraries, however, were predicted to contain no more than about 6,000 to 9,000, with the exception of light-grown seedlings, which yielded an estimate of close to 13,000. In addition, each library exhibits about the same level of complexity with respect to both the number of TUs preferentially expressed in that library and the frequency with which two or more ESTs is found in only that library. These results indicate that the sorghum genome is expressed in highly selective fashion in the individual organs and in response to the environmental conditions surveyed here. Close to 2,000 differentially expressed TUs were identified among the cDNA libraries examined, of which 775 were differentially expressed at a confidence level of 98%. From these 775 TUs, signature genes were identified defining drought, Colletotrichum infection, skotomorphogenesis (etiolation), ovary, immature panicle, and embryo. PMID:16169961

  17. Preferential Allele Expression Analysis Identifies Shared Germline and Somatic Driver Genes in Advanced Ovarian Cancer

    PubMed Central

    Halabi, Najeeb M.; Martinez, Alejandra; Al-Farsi, Halema; Mery, Eliane; Puydenus, Laurence; Pujol, Pascal; Khalak, Hanif G.; McLurcan, Cameron; Ferron, Gwenael; Querleu, Denis; Al-Azwani, Iman; Al-Dous, Eman; Mohamoud, Yasmin A.; Malek, Joel A.; Rafii, Arash

    2016-01-01

    Identifying genes where a variant allele is preferentially expressed in tumors could lead to a better understanding of cancer biology and optimization of targeted therapy. However, tumor sample heterogeneity complicates standard approaches for detecting preferential allele expression. We therefore developed a novel approach combining genome and transcriptome sequencing data from the same sample that corrects for sample heterogeneity and identifies significant preferentially expressed alleles. We applied this analysis to epithelial ovarian cancer samples consisting of matched primary ovary and peritoneum and lymph node metastasis. We find that preferentially expressed variant alleles include germline and somatic variants, are shared at a relatively high frequency between patients, and are in gene networks known to be involved in cancer processes. Analysis at a patient level identifies patient-specific preferentially expressed alleles in genes that are targets for known drugs. Analysis at a site level identifies patterns of site specific preferential allele expression with similar pathways being impacted in the primary and metastasis sites. We conclude that genes with preferentially expressed variant alleles can act as cancer drivers and that targeting those genes could lead to new therapeutic strategies. PMID:26735499

  18. An Optimal Mean Based Block Robust Feature Extraction Method to Identify Colorectal Cancer Genes with Integrated Data.

    PubMed

    Liu, Jian; Cheng, Yuhu; Wang, Xuesong; Zhang, Lin; Liu, Hui

    2017-08-17

    It is urgent to diagnose colorectal cancer in the early stage. Some feature genes which are important to colorectal cancer development have been identified. However, for the early stage of colorectal cancer, less is known about the identity of specific cancer genes that are associated with advanced clinical stage. In this paper, we conducted a feature extraction method named Optimal Mean based Block Robust Feature Extraction method (OMBRFE) to identify feature genes associated with advanced colorectal cancer in clinical stage by using the integrated colorectal cancer data. Firstly, based on the optimal mean and L 2,1 -norm, a novel feature extraction method called Optimal Mean based Robust Feature Extraction method (OMRFE) is proposed to identify feature genes. Then the OMBRFE method which introduces the block ideology into OMRFE method is put forward to process the colorectal cancer integrated data which includes multiple genomic data: copy number alterations, somatic mutations, methylation expression alteration, as well as gene expression changes. Experimental results demonstrate that the OMBRFE is more effective than previous methods in identifying the feature genes. Moreover, genes identified by OMBRFE are verified to be closely associated with advanced colorectal cancer in clinical stage.

  19. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population.

    PubMed

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-06-02

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. Copyright © 2015, American Association for the Advancement of Science.

  1. Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J

    2009-01-01

    Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifsmore » in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.« less

  2. BubbleGUM: automatic extraction of phenotype molecular signatures and comprehensive visualization of multiple Gene Set Enrichment Analyses.

    PubMed

    Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong

    2015-10-19

    Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled-up their focus from single-gene to gene set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistic or bioinformatic expertise. This is all the more the case when attempting to define a gene set specific of one condition compared to many other ones. Thus, there is a crucial need for an easy-to-use software for generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that allows to automatically extract molecular signatures from transcriptomic data and perform exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is an open-source software that allows to automatically generate molecular signatures out of complex expression datasets and to assess directly their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps

  3. MADGiC: a model-based approach for identifying driver genes in cancer

    PubMed Central

    Korthauer, Keegan D.; Kendziorski, Christina

    2015-01-01

    Motivation: Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. Most methods for prioritizing mutations rely primarily on frequency-based criteria, where a gene is identified as having a driver mutation if it is altered in significantly more samples than expected according to a background model. Although useful, frequency-based methods are limited in that all mutations are treated equally. It is well known, however, that some mutations have no functional consequence, while others may have a major deleterious impact. The spatial pattern of mutations within a gene provides further insight into their functional consequence. Properly accounting for these factors improves both the power and accuracy of inference. Also important is an accurate background model. Results: Here, we develop a Model-based Approach for identifying Driver Genes in Cancer (termed MADGiC) that incorporates both frequency and functional impact criteria and accommodates a number of factors to improve the background model. Simulation studies demonstrate advantages of the approach, including a substantial increase in power over competing methods. Further advantages are illustrated in an analysis of ovarian and lung cancer data from The Cancer Genome Atlas (TCGA) project. Availability and implementation: R code to implement this method is available at http://www.biostat.wisc.edu/ kendzior/MADGiC/. Contact: kendzior@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573922

  4. Experimentally-Derived Fibroblast Gene Signatures Identify Molecular Pathways Associated with Distinct Subsets of Systemic Sclerosis Patients in Three Independent Cohorts

    PubMed Central

    Johnson, Michael E.; Mahoney, J. Matthew; Taroni, Jaclyn; Sargent, Jennifer L.; Marmarelis, Eleni; Wu, Ming-Ru; Varga, John; Hinchcliff, Monique E.; Whitfield, Michael L.

    2015-01-01

    Genome-wide expression profiling in systemic sclerosis (SSc) has identified four ‘intrinsic’ subsets of disease (fibroproliferative, inflammatory, limited, and normal-like), each of which shows deregulation of distinct signaling pathways; however, the full set of pathways contributing to this differential gene expression has not been fully elucidated. Here we examine experimentally derived gene expression signatures in dermal fibroblasts for thirteen different signaling pathways implicated in SSc pathogenesis. These data show distinct and overlapping sets of genes induced by each pathway, allowing for a better understanding of the molecular relationship between profibrotic and immune signaling networks. Pathway-specific gene signatures were analyzed across a compendium of microarray datasets consisting of skin biopsies from three independent cohorts representing 80 SSc patients, 4 morphea, and 26 controls. IFNα signaling showed a strong association with early disease, while TGFβ signaling spanned the fibroproliferative and inflammatory subsets, was associated with worse MRSS, and was higher in lesional than non-lesional skin. The fibroproliferative subset was most strongly associated with PDGF signaling, while the inflammatory subset demonstrated strong activation of innate immune pathways including TLR signaling upstream of NF-κB. The limited and normal-like subsets did not show associations with fibrotic and inflammatory mediators such as TGFβ and TNFα. The normal-like subset showed high expression of genes associated with lipid signaling, which was absent in the inflammatory and limited subsets. Together, these data suggest a model by which IFNα is involved in early disease pathology, and disease severity is associated with active TGFβ signaling. PMID:25607805

  5. A Sleeping Beauty forward genetic screen identifies new genes and pathways driving osteosarcoma development and metastasis

    PubMed Central

    Moriarity, Branden S; Otto, George M; Rahrmann, Eric P; Rathe, Susan K; Wolf, Natalie K; Weg, Madison T; Manlove, Luke A; LaRue, Rebecca S; Temiz, Nuri A; Molyneux, Sam D; Choi, Kwangmin; Holly, Kevin J; Sarver, Aaron L; Scott, Milcah C; Forster, Colleen L; Modiano, Jaime F; Khanna, Chand; Hewitt, Stephen M; Khokha, Rama; Yang, Yi; Gorlick, Richard; Dyer, Michael A; Largaespada, David A

    2016-01-01

    Osteosarcomas are sarcomas of the bone, derived from osteoblasts or their precursors, with a high propensity to metastasize. Osteosarcoma is associated with massive genomic instability, making it problematic to identify driver genes using human tumors or prototypical mouse models, many of which involve loss of Trp53 function. To identify the genes driving osteosarcoma development and metastasis, we performed a Sleeping Beauty (SB) transposon-based forward genetic screen in mice with and without somatic loss of Trp53. Common insertion site (CIS) analysis of 119 primary tumors and 134 metastatic nodules identified 232 sites associated with osteosarcoma development and 43 sites associated with metastasis, respectively. Analysis of CIS-associated genes identified numerous known and new osteosarcoma-associated genes enriched in the ErbB, PI3K-AKT-mTOR and MAPK signaling pathways. Lastly, we identified several oncogenes involved in axon guidance, including Sema4d and Sema6d, which we functionally validated as oncogenes in human osteosarcoma. PMID:25961939

  6. Male specific genes from dioecious white campion identified by fluorescent differential display.

    PubMed

    Scutt, Charles P; Jenkins, Tom; Furuya, Masaki; Gilmartin, Philip M

    2002-05-01

    Fluorescent differential display (FDD) has been used to screen for cDNAs that are differentially up-regulated in male flowers of the dioecious plant Silene latifolia in which an X/Y chromosome system of sex determination operates. To adapt FDD to the cloning of large numbers of differential cDNAs, a novel method of confirming the differential expression of these has been devised. FDD gels were Southern electro-blotted and probed with mixtures of individual cDNA clones derived from different FDD product ligation reactions. These Southern blots were then stripped and re-probed with further mixtures of individual cloned FDD products to identify the maximum number of recombinant clones carrying the true differential amplification products. Of 135 differential bands identified by FDD, 56 differential amplification products were confirmed; these represent 23 unique differentially expressed genes as determined by virtual Northern analysis and two genes expressed at or below the level of detection by virtual Northern analysis. These two low expressed genes show bands of hybridization on genomic Southern blots that are specific to male plants, indicating that they are derived from, or closely related to, Y chromosome genes.

  7. The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

    PubMed Central

    Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth

    2006-01-01

    Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In

  8. Graphite Web: web tool for gene set analysis exploiting pathway topology

    PubMed Central

    Sales, Gabriele; Calura, Enrica; Martini, Paolo; Romualdi, Chiara

    2013-01-01

    Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate although they miss the relationship among genes. On the contrary, topological and multivariate approaches are more powerful, but difficult to be used by researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/. PMID:23666626

  9. Evaluation of endogenous control genes for gene expression studies across multiple tissues and in the specific sets of fat- and muscle-type samples of the pig.

    PubMed

    Gu, Y R; Li, M Z; Zhang, K; Chen, L; Jiang, A A; Wang, J Y; Li, X W

    2011-08-01

    To normalize a set of quantitative real-time PCR (q-PCR) data, it is essential to determine an optimal number/set of housekeeping genes, as the abundance of housekeeping genes can vary across tissues or cells during different developmental stages, or even under certain environmental conditions. In this study, of the 20 commonly used endogenous control genes, 13, 18 and 17 genes exhibited credible stability in 56 different tissues, 10 types of adipose tissue and five types of muscle tissue, respectively. Our analysis clearly showed that three optimal housekeeping genes are adequate for an accurate normalization, which correlated well with the theoretical optimal number (r ≥ 0.94). In terms of economical and experimental feasibility, we recommend the use of the three most stable housekeeping genes for calculating the normalization factor. Based on our results, the three most stable housekeeping genes in all analysed samples (TOP2B, HSPCB and YWHAZ) are recommended for accurate normalization of q-PCR data. We also suggest that two different sets of housekeeping genes are appropriate for 10 types of adipose tissue (the HSPCB, ALDOA and GAPDH genes) and five types of muscle tissue (the TOP2B, HSPCB and YWHAZ genes), respectively. Our report will serve as a valuable reference for other studies aimed at measuring tissue-specific mRNA abundance in porcine samples. © 2011 Blackwell Verlag GmbH.

  10. Exome Sequencing and Linkage Analysis Identified Novel Candidate Genes in Recessive Intellectual Disability Associated with Ataxia.

    PubMed

    Jazayeri, Roshanak; Hu, Hao; Fattahi, Zohreh; Musante, Luciana; Abedini, Seyedeh Sedigheh; Hosseini, Masoumeh; Wienker, Thomas F; Ropers, Hans Hilger; Najmabadi, Hossein; Kahrizi, Kimia

    2015-10-01

    Intellectual disability (ID) is a neuro-developmental disorder which causes considerable socio-economic problems. Some ID individuals are also affected by ataxia, and the condition includes different mutations affecting several genes. We used whole exome sequencing (WES) in combination with homozygosity mapping (HM) to identify the genetic defects in five consanguineous families among our cohort study, with two affected children with ID and ataxia as major clinical symptoms. We identified three novel candidate genes, RIPPLY1, MRPL10, SNX14, and a new mutation in known gene SURF1. All are autosomal genes, except RIPPLY1, which is located on the X chromosome. Two are housekeeping genes, implicated in transcription and translation regulation and intracellular trafficking, and two encode mitochondrial proteins. The pathogenesis of these variants was evaluated by mutation classification, bioinformatic methods, review of medical and biological relevance, co-segregation studies in the particular family, and a normal population study. Linkage analysis and exome sequencing of a small number of affected family members is a powerful new technique which can be used to decrease the number of candidate genes in heterogenic disorders such as ID, and may even identify the responsible gene(s).

  11. High-Throughput Screening to Identify Regulators of Meiosis-Specific Gene Expression in Saccharomyces cerevisiae.

    PubMed

    Kassir, Yona

    2017-01-01

    Meiosis and gamete formation are processes that are essential for sexual reproduction in all eukaryotic organisms. Multiple intracellular and extracellular signals feed into pathways that converge on transcription factors that induce the expression of meiosis-specific genes. Once triggered the meiosis-specific gene expression program proceeds in a cascade that drives progress through the events of meiosis and gamete formation. Meiosis-specific gene expression is tightly controlled by a balance of positive and negative regulatory factors that respond to a plethora of signaling pathways. The budding yeast Saccharomyces cerevisiae has proven to be an outstanding model for the dissection of gametogenesis owing to the sophisticated genetic manipulations that can be performed with the cells. It is possible to use a variety selection and screening methods to identify genes and their functions. High-throughput screening technology has been developed to allow an array of all viable yeast gene deletion mutants to be screened for phenotypes and for regulators of gene expression. This chapter describes a protocol that has been used to screen a library of homozygous diploid yeast deletion strains to identify regulators of the meiosis-specific IME1 gene.

  12. Transposon mutagenesis identifies genes and cellular processes driving epithelial-mesenchymal transition in hepatocellular carcinoma

    PubMed Central

    Kodama, Takahiro; Newberg, Justin Y.; Kodama, Michiko; Rangel, Roberto; Yoshihara, Kosuke; Tien, Jean C.; Parsons, Pamela H.; Wu, Hao; Finegold, Milton J.; Copeland, Neal G.; Jenkins, Nancy A.

    2016-01-01

    Epithelial-mesenchymal transition (EMT) is thought to contribute to metastasis and chemoresistance in patients with hepatocellular carcinoma (HCC), leading to their poor prognosis. The genes driving EMT in HCC are not yet fully understood, however. Here, we show that mobilization of Sleeping Beauty (SB) transposons in immortalized mouse hepatoblasts induces mesenchymal liver tumors on transplantation to nude mice. These tumors show significant down-regulation of epithelial markers, along with up-regulation of mesenchymal markers and EMT-related transcription factors (EMT-TFs). Sequencing of transposon insertion sites from tumors identified 233 candidate cancer genes (CCGs) that were enriched for genes and cellular processes driving EMT. Subsequent trunk driver analysis identified 23 CCGs that are predicted to function early in tumorigenesis and whose mutation or alteration in patients with HCC is correlated with poor patient survival. Validation of the top trunk drivers identified in the screen, including MET (MET proto-oncogene, receptor tyrosine kinase), GRB2-associated binding protein 1 (GAB1), HECT, UBA, and WWE domain containing 1 (HUWE1), lysine-specific demethylase 6A (KDM6A), and protein-tyrosine phosphatase, nonreceptor-type 12 (PTPN12), showed that deregulation of these genes activates an EMT program in human HCC cells that enhances tumor cell migration. Finally, deregulation of these genes in human HCC was found to confer sorafenib resistance through apoptotic tolerance and reduced proliferation, consistent with recent studies showing that EMT contributes to the chemoresistance of tumor cells. Our unique cell-based transposon mutagenesis screen appears to be an excellent resource for discovering genes involved in EMT in human HCC and potentially for identifying new drug targets. PMID:27247392

  13. Transcriptomic Analysis Using Olive Varieties and Breeding Progenies Identifies Candidate Genes Involved in Plant Architecture.

    PubMed

    González-Plaza, Juan J; Ortiz-Martín, Inmaculada; Muñoz-Mérida, Antonio; García-López, Carmen; Sánchez-Sevilla, José F; Luque, Francisco; Trelles, Oswaldo; Bejarano, Eduardo R; De La Rosa, Raúl; Valpuesta, Victoriano; Beuzón, Carmen R

    2016-01-01

    Plant architecture is a critical trait in fruit crops that can significantly influence yield, pruning, planting density and harvesting. Little is known about how plant architecture is genetically determined in olive, were most of the existing varieties are traditional with an architecture poorly suited for modern growing and harvesting systems. In the present study, we have carried out microarray analysis of meristematic tissue to compare expression profiles of olive varieties displaying differences in architecture, as well as seedlings from their cross pooled on the basis of their sharing architecture-related phenotypes. The microarray used, previously developed by our group has already been applied to identify candidates genes involved in regulating juvenile to adult transition in the shoot apex of seedlings. Varieties with distinct architecture phenotypes and individuals from segregating progenies displaying opposite architecture features were used to link phenotype to expression. Here, we identify 2252 differentially expressed genes (DEGs) associated to differences in plant architecture. Microarray results were validated by quantitative RT-PCR carried out on genes with functional annotation likely related to plant architecture. Twelve of these genes were further analyzed in individual seedlings of the corresponding pool. We also examined Arabidopsis mutants in putative orthologs of these targeted candidate genes, finding altered architecture for most of them. This supports a functional conservation between species and potential biological relevance of the candidate genes identified. This study is the first to identify genes associated to plant architecture in olive, and the results obtained could be of great help in future programs aimed at selecting phenotypes adapted to modern cultivation practices in this species.

  14. Specific PCR primers directed to identify cryI and cryIII genes within a Bacillus thuringiensis strain collection.

    PubMed Central

    Cerón, J; Ortíz, A; Quintero, R; Güereca, L; Bravo, A

    1995-01-01

    In this paper we describe a PCR strategy that can be used to rapidly identify Bacillus thuringiensis strains that harbor any of the known cryI or cryIII genes. Four general PCR primers which amplify DNA fragments from the known cryI or cryIII genes were selected from conserved regions. Once a strain was identified as an organism that contains a particular type of cry gene, it could be easily characterized by performing additional PCR with specific cryI and cryIII primers selected from variable regions. The method described in this paper can be used to identify the 10 different cryI genes and the five different cryIII genes. One feature of this screening method is that each cry gene is expected to produce a PCR product having a precise molecular weight. The genes which produce PCR products having different sizes probably represent strains that harbor a potentially novel cry gene. Finally, we present evidence that novel crystal genes can be identified by the method described in this paper. PMID:8526493

  15. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    PubMed Central

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels—highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in maize whole genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differently expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerance and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095

  16. Digital transcriptome profiling of normal and glioblastoma-derived neural stem cells identifies genes associated with patient survival

    PubMed Central

    2012-01-01

    Background Glioblastoma multiforme, the most common type of primary brain tumor in adults, is driven by cells with neural stem (NS) cell characteristics. Using derivation methods developed for NS cells, it is possible to expand tumorigenic stem cells continuously in vitro. Although these glioblastoma-derived neural stem (GNS) cells are highly similar to normal NS cells, they harbor mutations typical of gliomas and initiate authentic tumors following orthotopic xenotransplantation. Here, we analyzed GNS and NS cell transcriptomes to identify gene expression alterations underlying the disease phenotype. Methods Sensitive measurements of gene expression were obtained by high-throughput sequencing of transcript tags (Tag-seq) on adherent GNS cell lines from three glioblastoma cases and two normal NS cell lines. Validation by quantitative real-time PCR was performed on 82 differentially expressed genes across a panel of 16 GNS and 6 NS cell lines. The molecular basis and prognostic relevance of expression differences were investigated by genetic characterization of GNS cells and comparison with public data for 867 glioma biopsies. Results Transcriptome analysis revealed major differences correlated with glioma histological grade, and identified misregulated genes of known significance in glioblastoma as well as novel candidates, including genes associated with other malignancies or glioma-related pathways. This analysis further detected several long non-coding RNAs with expression profiles similar to neighboring genes implicated in cancer. Quantitative PCR validation showed excellent agreement with Tag-seq data (median Pearson r = 0.91) and discerned a gene set robustly distinguishing GNS from NS cells across the 22 lines. These expression alterations include oncogene and tumor suppressor changes not detected by microarray profiling of tumor tissue samples, and facilitated the identification of a GNS expression signature strongly associated with patient survival (P = 1e

  17. De novo transcriptome sequencing in Bixa orellana to identify genes involved in methylerythritol phosphate, carotenoid and bixin biosynthesis

    DOE PAGES

    Cárdenas-Conejo, Yair; Carballo-Uicab, Víctor; Lieberman, Meric; ...

    2015-10-28

    Bixin or annatto is a commercially important natural orange-red pigment derived from lycopene that is produced and stored in seeds of Bixa orellana L. An enzymatic pathway for bixin biosynthesis was inferred from homology of putative proteins encoded by differentially expressed seed cDNAs. Some activities were later validated in a heterologous system. Nevertheless, much of the pathway remains to be clarified. For example, it is essential to identify the methylerythritol phosphate (MEP) and carotenoid pathways genes. In order to investigate the MEP, carotenoid, and bixin pathways genes, total RNA from young leaves and two different developmental stages of seeds frommore » B. orellana were used for the construction of indexed mRNA libraries, sequenced on the Illumina HiSeq 2500 platform and assembled de novo using Velvet, CLC Genomics Workbench and CAP3 software. A total of 52,549 contigs were obtained with average length of 1,924 bp. Two phylogenetic analyses of inferred proteins, in one case encoded by thirteen general, single-copy cDNAs, in the other from carotenoid and MEP cDNAs, indicated that B. orellana is closely related to sister Malvales species cacao and cotton. Using homology, we identified 7 and 14 core gene products from the MEP and carotenoid pathways, respectively. Surprisingly, previously defined bixin pathway cDNAs were not present in our transcriptome. Here we propose a new set of gene products involved in bixin pathway. In conclusion, the identification and qRT-PCR quantification of cDNAs involved in annatto production suggest a hypothetical model for bixin biosynthesis that involve coordinated activation of some MEP, carotenoid and bixin pathway genes. These findings provide a better understanding of the mechanisms regulating these pathways and will facilitate the genetic improvement of B. orellana.« less

  18. De novo transcriptome sequencing in Bixa orellana to identify genes involved in methylerythritol phosphate, carotenoid and bixin biosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cárdenas-Conejo, Yair; Carballo-Uicab, Víctor; Lieberman, Meric

    Bixin or annatto is a commercially important natural orange-red pigment derived from lycopene that is produced and stored in seeds of Bixa orellana L. An enzymatic pathway for bixin biosynthesis was inferred from homology of putative proteins encoded by differentially expressed seed cDNAs. Some activities were later validated in a heterologous system. Nevertheless, much of the pathway remains to be clarified. For example, it is essential to identify the methylerythritol phosphate (MEP) and carotenoid pathways genes. In order to investigate the MEP, carotenoid, and bixin pathways genes, total RNA from young leaves and two different developmental stages of seeds frommore » B. orellana were used for the construction of indexed mRNA libraries, sequenced on the Illumina HiSeq 2500 platform and assembled de novo using Velvet, CLC Genomics Workbench and CAP3 software. A total of 52,549 contigs were obtained with average length of 1,924 bp. Two phylogenetic analyses of inferred proteins, in one case encoded by thirteen general, single-copy cDNAs, in the other from carotenoid and MEP cDNAs, indicated that B. orellana is closely related to sister Malvales species cacao and cotton. Using homology, we identified 7 and 14 core gene products from the MEP and carotenoid pathways, respectively. Surprisingly, previously defined bixin pathway cDNAs were not present in our transcriptome. Here we propose a new set of gene products involved in bixin pathway. In conclusion, the identification and qRT-PCR quantification of cDNAs involved in annatto production suggest a hypothetical model for bixin biosynthesis that involve coordinated activation of some MEP, carotenoid and bixin pathway genes. These findings provide a better understanding of the mechanisms regulating these pathways and will facilitate the genetic improvement of B. orellana.« less

  19. Gene expression profiling following NRF2 and KEAP1 siRNA knockdown in human lung fibroblasts identifies CCL11/Eotaxin-1 as a novel NRF2 regulated gene

    PubMed Central

    2012-01-01

    Background Oxidative Stress contributes to the pathogenesis of many diseases. The NRF2/KEAP1 axis is a key transcriptional regulator of the anti-oxidant response in cells. Nrf2 knockout mice have implicated this pathway in regulating inflammatory airway diseases such as asthma and COPD. To better understand the role the NRF2 pathway has on respiratory disease we have taken a novel approach to define NRF2 dependent gene expression in a relevant lung system. Methods Normal human lung fibroblasts were transfected with siRNA specific for NRF2 or KEAP1. Gene expression changes were measured at 30 and 48 hours using a custom Affymetrix Gene array. Changes in Eotaxin-1 gene expression and protein secretion were further measured under various inflammatory conditions with siRNAs and pharmacological tools. Results An anti-correlated gene set (inversely regulated by NRF2 and KEAP1 RNAi) that reflects specific NRF2 regulated genes was identified. Gene annotations show that NRF2-mediated oxidative stress response is the most significantly regulated pathway, followed by heme metabolism, metabolism of xenobiotics by Cytochrome P450 and O-glycan biosynthesis. Unexpectedly the key eosinophil chemokine Eotaxin-1/CCL11 was found to be up-regulated when NRF2 was inhibited and down-regulated when KEAP1 was inhibited. This transcriptional regulation leads to modulation of Eotaxin-1 secretion from human lung fibroblasts under basal and inflammatory conditions, and is specific to Eotaxin-1 as NRF2 or KEAP1 knockdown had no effect on the secretion of a set of other chemokines and cytokines. Furthermore, the known NRF2 small molecule activators CDDO and Sulphoraphane can also dose dependently inhibit Eotaxin-1 release from human lung fibroblasts. Conclusions These data uncover a previously unknown role for NRF2 in regulating Eotaxin-1 expression and further the mechanistic understanding of this pathway in modulating inflammatory lung disease. PMID:23061798

  20. Gene Network for Identifying the Entropy Changes of Different Modules in Pediatric Sepsis.

    PubMed

    Yang, Jing; Zhang, Pingli; Wang, Lumin

    2016-01-01

    Pediatric sepsis is a disease that threatens life of children. The incidence of pediatric sepsis is higher in developing countries due to various reasons, such as insufficient immunization and nutrition, water and air pollution, etc. Exploring the potential genes via different methods is of significance for the prevention and treatment of pediatric sepsis. This study aimed to identify potential genes associated with pediatric sepsis utilizing analysis of gene network and entropy. The mRNA expression in the blood samples collected from 20 septic children and 30 healthy controls was quantified by using Affymetrix HG-U133A microarray. Two condition-specific protein-protein interaction networks (PINs), one for the healthy control and the other one for the children with sepsis, were deduced by combining the fundamental human PINs with gene expression profiles in the two phenotypes. Subsequently, distinct modules from the two conditional networks were extracted by adopting a maximal clique-merging approach. Delta entropy (ΔS) was calculated between sepsis and control modules. Then, key genes displaying changes in gene composition were identified by matching the control and sepsis modules. Two objective modules were obtained, in which ribosomal protein RPL4 and RPL9 as well as TOP2A were probably considered as the key genes differentiating sepsis from healthy controls. According to previous reports and this work, TOP2A is the potential gene therapy target for pediatric sepsis. The relationship between pediatric sepsis and RPL4 and RPL9 needs further investigation. © 2016 The Author(s) Published by S. Karger AG, Basel.

  1. Evolutionary analysis of vision genes identifies potential drivers of visual differences between giraffe and okapi

    PubMed Central

    Agaba, Morris; Cavener, Douglas R.

    2017-01-01

    Background The capacity of visually oriented species to perceive and respond to visual signal is integral to their evolutionary success. Giraffes are closely related to okapi, but the two species have broad range of phenotypic differences including their visual capacities. Vision studies rank giraffe’s visual acuity higher than all other artiodactyls despite sharing similar vision ecological determinants with many of them. The extent to which the giraffe’s unique visual capacity and its difference with okapi is reflected by changes in their vision genes is not understood. Methods The recent availability of giraffe and okapi genomes provided opportunity to identify giraffe and okapi vision genes. Multiple strategies were employed to identify thirty-six candidate mammalian vision genes in giraffe and okapi genomes. Quantification of selection pressure was performed by a combination of branch-site tests of positive selection and clade models of selection divergence through comparing giraffe and okapi vision genes and orthologous sequences from other mammals. Results Signatures of selection were identified in key genes that could potentially underlie giraffe and okapi visual adaptations. Importantly, some genes that contribute to optical transparency of the eye and those that are critical in light signaling pathway were found to show signatures of adaptive evolution or selection divergence. Comparison between giraffe and other ruminants identifies significant selection divergence in CRYAA and OPN1LW. Significant selection divergence was identified in SAG while positive selection was detected in LUM when okapi is compared with ruminants and other mammals. Sequence analysis of OPN1LW showed that at least one of the sites known to affect spectral sensitivity of the red pigment is uniquely divergent between giraffe and other ruminants. Discussion By taking a systemic approach to gene function in vision, the results provide the first molecular clues associated with

  2. Evolutionary analysis of vision genes identifies potential drivers of visual differences between giraffe and okapi.

    PubMed

    Ishengoma, Edson; Agaba, Morris; Cavener, Douglas R

    2017-01-01

    The capacity of visually oriented species to perceive and respond to visual signal is integral to their evolutionary success. Giraffes are closely related to okapi, but the two species have broad range of phenotypic differences including their visual capacities. Vision studies rank giraffe's visual acuity higher than all other artiodactyls despite sharing similar vision ecological determinants with many of them. The extent to which the giraffe's unique visual capacity and its difference with okapi is reflected by changes in their vision genes is not understood. The recent availability of giraffe and okapi genomes provided opportunity to identify giraffe and okapi vision genes. Multiple strategies were employed to identify thirty-six candidate mammalian vision genes in giraffe and okapi genomes. Quantification of selection pressure was performed by a combination of branch-site tests of positive selection and clade models of selection divergence through comparing giraffe and okapi vision genes and orthologous sequences from other mammals. Signatures of selection were identified in key genes that could potentially underlie giraffe and okapi visual adaptations. Importantly, some genes that contribute to optical transparency of the eye and those that are critical in light signaling pathway were found to show signatures of adaptive evolution or selection divergence. Comparison between giraffe and other ruminants identifies significant selection divergence in CRYAA and OPN1LW . Significant selection divergence was identified in SAG while positive selection was detected in LUM when okapi is compared with ruminants and other mammals. Sequence analysis of OPN1LW showed that at least one of the sites known to affect spectral sensitivity of the red pigment is uniquely divergent between giraffe and other ruminants. By taking a systemic approach to gene function in vision, the results provide the first molecular clues associated with giraffe and okapi vision adaptations. At

  3. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records.

    PubMed

    Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter

    2014-09-24

    Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data

  4. High-Resolution Melting Curve Analysis of the 16S Ribosomal Gene to Detect and Identify Pathogenic and Saprophytic Leptospira species in Colombian Isolates

    PubMed Central

    Peláez Sánchez, Ronald G.; Quintero, Juan Álvaro López; Pereira, Martha María; Agudelo-Flórez, Piedad

    2017-01-01

    It is important to identify the circulating Leptospira agent to enhance the performance of serodiagnostic tests by incorporating specific antigens of native species, develop vaccines that take into account the species/serovars circulating in different regions, and optimize prevention and control strategies. The objectives of this study were to develop a polymerase chain reaction (PCR)–high-resolution melting (HRM) assay for differentiating between species of the genus Leptospira and to verify its usefulness in identifying unknown samples to species level. A set of primers from the initial region of the 16S ribosomal gene was designed to detect and differentiate the 22 species of Leptospira. Eleven reference strains were used as controls to establish the reference species and differential melting curves. Twenty-five Colombian Leptospira isolates were studied to evaluate the usefulness of the PCR–HRM assay in identifying unknown samples to species level. This identification was confirmed by sequencing and phylogenetic analysis of the 16S ribosomal gene. Eleven Leptospira species were successfully identified, except for Leptospira meyeri/Leptospira yanagawae because the sequences were 100% identical. The 25 isolates from humans, animals, and environmental water sources were identified as Leptospira santarosai (twelve), Leptospira interrogans (nine), and L. meyeri/L. yanagawae (four). The species verification was 100% concordant between PCR–HRM and phylogenetic analysis of the 16S ribosomal gene. The PCR–HRM assay designed in this study is a useful tool for identifying Leptospira species from isolates. PMID:28500802

  5. Genome-Wide and Gene-Based Meta-Analyses Identify Novel Loci Influencing Blood Pressure Response to Hydrochlorothiazide.

    PubMed

    Salvi, Erika; Wang, Zhiying; Rizzi, Federica; Gong, Yan; McDonough, Caitrin W; Padmanabhan, Sandosh; Hiltunen, Timo P; Lanzani, Chiara; Zaninello, Roberta; Chittani, Martina; Bailey, Kent R; Sarin, Antti-Pekka; Barcella, Matteo; Melander, Olle; Chapman, Arlene B; Manunta, Paolo; Kontula, Kimmo K; Glorioso, Nicola; Cusi, Daniele; Dominiczak, Anna F; Johnson, Julie A; Barlassina, Cristina; Boerwinkle, Eric; Cooper-DeHoff, Rhonda M; Turner, Stephen T

    2017-01-01

    This study aimed to identify novel loci influencing the antihypertensive response to hydrochlorothiazide monotherapy. A genome-wide meta-analysis of blood pressure (BP) response to hydrochlorothiazide was performed in 1739 white hypertensives from 6 clinical trials within the International Consortium for Antihypertensive Pharmacogenomics Studies, making it the largest study to date of its kind. No signals reached genome-wide significance (P<5×10 - 8 ), and the suggestive regions (P<10 -5 ) were cross-validated in 2 black cohorts treated with hydrochlorothiazide. In addition, a gene-based analysis was performed on candidate genes with previous evidence of involvement in diuretic response, in BP regulation, or in hypertension susceptibility. Using the genome-wide meta-analysis approach, with validation in blacks, we identified 2 suggestive regulatory regions linked to gap junction protein α1 gene (GJA1) and forkhead box A1 gene (FOXA1), relevant for cardiovascular and kidney function. With the gene-based approach, we identified hydroxy-delta-5-steroid dehydrogenase, 3 β- and steroid δ-isomerase 1 gene (HSD3B1) as significantly associated with BP response (P<2.28×10 - 4 ). HSD3B1 encodes the 3β-hydroxysteroid dehydrogenase enzyme and plays a crucial role in the biosynthesis of aldosterone and endogenous ouabain. By amassing all of the available pharmacogenomic studies of BP response to hydrochlorothiazide, and using 2 different analytic approaches, we identified 3 novel loci influencing BP response to hydrochlorothiazide. The gene-based analysis, never before applied to pharmacogenomics of antihypertensive drugs to our knowledge, provided a powerful strategy to identify a locus of interest, which was not identified in the genome-wide meta-analysis because of high allelic heterogeneity. These data pave the way for future investigations on new pathways and drug targets to enhance the current understanding of personalized antihypertensive treatment. © 2016

  6. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer.

    PubMed

    Wu, Lang; Shi, Wei; Long, Jirong; Guo, Xingyi; Michailidou, Kyriaki; Beesley, Jonathan; Bolla, Manjeet K; Shu, Xiao-Ou; Lu, Yingchang; Cai, Qiuyin; Al-Ejeh, Fares; Rozali, Esdy; Wang, Qin; Dennis, Joe; Li, Bingshan; Zeng, Chenjie; Feng, Helian; Gusev, Alexander; Barfield, Richard T; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Aronson, Kristan J; Auer, Paul L; Barrdahl, Myrto; Baynes, Caroline; Beckmann, Matthias W; Benitez, Javier; Bermisheva, Marina; Blomqvist, Carl; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Brinton, Louise; Broberg, Per; Brucker, Sara Y; Burwinkel, Barbara; Caldés, Trinidad; Canzian, Federico; Carter, Brian D; Castelao, J Esteban; Chang-Claude, Jenny; Chen, Xiaoqing; Cheng, Ting-Yuan David; Christiansen, Hans; Clarke, Christine L; Collée, Margriet; Cornelissen, Sten; Couch, Fergus J; Cox, David; Cox, Angela; Cross, Simon S; Cunningham, Julie M; Czene, Kamila; Daly, Mary B; Devilee, Peter; Doheny, Kimberly F; Dörk, Thilo; Dos-Santos-Silva, Isabel; Dumont, Martine; Dwek, Miriam; Eccles, Diana M; Eilber, Ursula; Eliassen, A Heather; Engel, Christoph; Eriksson, Mikael; Fachal, Laura; Fasching, Peter A; Figueroa, Jonine; Flesch-Janys, Dieter; Fletcher, Olivia; Flyger, Henrik; Fritschi, Lin; Gabrielson, Marike; Gago-Dominguez, Manuela; Gapstur, Susan M; García-Closas, Montserrat; Gaudet, Mia M; Ghoussaini, Maya; Giles, Graham G; Goldberg, Mark S; Goldgar, David E; González-Neira, Anna; Guénel, Pascal; Hahnen, Eric; Haiman, Christopher A; Håkansson, Niclas; Hall, Per; Hallberg, Emily; Hamann, Ute; Harrington, Patricia; Hein, Alexander; Hicks, Belynda; Hillemanns, Peter; Hollestelle, Antoinette; Hoover, Robert N; Hopper, John L; Huang, Guanmengqian; Humphreys, Keith; Hunter, David J; Jakubowska, Anna; Janni, Wolfgang; John, Esther M; Johnson, Nichola; Jones, Kristine; Jones, Michael E; Jung, Audrey; Kaaks, Rudolf; Kerin, Michael J; Khusnutdinova, Elza; Kosma, Veli-Matti; Kristensen, Vessela N; Lambrechts, Diether; Le Marchand, Loic; Li, Jingmei; Lindström, Sara; Lissowska, Jolanta; Lo, Wing-Yee; Loibl, Sibylle; Lubinski, Jan; Luccarini, Craig; Lux, Michael P; MacInnis, Robert J; Maishman, Tom; Kostovska, Ivana Maleva; Mannermaa, Arto; Manson, JoAnn E; Margolin, Sara; Mavroudis, Dimitrios; Meijers-Heijboer, Hanne; Meindl, Alfons; Menon, Usha; Meyer, Jeffery; Mulligan, Anna Marie; Neuhausen, Susan L; Nevanlinna, Heli; Neven, Patrick; Nielsen, Sune F; Nordestgaard, Børge G; Olopade, Olufunmilayo I; Olson, Janet E; Olsson, Håkan; Peterlongo, Paolo; Peto, Julian; Plaseska-Karanfilska, Dijana; Prentice, Ross; Presneau, Nadege; Pylkäs, Katri; Rack, Brigitte; Radice, Paolo; Rahman, Nazneen; Rennert, Gad; Rennert, Hedy S; Rhenius, Valerie; Romero, Atocha; Romm, Jane; Rudolph, Anja; Saloustros, Emmanouil; Sandler, Dale P; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Schneeweiss, Andreas; Scott, Rodney J; Scott, Christopher G; Seal, Sheila; Shah, Mitul; Shrubsole, Martha J; Smeets, Ann; Southey, Melissa C; Spinelli, John J; Stone, Jennifer; Surowy, Harald; Swerdlow, Anthony J; Tamimi, Rulla M; Tapper, William; Taylor, Jack A; Terry, Mary Beth; Tessier, Daniel C; Thomas, Abigail; Thöne, Kathrin; Tollenaar, Rob A E M; Torres, Diana; Truong, Thérèse; Untch, Michael; Vachon, Celine; Van Den Berg, David; Vincent, Daniel; Waisfisz, Quinten; Weinberg, Clarice R; Wendt, Camilla; Whittemore, Alice S; Wildiers, Hans; Willett, Walter C; Winqvist, Robert; Wolk, Alicja; Xia, Lucy; Yang, Xiaohong R; Ziogas, Argyrios; Ziv, Elad; Dunning, Alison M; Pharoah, Paul D P; Simard, Jacques; Milne, Roger L; Edwards, Stacey L; Kraft, Peter; Easton, Douglas F; Chenevix-Trench, Georgia; Zheng, Wei

    2018-06-18

    The breast cancer risk variants identified in genome-wide association studies explain only a small fraction of the familial relative risk, and the genes responsible for these associations remain largely unknown. To identify novel risk loci and likely causal genes, we performed a transcriptome-wide association study evaluating associations of genetically predicted gene expression with breast cancer risk in 122,977 cases and 105,974 controls of European ancestry. We used data from the Genotype-Tissue Expression Project to establish genetic models to predict gene expression in breast tissue and evaluated model performance using data from The Cancer Genome Atlas. Of the 8,597 genes evaluated, significant associations were identified for 48 at a Bonferroni-corrected threshold of P < 5.82 × 10 -6 , including 14 genes at loci not yet reported for breast cancer. We silenced 13 genes and showed an effect for 11 on cell proliferation and/or colony-forming efficiency. Our study provides new insights into breast cancer genetics and biology.

  7. Identifying signatures of positive selection in pigmentation genes in two South Asian populations.

    PubMed

    Jonnalagadda, Manjari; Bharti, Neeraj; Patil, Yatish; Ozarkar, Shantanu; K, Sunitha Manjari; Joshi, Rajendra; Norton, Heather

    2017-09-10

    Skin pigmentation is a polygenic trait showing wide phenotypic variations among global populations. While numerous pigmentation genes have been identified to be under positive selection among European and East populations, genes contributing to phenotypic variation in skin pigmentation within and among South Asian populations are still poorly understood. The present study uses data from the Phase 3 of the 1000 genomes project focusing on two South Asian populations-GIH (Gujarati Indian from Houston, Texas) and ITU (Indian Telugu from UK), so as to decode the genetic architecture involved in adaptation to ultraviolet radiation in South Asian populations. Statistical tests included were (1) tests to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), (2) tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb), and (3) tests based on genetic differentiation between populations (LSBL). Twenty-two pigmentation genes fall in the top 1% for at least one statistic in the GIH population, 5 of which (LYST, OCA2, SLC24A5, SLC45A2, and TYR) have been previously associated with normal variation in skin, hair, or eye color. In comparison, 17 genes fall in the top 1% for at least one statistic in the ITU population. Twelve loci which are identified as outliers in the ITU scan were also identified in the GIH population. These results suggest that selection may have affected these loci broadly across the region. © 2017 Wiley Periodicals, Inc.

  8. Transcriptomic Analysis Using Olive Varieties and Breeding Progenies Identifies Candidate Genes Involved in Plant Architecture

    PubMed Central

    González-Plaza, Juan J.; Ortiz-Martín, Inmaculada; Muñoz-Mérida, Antonio; García-López, Carmen; Sánchez-Sevilla, José F.; Luque, Francisco; Trelles, Oswaldo; Bejarano, Eduardo R.; De La Rosa, Raúl; Valpuesta, Victoriano; Beuzón, Carmen R.

    2016-01-01

    Plant architecture is a critical trait in fruit crops that can significantly influence yield, pruning, planting density and harvesting. Little is known about how plant architecture is genetically determined in olive, were most of the existing varieties are traditional with an architecture poorly suited for modern growing and harvesting systems. In the present study, we have carried out microarray analysis of meristematic tissue to compare expression profiles of olive varieties displaying differences in architecture, as well as seedlings from their cross pooled on the basis of their sharing architecture-related phenotypes. The microarray used, previously developed by our group has already been applied to identify candidates genes involved in regulating juvenile to adult transition in the shoot apex of seedlings. Varieties with distinct architecture phenotypes and individuals from segregating progenies displaying opposite architecture features were used to link phenotype to expression. Here, we identify 2252 differentially expressed genes (DEGs) associated to differences in plant architecture. Microarray results were validated by quantitative RT-PCR carried out on genes with functional annotation likely related to plant architecture. Twelve of these genes were further analyzed in individual seedlings of the corresponding pool. We also examined Arabidopsis mutants in putative orthologs of these targeted candidate genes, finding altered architecture for most of them. This supports a functional conservation between species and potential biological relevance of the candidate genes identified. This study is the first to identify genes associated to plant architecture in olive, and the results obtained could be of great help in future programs aimed at selecting phenotypes adapted to modern cultivation practices in this species. PMID:26973682

  9. Robust Principal Component Analysis Regularized by Truncated Nuclear Norm for Identifying Differentially Expressed Genes.

    PubMed

    Wang, Ya-Xuan; Gao, Ying-Lian; Liu, Jin-Xing; Kong, Xiang-Zhen; Li, Hai-Jun

    2017-09-01

    Identifying differentially expressed genes from the thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method in the identification of differentially expressed genes. RPCA method uses nuclear norm to approximate the rank function. However, theoretical studies showed that the nuclear norm minimizes all singular values, so it may not be the best solution to approximate the rank function. The truncated nuclear norm is defined as the sum of some smaller singular values, which may achieve a better approximation of the rank function than nuclear norm. In this paper, a novel method is proposed by replacing nuclear norm of RPCA with the truncated nuclear norm, which is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Thus, the differentially expressed genes can be identified according to the sparse matrix. The experimental results on The Cancer Genome Atlas data illustrate that the TRPCA method outperforms other state-of-the-art methods in the identification of differentially expressed genes.

  10. Genome-Wide Association Study Identifies Candidate Genes That Affect Plant Height in Chinese Elite Maize (Zea mays L.) Inbred Lines

    PubMed Central

    Wang, Jianjun; Liu, Changlin; Li, Mingshun; Zhang, Degui; Bai, Li; Zhang, Shihuang; Li, Xinhai

    2011-01-01

    Background The harvest index for many crops can be improved through introduction of dwarf stature to increase lodging resistance, combined with early maturity. The inbred line Shen5003 has been widely used in maize breeding in China as a key donor line for the dwarf trait. Also, one major quantitative trait locus (QTL) controlling plant height has been identified in bin 5.05–5.06, across several maize bi-parental populations. With the progress of publicly available maize genome sequence, the objective of this work was to identify the candidate genes that affect plant height among Chinese maize inbred lines with genome wide association studies (GWAS). Methods and Findings A total of 284 maize inbred lines were genotyped using over 55,000 evenly spaced SNPs, from which a set of 41,101 SNPs were filtered with stringent quality control for further data analysis. With the population structure controlled in a mixed linear model (MLM) implemented with the software TASSEL, we carried out a genome-wide association study (GWAS) for plant height. A total of 204 SNPs (P≤0.0001) and 105 genomic loci harboring coding regions were identified. Four loci containing genes associated with gibberellin (GA), auxin, and epigenetic pathways may be involved in natural variation that led to a dwarf phenotype in elite maize inbred lines. Among them, a favorable allele for dwarfing on chromosome 5 (SNP PZE-105115518) was also identified in six Shen5003 derivatives. Conclusions The fact that a large number of previously identified dwarf genes are missing from our study highlights the discovery of the consistently significant association of the gene harboring the SNP PZE-105115518 with plant height (P = 8.91e-10) and its confirmation in the Shen5003 introgression lines. Results from this study suggest that, in the maize breeding schema in China, specific alleles were selected, that have played important roles in maize production. PMID:22216221

  11. Understanding Setting Events: What They Are and How to Identify Them

    ERIC Educational Resources Information Center

    Iovannone, Rose; Anderson, Cynthia; Scott, Terrance

    2017-01-01

    A functional behavior assessment is a process for identifying events in the environment that reliably precede (i.e., antecedents) and follow (i.e., consequences) problem behavior. This information is used to develop an intervention plan. There are two types of antecedents--triggers and setting events. Triggers are antecedent events that happen…

  12. DNA methylome profiling identifies novel methylated genes in African American patients with colorectal neoplasia.

    PubMed

    Ashktorab, Hassan; Daremipouran, M; Goel, Ajay; Varma, Sudhir; Leavitt, R; Sun, Xueguang; Brim, Hassan

    2014-04-01

    The identification of genes that are differentially methylated in colorectal cancer (CRC) has potential value for both diagnostic and therapeutic interventions specifically in high-risk populations such as African Americans (AAs). However, DNA methylation patterns in CRC, especially in AAs, have not been systematically explored and remain poorly understood. Here, we performed DNA methylome profiling to identify the methylation status of CpG islands within candidate genes involved in critical pathways important in the initiation and development of CRC. We used reduced representation bisulfite sequencing (RRBS) in colorectal cancer and adenoma tissues that were compared with DNA methylome from a healthy AA subject's colon tissue and peripheral blood DNA. The identified methylation markers were validated in fresh frozen CRC tissues and corresponding normal tissues from AA patients diagnosed with CRC at Howard University Hospital. We identified and validated the methylation status of 355 CpG sites located within 16 gene promoter regions associated with CpG islands. Fifty CpG sites located within CpG islands-in genes ATXN7L1 (2), BMP3 (7), EID3 (15), GAS7 (1), GPR75 (24), and TNFAIP2 (1)-were significantly hypermethylated in tumor vs. normal tissues (P<0.05). The methylation status of BMP3, EID3, GAS7, and GPR75 was confirmed in an independent, validation cohort. Ingenuity pathway analysis mapped three of these markers (GAS7, BMP3 and GPR) in the insulin and TGF-β1 network-the two key pathways in CRC. In addition to hypermethylated genes, our analysis also revealed that LINE-1 repeat elements were progressively hypomethylated in the normal-adenoma-cancer sequence. We conclude that DNA methylome profiling based on RRBS is an effective method for screening aberrantly methylated genes in CRC. While previous studies focused on the limited identification of hypermethylated genes, ours is the first study to systematically and comprehensively identify novel hypermethylated

  13. Reliable pre-eclampsia pathways based on multiple independent microarray data sets.

    PubMed

    Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo

    2015-02-01

    Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta have provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using data sets, GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically-validated protocol was executed using five data sets including two independent other validation data sets, GSE30186, GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways were found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre

  14. Dissecting the Gene Network of Dietary Restriction to Identify Evolutionarily Conserved Pathways and New Functional Genes

    PubMed Central

    Wuttke, Daniel; Connor, Richard; Vora, Chintan; Craig, Thomas; Li, Yang; Wood, Shona; Vasieva, Olga; Shmookler Reis, Robert; Tang, Fusheng; de Magalhães, João Pedro

    2012-01-01

    Dietary restriction (DR), limiting nutrient intake from diet without causing malnutrition, delays the aging process and extends lifespan in multiple organisms. The conserved life-extending effect of DR suggests the involvement of fundamental mechanisms, although these remain a subject of debate. To help decipher the life-extending mechanisms of DR, we first compiled a list of genes that if genetically altered disrupt or prevent the life-extending effects of DR. We called these DR–essential genes and identified more than 100 in model organisms such as yeast, worms, flies, and mice. In order for other researchers to benefit from this first curated list of genes essential for DR, we established an online database called GenDR (http://genomics.senescence.info/diet/). To dissect the interactions of DR–essential genes and discover the underlying lifespan-extending mechanisms, we then used a variety of network and systems biology approaches to analyze the gene network of DR. We show that DR–essential genes are more conserved at the molecular level and have more molecular interactions than expected by chance. Furthermore, we employed a guilt-by-association method to predict novel DR–essential genes. In budding yeast, we predicted nine genes related to vacuolar functions; we show experimentally that mutations deleting eight of those genes prevent the life-extending effects of DR. Three of these mutants (OPT2, FRE6, and RCR2) had extended lifespan under ad libitum, indicating that the lack of further longevity under DR is not caused by a general compromise of fitness. These results demonstrate how network analyses of DR using GenDR can be used to make phenotypically relevant predictions. Moreover, gene-regulatory circuits reveal that the DR–induced transcriptional signature in yeast involves nutrient-sensing, stress responses and meiotic transcription factors. Finally, comparing the influence of gene expression changes during DR on the interactomes of multiple

  15. A stratified transcriptomics analysis of polygenic fat and lean mouse adipose tissues identifies novel candidate obesity genes.

    PubMed

    Morton, Nicholas M; Nelson, Yvonne B; Michailidou, Zoi; Di Rollo, Emma M; Ramage, Lynne; Hadoke, Patrick W F; Seckl, Jonathan R; Bunger, Lutz; Horvat, Simon; Kenyon, Christopher J; Dunbar, Donald R

    2011-01-01

    Obesity and metabolic syndrome results from a complex interaction between genetic and environmental factors. In addition to brain-regulated processes, recent genome wide association studies have indicated that genes highly expressed in adipose tissue affect the distribution and function of fat and thus contribute to obesity. Using a stratified transcriptome gene enrichment approach we attempted to identify adipose tissue-specific obesity genes in the unique polygenic Fat (F) mouse strain generated by selective breeding over 60 generations for divergent adiposity from a comparator Lean (L) strain. To enrich for adipose tissue obesity genes a 'snap-shot' pooled-sample transcriptome comparison of key fat depots and non adipose tissues (muscle, liver, kidney) was performed. Known obesity quantitative trait loci (QTL) information for the model allowed us to further filter genes for increased likelihood of being causal or secondary for obesity. This successfully identified several genes previously linked to obesity (C1qr1, and Np3r) as positional QTL candidate genes elevated specifically in F line adipose tissue. A number of novel obesity candidate genes were also identified (Thbs1, Ppp1r3d, Tmepai, Trp53inp2, Ttc7b, Tuba1a, Fgf13, Fmr) that have inferred roles in fat cell function. Quantitative microarray analysis was then applied to the most phenotypically divergent adipose depot after exaggerating F and L strain differences with chronic high fat feeding which revealed a distinct gene expression profile of line, fat depot and diet-responsive inflammatory, angiogenic and metabolic pathways. Selected candidate genes Npr3 and Thbs1, as well as Gys2, a non-QTL gene that otherwise passed our enrichment criteria were characterised, revealing novel functional effects consistent with a contribution to obesity. A focussed candidate gene enrichment strategy in the unique F and L model has identified novel adipose tissue-enriched genes contributing to obesity.

  16. A screen to identify Drosophila genes required for integrin-mediated adhesion.

    PubMed Central

    Walsh, E P; Brown, N H

    1998-01-01

    Drosophila integrins have essential adhesive roles during development, including adhesion between the two wing surfaces. Most position-specific integrin mutations cause lethality, and clones of homozygous mutant cells in the wing do not adhere to the apposing surface, causing blisters. We have used FLP-FRT induced mitotic recombination to generate clones of randomly induced mutations in the F1 generation and screened for mutations that cause wing blisters. This phenotype is highly selective, since only 14 lethal complementation groups were identified in screens of the five major chromosome arms. Of the loci identified, 3 are PS integrin genes, 2 are blistered and bloated, and the remaining 9 appear to be newly characterized loci. All 11 nonintegrin loci are required on both sides of the wing, in contrast to integrin alpha subunit genes. Mutations in 8 loci only disrupt adhesion in the wing, similar to integrin mutations, while mutations in the 3 other loci cause additional wing defects. Mutations in 4 loci, like the strongest integrin mutations, cause a "tail-up" embryonic lethal phenotype, and mutant alleles of 1 of these loci strongly enhance an integrin mutation. Thus several of these loci are good candidates for genes encoding cytoplasmic proteins required for integrin function. PMID:9755209

  17. Genomic Analyses Yield Markers for Identifying Agronomically Important Genes in Potato

    USDA-ARS?s Scientific Manuscript database

    This study explores the genetic architecture underling the potato evolution through a comprehensive assessment of wild and cultivated potato species based on the re-sequencing of 201 accessions of Solanum section Petota with >12 × genome coverage. We identified 450 domesticated genes, which showed e...

  18. Engineering and Functional Characterization of Fusion Genes Identifies Novel Oncogenic Drivers of Cancer.

    PubMed

    Lu, Hengyu; Villafane, Nicole; Dogruluk, Turgut; Grzeskowiak, Caitlin L; Kong, Kathleen; Tsang, Yiu Huen; Zagorodna, Oksana; Pantazi, Angeliki; Yang, Lixing; Neill, Nicholas J; Kim, Young Won; Creighton, Chad J; Verhaak, Roel G; Mills, Gordon B; Park, Peter J; Kucherlapati, Raju; Scott, Kenneth L

    2017-07-01

    Oncogenic gene fusions drive many human cancers, but tools to more quickly unravel their functional contributions are needed. Here we describe methodology permitting fusion gene construction for functional evaluation. Using this strategy, we engineered the known fusion oncogenes, BCR-ABL1, EML4-ALK , and ETV6-NTRK3, as well as 20 previously uncharacterized fusion genes identified in The Cancer Genome Atlas datasets. In addition to confirming oncogenic activity of the known fusion oncogenes engineered by our construction strategy, we validated five novel fusion genes involving MET, NTRK2 , and BRAF kinases that exhibited potent transforming activity and conferred sensitivity to FDA-approved kinase inhibitors. Our fusion construction strategy also enabled domain-function studies of BRAF fusion genes. Our results confirmed other reports that the transforming activity of BRAF fusions results from truncation-mediated loss of inhibitory domains within the N-terminus of the BRAF protein. BRAF mutations residing within this inhibitory region may provide a means for BRAF activation in cancer, therefore we leveraged the modular design of our fusion gene construction methodology to screen N-terminal domain mutations discovered in tumors that are wild-type at the BRAF mutation hotspot, V600. We identified an oncogenic mutation, F247L, whose expression robustly activated the MAPK pathway and sensitized cells to BRAF and MEK inhibitors. When applied broadly, these tools will facilitate rapid fusion gene construction for subsequent functional characterization and translation into personalized treatment strategies. Cancer Res; 77(13); 3502-12. ©2017 AACR . ©2017 American Association for Cancer Research.

  19. Novel linkage disequilibrium clustering algorithm identifies new lupus genes on meta-analysis of GWAS datasets.

    PubMed

    Saeed, Mohammad

    2017-05-01

    Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.

  20. An unbiased approach to identify genes involved in development in a turtle with temperature-dependent sex determination.

    PubMed

    Chojnowski, Jena L; Braun, Edward L

    2012-07-15

    Many reptiles exhibit temperature-dependent sex determination (TSD). The initial cue in TSD is incubation temperature, unlike genotypic sex determination (GSD) where it is determined by the presence of specific alleles (or genetic loci). We used patterns of gene expression to identify candidates for genes with a role in TSD and other developmental processes without making a priori assumptions about the identity of these genes (ortholog-based approach). We identified genes with sexually dimorphic mRNA accumulation during the temperature sensitive period of development in the Red-eared slider turtle (Trachemys scripta), a turtle with TSD. Genes with differential mRNA accumulation in response to estrogen (estradiol-17β; E(2)) exposure and developmental stages were also identified. Sequencing 767 clones from three suppression-subtractive hybridization libraries yielded a total of 581 unique sequences. Screening a macroarray with a subset of those sequences revealed a total of 26 genes that exhibited differential mRNA accumulation: 16 female biased and 10 male biased. Additional analyses revealed that C16ORF62 (an unknown gene) and MALAT1 (a long noncoding RNA) exhibited increased mRNA accumulation at the male producing temperature relative to the female producing temperature during embryonic sexual development. Finally, we identified four genes (C16ORF62, CCT3, MMP2, and NFIB) that exhibited a stage effect and five genes (C16ORF62, CCT3, MMP2, NFIB and NOTCH2) showed a response to E(2) exposure. Here we report a survey of genes identified using patterns of mRNA accumulation during embryonic development in a turtle with TSD. Many previous studies have focused on examining the turtle orthologs of genes involved in mammalian development. Although valuable, the limitations of this approach are exemplified by our identification of two genes (MALAT1 and C16ORF62) that are sexually dimorphic during embryonic development. MALAT1 is a noncoding RNA that has not been implicated

  1. An ant colony optimization based algorithm for identifying gene regulatory elements.

    PubMed

    Liu, Wei; Chen, Hanwu; Chen, Ling

    2013-08-01

    It is one of the most important tasks in bioinformatics to identify the regulatory elements in gene sequences. Most of the existing algorithms for identifying regulatory elements are inclined to converge into a local optimum, and have high time complexity. Ant Colony Optimization (ACO) is a meta-heuristic method based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of real ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper designs and implements an ACO based algorithm named ACRI (ant-colony-regulatory-identification) for identifying all possible binding sites of transcription factor from the upstream of co-expressed genes. To accelerate the ants' searching process, a strategy of local optimization is presented to adjust the ants' start positions on the searched sequences. By exploiting the powerful optimization ability of ACO, the algorithm ACRI can not only improve precision of the results, but also achieve a very high speed. Experimental results on real world datasets show that ACRI can outperform other traditional algorithms in the respects of speed and quality of solutions. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Similarity of markers identified from cancer gene expression studies: observations from GEO.

    PubMed

    Shi, Xingjie; Shen, Shihao; Liu, Jin; Huang, Jian; Zhou, Yong; Ma, Shuangge

    2014-09-01

    Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  3. Use of an activated beta-catenin to identify Wnt pathway target genes in caenorhabditis elegans, including a subset of collagen genes expressed in late larval development.

    PubMed

    Jackson, Belinda M; Abete-Luzi, Patricia; Krause, Michael W; Eisenmann, David M

    2014-04-16

    The Wnt signaling pathway plays a fundamental role during metazoan development, where it regulates diverse processes, including cell fate specification, cell migration, and stem cell renewal. Activation of the beta-catenin-dependent/canonical Wnt pathway up-regulates expression of Wnt target genes to mediate a cellular response. In the nematode Caenorhabditis elegans, a canonical Wnt signaling pathway regulates several processes during larval development; however, few target genes of this pathway have been identified. To address this deficit, we used a novel approach of conditionally activated Wnt signaling during a defined stage of larval life by overexpressing an activated beta-catenin protein, then used microarray analysis to identify genes showing altered expression compared with control animals. We identified 166 differentially expressed genes, of which 104 were up-regulated. A subset of the up-regulated genes was shown to have altered expression in mutants with decreased or increased Wnt signaling; we consider these genes to be bona fide C. elegans Wnt pathway targets. Among these was a group of six genes, including the cuticular collagen genes, bli-1 col-38, col-49, and col-71. These genes show a peak of expression in the mid L4 stage during normal development, suggesting a role in adult cuticle formation. Consistent with this finding, reduction of function for several of the genes causes phenotypes suggestive of defects in cuticle function or integrity. Therefore, this work has identified a large number of putative Wnt pathway target genes during larval life, including a small subset of Wnt-regulated collagen genes that may function in synthesis of the adult cuticle.

  4. Genomic characterization of biliary tract cancers identifies driver genes and predisposing mutations.

    PubMed

    Wardell, Christopher P; Fujita, Masashi; Yamada, Toru; Simbolo, Michele; Fassan, Matteo; Karlic, Rosa; Polak, Paz; Kim, Jaegil; Hatanaka, Yutaka; Maejima, Kazuhiro; Lawlor, Rita T; Nakanishi, Yoshitsugu; Mitsuhashi, Tomoko; Fujimoto, Akihiro; Furuta, Mayuko; Ruzzenente, Andrea; Conci, Simone; Oosawa, Ayako; Sasaki-Oku, Aya; Nakano, Kaoru; Tanaka, Hiroko; Yamamoto, Yujiro; Michiaki, Kubo; Kawakami, Yoshiiku; Aikata, Hiroshi; Ueno, Masaki; Hayami, Shinya; Gotoh, Kunihito; Ariizumi, Shun-Ichi; Yamamoto, Masakazu; Yamaue, Hiroki; Chayama, Kazuaki; Miyano, Satoru; Getz, Gad; Scarpa, Aldo; Hirano, Satoshi; Nakamura, Toru; Nakagawa, Hidewaki

    2018-05-01

    Biliary tract cancers (BTCs) are clinically and pathologically heterogeneous and respond poorly to treatment. Genomic profiling can offer a clearer understanding of their carcinogenesis, classification and treatment strategy. We performed large-scale genome sequencing analyses on BTCs to investigate their somatic and germline driver events and characterize their genomic landscape. We analyzed 412 BTC samples from Japanese and Italian populations, 107 by whole-exome sequencing (WES), 39 by whole-genome sequencing (WGS), and a further 266 samples by targeted sequencing. The subtypes were 136 intrahepatic cholangiocarcinomas (ICCs), 101 distal cholangiocarcinomas (DCCs), 109 peri-hilar type cholangiocarcinomas (PHCs), and 66 gallbladder or cystic duct cancers (GBCs/CDCs). We identified somatic alterations and searched for driver genes in BTCs, finding pathogenic germline variants of cancer-predisposing genes. We predicted cell-of-origin for BTCs by combining somatic mutation patterns and epigenetic features. We identified 32 significantly and commonly mutated genes including TP53, KRAS, SMAD4, NF1, ARID1A, PBRM1, and ATR, some of which negatively affected patient prognosis. A novel deletion of MUC17 at 7q22.1 affected patient prognosis. Cell-of-origin predictions using WGS and epigenetic features suggest hepatocyte-origin of hepatitis-related ICCs. Deleterious germline mutations of cancer-predisposing genes such as BRCA1, BRCA2, RAD51D, MLH1, or MSH2 were detected in 11% (16/146) of BTC patients. BTCs have distinct genetic features including somatic events and germline predisposition. These findings could be useful to establish treatment and diagnostic strategies for BTCs based on genetic information. We here analyzed genomic features of 412 BTC samples from Japanese and Italian populations. A total of 32 significantly and commonly mutated genes were identified, some of which negatively affected patient prognosis, including a novel deletion of MUC17 at 7q22.1. Cell

  5. Literature mining, gene-set enrichment and pathway analysis for target identification in Behçet's disease.

    PubMed

    Wilson, Paul; Larminie, Christopher; Smith, Rona

    2016-01-01

    To use literature mining to catalogue Behçet's associated genes, and advanced computational methods to improve the understanding of the pathways and signalling mechanisms that lead to the typical clinical characteristics of Behçet's patients. To extend this technique to identify potential treatment targets for further experimental validation. Text mining methods combined with gene enrichment tools, pathway analysis and causal analysis algorithms. This approach identified 247 human genes associated with Behçet's disease and the resulting disease map, comprising 644 nodes and 19220 edges, captured important details of the relationships between these genes and their associated pathways, as described in diverse data repositories. Pathway analysis has identified how Behçet's associated genes are likely to participate in innate and adaptive immune responses. Causal analysis algorithms have identified a number of potential therapeutic strategies for further investigation. Computational methods have captured pertinent features of the prominent disease characteristics presented in Behçet's disease and have highlighted NOD2, ICOS and IL18 signalling as potential therapeutic strategies.

  6. Identifying marker genes in transcription profiling data using a mixture of feature relevance experts.

    PubMed

    Chow, M L; Moler, E J; Mian, I S

    2001-03-08

    Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined implement two existing and one novel feature relevance measures. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of

  7. Integrative strategies to identify candidate genes in rodent models of human alcoholism.

    PubMed

    Treadwell, Julie A

    2006-01-01

    The search for genes underlying alcohol-related behaviours in rodent models of human alcoholism has been ongoing for many years with only limited success. Recently, new strategies that integrate several of the traditional approaches have provided new insights into the molecular mechanisms underlying ethanol's actions in the brain. We have used alcohol-preferring C57BL/6J (B6) and alcohol-avoiding DBA/2J (D2) genetic strains of mice in an integrative strategy combining high-throughput gene expression screening, genetic segregation analysis, and mapping to previously published quantitative trait loci to uncover candidate genes for the ethanol-preference phenotype. In our study, 2 genes, retinaldehyde binding protein 1 (Rlbp1) and syntaxin 12 (Stx12), were found to be strong candidates for ethanol preference. Such experimental approaches have the power and the potential to greatly speed up the laborious process of identifying candidate genes for the animal models of human alcoholism.

  8. Methods for identifying an essential gene in a prokaryotic microorganism

    DOEpatents

    Shizuya, Hiroaki

    2006-01-31

    Methods are provided for the rapid identification of essential or conditionally essential DNA segments in any species of haploid cell (one copy chromosome per cell) that is capable of being transformed by artificial means and is capable of undergoing DNA recombination. This system offers an enhanced means of identifying essential function genes in diploid pathogens, such as gram-negative and gram-positive bacteria.

  9. Large-Scale Gene-Centric Analysis Identifies Novel Variants for Coronary Artery Disease

    PubMed Central

    2011-01-01

    Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants. We examined 49,094 genetic variants in ∼2,100 genes of cardiovascular relevance, using a customised gene array in 15,596 CAD cases and 34,992 controls (11,202 cases and 30,733 controls of European descent; 4,394 cases and 4,259 controls of South Asian origin). We attempted to replicate putative novel associations in an additional 17,121 CAD cases and 40,473 controls. Potential mechanisms through which the novel variants could affect CAD risk were explored through association tests with vascular risk factors and gene expression. We confirmed associations of several previously known CAD susceptibility loci (eg, 9p21.3:p<10−33; LPA:p<10−19; 1p13.3:p<10−17) as well as three recently discovered loci (COL4A1/COL4A2, ZC3HC1, CYP17A1:p<5×10−7). However, we found essentially null results for most previously suggested CAD candidate genes. In our replication study of 24 promising common variants, we identified novel associations of variants in or near LIPA, IL5, TRIB1, and ABCG5/ABCG8, with per-allele odds ratios for CAD risk with each of the novel variants ranging from 1.06–1.09. Associations with variants at LIPA, TRIB1, and ABCG5/ABCG8 were supported by gene expression data or effects on lipid levels. Apart from the previously reported variants in LPA, none of the other ∼4,500 low frequency and functional variants showed a strong effect. Associations in South Asians did not differ appreciably from those in Europeans, except for 9p21.3 (per-allele odds ratio: 1.14 versus 1.27 respectively; P for heterogeneity = 0.003). This large-scale gene-centric analysis has identified several novel genes for CAD that relate to diverse biochemical and cellular functions and

  10. Large-scale gene-centric analysis identifies novel variants for coronary artery disease.

    PubMed

    2011-09-01

    Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants. We examined 49,094 genetic variants in ∼2,100 genes of cardiovascular relevance, using a customised gene array in 15,596 CAD cases and 34,992 controls (11,202 cases and 30,733 controls of European descent; 4,394 cases and 4,259 controls of South Asian origin). We attempted to replicate putative novel associations in an additional 17,121 CAD cases and 40,473 controls. Potential mechanisms through which the novel variants could affect CAD risk were explored through association tests with vascular risk factors and gene expression. We confirmed associations of several previously known CAD susceptibility loci (eg, 9p21.3:p<10(-33); LPA:p<10(-19); 1p13.3:p<10(-17)) as well as three recently discovered loci (COL4A1/COL4A2, ZC3HC1, CYP17A1:p<5×10(-7)). However, we found essentially null results for most previously suggested CAD candidate genes. In our replication study of 24 promising common variants, we identified novel associations of variants in or near LIPA, IL5, TRIB1, and ABCG5/ABCG8, with per-allele odds ratios for CAD risk with each of the novel variants ranging from 1.06-1.09. Associations with variants at LIPA, TRIB1, and ABCG5/ABCG8 were supported by gene expression data or effects on lipid levels. Apart from the previously reported variants in LPA, none of the other ∼4,500 low frequency and functional variants showed a strong effect. Associations in South Asians did not differ appreciably from those in Europeans, except for 9p21.3 (per-allele odds ratio: 1.14 versus 1.27 respectively; P for heterogeneity = 0.003). This large-scale gene-centric analysis has identified several novel genes for CAD that relate to diverse biochemical and cellular functions and

  11. Gene co-expression analysis identifies gene clusters associated with isotropic and polarized growth in Aspergillus fumigatus conidia.

    PubMed

    Baltussen, Tim J H; Coolen, Jordy P M; Zoll, Jan; Verweij, Paul E; Melchers, Willem J G

    2018-04-26

    Aspergillus fumigatus is a saprophytic fungus that extensively produces conidia. These microscopic asexually reproductive structures are small enough to reach the lungs. Germination of conidia followed by hyphal growth inside human lungs is a key step in the establishment of infection in immunocompromised patients. RNA-Seq was used to analyze the transcriptome of dormant and germinating A. fumigatus conidia. Construction of a gene co-expression network revealed four gene clusters (modules) correlated with a growth phase (dormant, isotropic growth, polarized growth). Transcripts levels of genes encoding for secondary metabolites were high in dormant conidia. During isotropic growth, transcript levels of genes involved in cell wall modifications increased. Two modules encoding for growth and cell cycle/DNA processing were associated with polarized growth. In addition, the co-expression network was used to identify highly connected intermodular hub genes. These genes may have a pivotal role in the respective module and could therefore be compelling therapeutic targets. Generally, cell wall remodeling is an important process during isotropic and polarized growth, characterized by an increase of transcripts coding for hyphal growth and cell cycle/DNA processing when polarized growth is initiated. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  12. Targeted sequencing identifies 91 neurodevelopmental disorder risk genes with autism and developmental disability biases

    PubMed Central

    Stessman, Holly A. F.; Xiong, Bo; Coe, Bradley P.; Wang, Tianyun; Hoekzema, Kendra; Fenckova, Michaela; Kvarnung, Malin; Gerdts, Jennifer; Trinh, Sandy; Cosemans, Nele; Vives, Laura; Lin, Janice; Turner, Tychele N.; Santen, Gijs; Ruivenkamp, Claudia; Kriek, Marjolein; van Haeringen, Arie; Aten, Emmelien; Friend, Kathryn; Liebelt, Jan; Barnett, Christopher; Haan, Eric; Shaw, Marie; Gecz, Jozef; Anderlid, Britt-Marie; Nordgren, Ann; Lindstrand, Anna; Schwartz, Charles; Kooy, R. Frank; Vandeweyer, Geert; Helsmoortel, Celine; Romano, Corrado; Alberti, Antonino; Vinci, Mirella; Avola, Emanuela; Giusto, Stefania; Courchesne, Eric; Pramparo, Tiziano; Pierce, Karen; Nalabolu, Srinivasa; Amaral, David; Scheffer, Ingrid E.; Delatycki, Martin B.; Lockhart, Paul J.; Hormozdiari, Fereydoun; Harich, Benjamin; Castells-Nobau, Anna; Xia, Kun; Peeters, Hilde; Nordenskjöld, Magnus; Schenck, Annette; Bernier, Raphael A.; Eichler, Evan E.

    2017-01-01

    Gene-disruptive mutations contribute to the biology of neurodevelopmental disorders (NDDs), but most pathogenic genes are not known. We sequenced 208 candidate genes from >11,730 patients and >2,867 controls. We report 91 genes with an excess of de novo mutations or private disruptive mutations in 5.7% of patients, including 38 novel NDD genes. Drosophila functional assays of a subset bolster their involvement in NDDs. We identify 25 genes that show a bias for autism versus intellectual disability and highlight a network associated with high-functioning autism (FSIQ>100). Clinical follow-up for NAA15, KMT5B, and ASH1L reveals novel syndromic and non-syndromic forms of disease. PMID:28191889

  13. Microarray analysis identified Puccinia striiformis f. sp. tritici genes involved in infection and sporulation.

    USDA-ARS?s Scientific Manuscript database

    Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, one of the most important diseases of wheat worldwide. To identify Pst genes involved in infection and sporulation, a custom oligonucleotide Genechip was made using sequences of 442 genes selected from Pst cDNA libraries. Microarray analy...

  14. Digital gene expression profiling of flax (Linum usitatissimum L.) stem peel identifies genes enriched in fiber-bearing phloem tissue.

    PubMed

    Guo, Yuan; Qiu, Caisheng; Long, Songhua; Chen, Ping; Hao, Dongmei; Preisner, Marta; Wang, Hui; Wang, Yufu

    2017-08-30

    To better understand the molecular mechanisms and gene expression characteristics associated with development of bast fiber cell within flax stem phloem, the gene expression profiling of flax stem peels and leaves were screened, using Illumina's Digital Gene Expression (DGE) analysis. Four DGE libraries (2 for stem peel and 2 for leaf), ranging from 6.7 to 9.2 million clean reads were obtained, which produced 7.0 million and 6.8 million mapped reads for flax stem peel and leave, respectively. By differential gene expression analysis, a total of 975 genes, of which 708 (73%) genes have protein-coding annotation, were identified as phloem enriched genes putatively involved in the processes of polysaccharide and cell wall metabolism. Differential expression genes (DEGs) was validated using quantitative RT-PCR, the expression pattern of all nine genes determined by qRT-PCR fitted in well with that obtained by sequencing analysis. Cluster and Gene Ontology (GO) analysis revealed that a large number of genes related to metabolic process, catalytic activity and binding category were expressed predominantly in the stem peels. The Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the phloem enriched genes suggested approximately 111 biological pathways. The large number of genes and pathways produced from DGE sequencing will expand our understanding of the complex molecular and cellular events in flax bast fiber development and provide a foundation for future studies on fiber development in other bast fiber crops. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Taxonomically Different Co-Microsymbionts of a Relict Legume, Oxytropis popoviana, Have Complementary Sets of Symbiotic Genes and Together Increase the Efficiency of Plant Nodulation.

    PubMed

    Safronova, Vera I; Belimov, Andrey A; Sazanova, Anna L; Chirak, Elizaveta R; Verkhozina, Alla V; Kuznetsova, Irina G; Andronov, Evgeny E; Puhalsky, Jan V; Tikhonovich, Igor A

    2018-06-20

    Ten rhizobial strains were isolated from root nodules of a relict legume Oxytropis popoviana Peschkova. For identification of the isolates, sequencing of rrs, the internal transcribed spacer region, and housekeeping genes recA, glnII, and rpoB was used. Nine fast-growing isolates were Mesorhizobium-related; eight strains were identified as M. japonicum and one isolate belonged to M. kowhaii. The only slow-growing isolate was identified as a Bradyrhizobium sp. Two strains, M. japonicum Opo-242 and Bradyrhizobium sp. strain Opo-243, were isolated from the same nodule. Symbiotic genes of these isolates were searched throughout the whole-genome sequences. The common nodABC genes and other symbiotic genes required for plant nodulation and nitrogen fixation were present in the isolate Opo-242. Strain Opo-243 did not contain the principal nod, nif, and fix genes; however, five genes (nodP, nodQ, nifL, nolK, and noeL) affecting the specificity of plant-rhizobia interactions but absent in isolate Opo-242 were detected. Strain Opo-243 could not induce nodules but significantly accelerated the root nodule formation after coinoculation with isolate Opo-242. Thus, we demonstrated that taxonomically different strains of the archaic symbiotic system can be co-microsymbionts infecting the same nodule and promoting the nodulation process due to complementary sets of symbiotic genes.

  16. Suppressors of systemin signaling identify genes in the tomato wound response pathway.

    PubMed Central

    Howe, G A; Ryan, C A

    1999-01-01

    In tomato plants, systemic induction of defense genes in response to herbivory or mechanical wounding is regulated by an 18-amino-acid peptide signal called systemin. Transgenic plants that overexpress prosystemin, the systemin precursor, from a 35S::prosystemin (35S::prosys) transgene exhibit constitutive expression of wound-inducible defense proteins including proteinase inhibitors and polyphenol oxidase. To study further the role of (pro)systemin in the wound response pathway, we isolated and characterized mutations that suppress 35S::prosys-mediated phenotypes. Ten recessive, extragenic suppressors were identified. Two of these define new alleles of def-1, a previously identified mutation that blocks both wound- and systemin-induced gene expression and renders plants susceptible to herbivory. The remaining mutants defined four loci designated Spr-1, Spr-2, Spr-3, and Spr-4 (for Suppressed in 35S::prosystemin-mediated responses). spr-3 and spr-4 mutants were not significantly affected in their response to either systemin or mechanical wounding. In contrast, spr-1 and spr-2 plants lacked systemic wound responses and were insensitive to systemin. These results confirm the function of (pro)systemin in the transduction of systemic wound signals and further establish that wounding, systemin, and 35S::prosys induce defensive gene expression through a common signaling pathway defined by at least three genes (Def-1, Spr-1, and Spr-2). PMID:10545469

  17. Systems Biology-Based Investigation of Cellular Antiviral Drug Targets Identified by Gene-Trap Insertional Mutagenesis.

    PubMed

    Cheng, Feixiong; Murray, James L; Zhao, Junfei; Sheng, Jinsong; Zhao, Zhongming; Rubin, Donald H

    2016-09-01

    Viruses require host cellular factors for successful replication. A comprehensive systems-level investigation of the virus-host interactome is critical for understanding the roles of host factors with the end goal of discovering new druggable antiviral targets. Gene-trap insertional mutagenesis is a high-throughput forward genetics approach to randomly disrupt (trap) host genes and discover host genes that are essential for viral replication, but not for host cell survival. In this study, we used libraries of randomly mutagenized cells to discover cellular genes that are essential for the replication of 10 distinct cytotoxic mammalian viruses, 1 gram-negative bacterium, and 5 toxins. We herein reported 712 candidate cellular genes, characterizing distinct topological network and evolutionary signatures, and occupying central hubs in the human interactome. Cell cycle phase-specific network analysis showed that host cell cycle programs played critical roles during viral replication (e.g. MYC and TAF4 regulating G0/1 phase). Moreover, the viral perturbation of host cellular networks reflected disease etiology in that host genes (e.g. CTCF, RHOA, and CDKN1B) identified were frequently essential and significantly associated with Mendelian and orphan diseases, or somatic mutations in cancer. Computational drug repositioning framework via incorporating drug-gene signatures from the Connectivity Map into the virus-host interactome identified 110 putative druggable antiviral targets and prioritized several existing drugs (e.g. ajmaline) that may be potential for antiviral indication (e.g. anti-Ebola). In summary, this work provides a powerful methodology with a tight integration of gene-trap insertional mutagenesis testing and systems biology to identify new antiviral targets and drugs for the development of broadly acting and targeted clinical antiviral therapeutics.

  18. Genome-Wide Gene Set Analysis for Identification of Pathways Associated with Alcohol Dependence

    PubMed Central

    Biernacka, Joanna M.; Geske, Jennifer; Jenkins, Gregory D.; Colby, Colin; Rider, David N.; Karpyak, Victor M.; Choi, Doo-Sup; Fridley, Brooke L.

    2013-01-01

    It is believed that multiple genetic variants with small individual effects contribute to the risk of alcohol dependence. Such polygenic effects are difficult to detect in genome-wide association studies that test for association of the phenotype with each single nucleotide polymorphism (SNP) individually. To overcome this challenge, gene set analysis (GSA) methods that jointly test for the effects of pre-defined groups of genes have been proposed. Rather than testing for association between the phenotype and individual SNPs, these analyses evaluate the global evidence of association with a set of related genes enabling the identification of cellular or molecular pathways or biological processes that play a role in development of the disease. It is hoped that by aggregating the evidence of association for all available SNPs in a group of related genes, these approaches will have enhanced power to detect genetic associations with complex traits. We performed GSA using data from a genome-wide study of 1165 alcohol dependent cases and 1379 controls from the Study of Addiction: Genetics and Environment (SAGE), for all 200 pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results demonstrated a potential role of the “Synthesis and Degradation of Ketone Bodies” pathway. Our results also support the potential involvement of the “Neuroactive Ligand Receptor Interaction” pathway, which has previously been implicated in addictive disorders. These findings demonstrate the utility of GSA in the study of complex disease, and suggest specific directions for further research into the genetic architecture of alcohol dependence. PMID:22717047

  19. An integrative “omics” approach identifies new candidate genes to impact aroma volatiles in peach fruit

    PubMed Central

    2013-01-01

    Background Ever since the recent completion of the peach genome, the focus of genetic research in this area has turned to the identification of genes related to important traits, such as fruit aroma volatiles. Of the over 100 volatile compounds described in peach, lactones most likely have the strongest effect on fruit aroma, while esters, terpenoids, and aldehydes have minor, yet significant effects. The identification of key genes underlying the production of aroma compounds is of interest for any fruit-quality improvement strategy. Results Volatile (52 compounds) and gene expression (4348 genes) levels were profiled in peach fruit from a maturity time-course series belonging to two peach genotypes that showed considerable differences in maturation characteristics and postharvest ripening. This data set was analyzed by complementary correlation-based approaches to discover the genes related to the main aroma-contributing compounds: lactones, esters, and phenolic volatiles, among others. As a case study, one of the candidate genes was cloned and expressed in yeast to show specificity as an ω-6 Oleate desaturase, which may be involved in the production of a precursor of lactones/esters. Conclusions Our approach revealed a set of genes (an alcohol acyl transferase, fatty acid desaturases, transcription factors, protein kinases, cytochromes, etc.) that are highly associated with peach fruit volatiles, and which could prove useful in breeding or for biotechnological purposes. PMID:23701715

  20. A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection.

    PubMed

    Carey, Michelle; Ramírez, Juan Camilo; Wu, Shuang; Wu, Hulin

    2018-07-01

    A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.

  1. A Genome-wide CRISPR Screen in Toxoplasma Identifies Essential Apicomplexan Genes.

    PubMed

    Sidik, Saima M; Huet, Diego; Ganesan, Suresh M; Huynh, My-Hang; Wang, Tim; Nasamu, Armiyaw S; Thiru, Prathapan; Saeij, Jeroen P J; Carruthers, Vern B; Niles, Jacquin C; Lourido, Sebastian

    2016-09-08

    Apicomplexan parasites are leading causes of human and livestock diseases such as malaria and toxoplasmosis, yet most of their genes remain uncharacterized. Here, we present the first genome-wide genetic screen of an apicomplexan. We adapted CRISPR/Cas9 to assess the contribution of each gene from the parasite Toxoplasma gondii during infection of human fibroblasts. Our analysis defines ∼200 previously uncharacterized, fitness-conferring genes unique to the phylum, from which 16 were investigated, revealing essential functions during infection of human cells. Secondary screens identify as an invasion factor the claudin-like apicomplexan microneme protein (CLAMP), which resembles mammalian tight-junction proteins and localizes to secretory organelles, making it critical to the initiation of infection. CLAMP is present throughout sequenced apicomplexan genomes and is essential during the asexual stages of the malaria parasite Plasmodium falciparum. These results provide broad-based functional information on T. gondii genes and will facilitate future approaches to expand the horizon of antiparasitic interventions. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Genetic regulation of gene expression in the lung identifies CST3 and CD22 as potential causal genes for airflow obstruction.

    PubMed

    Lamontagne, Maxime; Timens, Wim; Hao, Ke; Bossé, Yohan; Laviolette, Michel; Steiling, Katrina; Campbell, Joshua D; Couture, Christian; Conti, Massimo; Sherwood, Karen; Hogg, James C; Brandsma, Corry-Anke; van den Berge, Maarten; Sandford, Andrew; Lam, Stephen; Lenburg, Marc E; Spira, Avrum; Paré, Peter D; Nickle, David; Sin, Don D; Postma, Dirkje S

    2014-11-01

    COPD is a complex chronic disease with poorly understood pathogenesis. Integrative genomic approaches have the potential to elucidate the biological networks underlying COPD and lung function. We recently combined genome-wide genotyping and gene expression in 1111 human lung specimens to map expression quantitative trait loci (eQTL). To determine causal associations between COPD and lung function-associated single nucleotide polymorphisms (SNPs) and lung tissue gene expression changes in our lung eQTL dataset. We evaluated causality between SNPs and gene expression for three COPD phenotypes: FEV(1)% predicted, FEV(1)/FVC and COPD as a categorical variable. Different models were assessed in the three cohorts independently and in a meta-analysis. SNPs associated with a COPD phenotype and gene expression were subjected to causal pathway modelling and manual curation. In silico analyses evaluated functional enrichment of biological pathways among newly identified causal genes. Biologically relevant causal genes were validated in two separate gene expression datasets of lung tissues and bronchial airway brushings. High reliability causal relations were found in SNP-mRNA-phenotype triplets for FEV(1)% predicted (n=169) and FEV(1)/FVC (n=80). Several genes of potential biological relevance for COPD were revealed. eQTL-SNPs upregulating cystatin C (CST3) and CD22 were associated with worse lung function. Signalling pathways enriched with causal genes included xenobiotic metabolism, apoptosis, protease-antiprotease and oxidant-antioxidant balance. By using integrative genomics and analysing the relationships of COPD phenotypes with SNPs and gene expression in lung tissue, we identified CST3 and CD22 as potential causal genes for airflow obstruction. This study also augmented the understanding of previously described COPD pathways. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  3. A functional screen for copper homeostasis genes identifies a pharmacologically tractable cellular system

    PubMed Central

    2014-01-01

    Background Copper is essential for the survival of aerobic organisms. If copper is not properly regulated in the body however, it can be extremely cytotoxic and genetic mutations that compromise copper homeostasis result in severe clinical phenotypes. Understanding how cells maintain optimal copper levels is therefore highly relevant to human health. Results We found that addition of copper (Cu) to culture medium leads to increased respiratory growth of yeast, a phenotype which we then systematically and quantitatively measured in 5050 homozygous diploid deletion strains. Cu’s positive effect on respiratory growth was quantitatively reduced in deletion strains representing 73 different genes, the function of which identify increased iron uptake as a cause of the increase in growth rate. Conversely, these effects were enhanced in strains representing 93 genes. Many of these strains exhibited respiratory defects that were specifically rescued by supplementing the growth medium with Cu. Among the genes identified are known and direct regulators of copper homeostasis, genes required to maintain low vacuolar pH, and genes where evidence supporting a functional link with Cu has been heretofore lacking. Roughly half of the genes are conserved in man, and several of these are associated with Mendelian disorders, including the Cu-imbalance syndromes Menkes and Wilson’s disease. We additionally demonstrate that pharmacological agents, including the approved drug disulfiram, can rescue Cu-deficiencies of both environmental and genetic origin. Conclusions A functional screen in yeast has expanded the list of genes required for Cu-dependent fitness, revealing a complex cellular system with implications for human health. Respiratory fitness defects arising from perturbations in this system can be corrected with pharmacological agents that increase intracellular copper concentrations. PMID:24708151

  4. A novel approach to identify genes that determine grain protein deviation in cereals.

    PubMed

    Mosleth, Ellen F; Wan, Yongfang; Lysenko, Artem; Chope, Gemma A; Penson, Simon P; Shewry, Peter R; Hawkesford, Malcolm J

    2015-06-01

    Grain yield and protein content were determined for six wheat cultivars grown over 3 years at multiple sites and at multiple nitrogen (N) fertilizer inputs. Although grain protein content was negatively correlated with yield, some grain samples had higher protein contents than expected based on their yields, a trait referred to as grain protein deviation (GPD). We used novel statistical approaches to identify gene transcripts significantly related to GPD across environments. The yield and protein content were initially adjusted for nitrogen fertilizer inputs and then adjusted for yield (to remove the negative correlation with protein content), resulting in a parameter termed corrected GPD. Significant genetic variation in corrected GPD was observed for six cultivars grown over a range of environmental conditions (a total of 584 samples). Gene transcript profiles were determined in a subset of 161 samples of developing grain to identify transcripts contributing to GPD. Principal component analysis (PCA), analysis of variance (ANOVA) and means of scores regression (MSR) were used to identify individual principal components (PCs) correlating with GPD alone. Scores of the selected PCs, which were significantly related to GPD and protein content but not to the yield and significantly affected by cultivar, were identified as reflecting a multivariate pattern of gene expression related to genetic variation in GPD. Transcripts with consistent variation along the selected PCs were identified by an approach hereby called one-block means of scores regression (one-block MSR). © 2014 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  5. Immunogenetic mechanisms leading to thyroid autoimmunity: recent advances in identifying susceptibility genes and regions.

    PubMed

    Brand, Oliver J; Gough, Stephen C L

    2011-12-01

    The autoimmune thyroid diseases (AITD) include Graves' disease (GD) and Hashimoto's thyroiditis (HT), which are characterised by a breakdown in immune tolerance to thyroid antigens. Unravelling the genetic architecture of AITD is vital to better understanding of AITD pathogenesis, required to advance therapeutic options in both disease management and prevention. The early whole-genome linkage and candidate gene association studies provided the first evidence that the HLA region and CTLA-4 represented AITD risk loci. Recent improvements in; high throughput genotyping technologies, collection of larger disease cohorts and cataloguing of genome-scale variation have facilitated genome-wide association studies and more thorough screening of candidate gene regions. This has allowed identification of many novel AITD risk genes and more detailed association mapping. The growing number of confirmed AITD susceptibility loci, implicates a number of putative disease mechanisms most of which are tightly linked with aspects of immune system function. The unprecedented advances in genetic study will allow future studies to identify further novel disease risk genes and to identify aetiological variants within specific gene regions, which will undoubtedly lead to a better understanding of AITD patho-physiology.

  6. Immunogenetic Mechanisms Leading to Thyroid Autoimmunity: Recent Advances in Identifying Susceptibility Genes and Regions

    PubMed Central

    Brand, Oliver J; Gough, Stephen C.L

    2011-01-01

    The autoimmune thyroid diseases (AITD) include Graves’ disease (GD) and Hashimoto’s thyroiditis (HT), which are characterised by a breakdown in immune tolerance to thyroid antigens. Unravelling the genetic architecture of AITD is vital to better understanding of AITD pathogenesis, required to advance therapeutic options in both disease management and prevention. The early whole-genome linkage and candidate gene association studies provided the first evidence that the HLA region and CTLA-4 represented AITD risk loci. Recent improvements in; high throughput genotyping technologies, collection of larger disease cohorts and cataloguing of genome-scale variation have facilitated genome-wide association studies and more thorough screening of candidate gene regions. This has allowed identification of many novel AITD risk genes and more detailed association mapping. The growing number of confirmed AITD susceptibility loci, implicates a number of putative disease mechanisms most of which are tightly linked with aspects of immune system function. The unprecedented advances in genetic study will allow future studies to identify further novel disease risk genes and to identify aetiological variants within specific gene regions, which will undoubtedly lead to a better understanding of AITD patho-physiology. PMID:22654554

  7. Superior Cross-Species Reference Genes: A Blueberry Case Study

    PubMed Central

    Die, Jose V.; Rowland, Lisa J.

    2013-01-01

    The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and make necessary the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry were identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469

  8. Gene-Based Sequencing Identifies Lipid-Influencing Variants with Ethnicity-Specific Effects in African Americans

    PubMed Central

    Bentley, Amy R.; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P.; Zhou, Jie; Huang, Hanxia; Mullikin, James C.; Blakesley, Robert W.; Hansen, Nancy F.; Bouffard, Gerard G.; Cherukuri, Praveen F.; Maskeri, Baishali; Young, Alice C.; Adeyemo, Adebowale; Rotimi, Charles N.

    2014-01-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370

  9. A Stratified Transcriptomics Analysis of Polygenic Fat and Lean Mouse Adipose Tissues Identifies Novel Candidate Obesity Genes

    PubMed Central

    Morton, Nicholas M.; Nelson, Yvonne B.; Michailidou, Zoi; Di Rollo, Emma M.; Ramage, Lynne; Hadoke, Patrick W. F.; Seckl, Jonathan R.; Bunger, Lutz; Horvat, Simon; Kenyon, Christopher J.; Dunbar, Donald R.

    2011-01-01

    Background Obesity and metabolic syndrome results from a complex interaction between genetic and environmental factors. In addition to brain-regulated processes, recent genome wide association studies have indicated that genes highly expressed in adipose tissue affect the distribution and function of fat and thus contribute to obesity. Using a stratified transcriptome gene enrichment approach we attempted to identify adipose tissue-specific obesity genes in the unique polygenic Fat (F) mouse strain generated by selective breeding over 60 generations for divergent adiposity from a comparator Lean (L) strain. Results To enrich for adipose tissue obesity genes a ‘snap-shot’ pooled-sample transcriptome comparison of key fat depots and non adipose tissues (muscle, liver, kidney) was performed. Known obesity quantitative trait loci (QTL) information for the model allowed us to further filter genes for increased likelihood of being causal or secondary for obesity. This successfully identified several genes previously linked to obesity (C1qr1, and Np3r) as positional QTL candidate genes elevated specifically in F line adipose tissue. A number of novel obesity candidate genes were also identified (Thbs1, Ppp1r3d, Tmepai, Trp53inp2, Ttc7b, Tuba1a, Fgf13, Fmr) that have inferred roles in fat cell function. Quantitative microarray analysis was then applied to the most phenotypically divergent adipose depot after exaggerating F and L strain differences with chronic high fat feeding which revealed a distinct gene expression profile of line, fat depot and diet-responsive inflammatory, angiogenic and metabolic pathways. Selected candidate genes Npr3 and Thbs1, as well as Gys2, a non-QTL gene that otherwise passed our enrichment criteria were characterised, revealing novel functional effects consistent with a contribution to obesity. Conclusions A focussed candidate gene enrichment strategy in the unique F and L model has identified novel adipose tissue-enriched genes

  10. Gene panel sequencing in familial breast/ovarian cancer patients identifies multiple novel mutations also in genes others than BRCA1/2.

    PubMed

    Kraus, Cornelia; Hoyer, Juliane; Vasileiou, Georgia; Wunderle, Marius; Lux, Michael P; Fasching, Peter A; Krumbiegel, Mandy; Uebe, Steffen; Reuter, Miriam; Beckmann, Matthias W; Reis, André

    2017-01-01

    Breast and ovarian cancer (BC/OC) predisposition has been attributed to a number of high- and moderate to low-penetrance susceptibility genes. With the advent of next generation sequencing (NGS) simultaneous testing of these genes has become feasible. In this monocentric study, we report results of panel-based screening of 14 BC/OC susceptibility genes (BRCA1, BRCA2, RAD51C, RAD51D, CHEK2, PALB2, ATM, NBN, CDH1, TP53, MLH1, MSH2, MSH6 and PMS2) in a group of 581 consecutive individuals from a German population with BC and/or OC fulfilling diagnostic criteria for BRCA1 and BRCA2 testing including 179 with a triple-negative tumor. Altogether we identified 106 deleterious mutations in 105 (18%) patients in 10 different genes, including seven different exon deletions. Of these 106 mutations, 16 (15%) were novel and only six were found in BRCA1/2. To further characterize mutations located in or nearby splicing consensus sites we performed RT-PCR analysis which allowed confirmation of pathogenicity in 7 of 9 mutations analyzed. In PALB2, we identified a deleterious variant in six cases. All but one were associated with early onset BC and a positive family history indicating that penetrance for PALB2 mutations is comparable to BRCA2. Overall, extended testing beyond BRCA1/2 identified a deleterious mutation in further 6% of patients. As a downside, 89 variants of uncertain significance were identified highlighting the need for comprehensive variant databases. In conclusion, panel testing yields more accurate information on genetic cancer risk than assessing BRCA1/2 alone and wide-spread testing will help improve penetrance assessment of variants in these risk genes. © 2016 UICC.

  11. Genome-Wide association study identifies candidate genes for Parkinson's disease in an Ashkenazi Jewish population

    PubMed Central

    2011-01-01

    Background To date, nine Parkinson disease (PD) genome-wide association studies in North American, European and Asian populations have been published. The majority of studies have confirmed the association of the previously identified genetic risk factors, SNCA and MAPT, and two studies have identified three new PD susceptibility loci/genes (PARK16, BST1 and HLA-DRB5). In a recent meta-analysis of datasets from five of the published PD GWAS an additional 6 novel candidate genes (SYT11, ACMSD, STK39, MCCC1/LAMP3, GAK and CCDC62/HIP1R) were identified. Collectively the associations identified in these GWAS account for only a small proportion of the estimated total heritability of PD suggesting that an 'unknown' component of the genetic architecture of PD remains to be identified. Methods We applied a GWAS approach to a relatively homogeneous Ashkenazi Jewish (AJ) population from New York to search for both 'rare' and 'common' genetic variants that confer risk of PD by examining any SNPs with allele frequencies exceeding 2%. We have focused on a genetic isolate, the AJ population, as a discovery dataset since this cohort has a higher sharing of genetic background and historically experienced a significant bottleneck. We also conducted a replication study using two publicly available datasets from dbGaP. The joint analysis dataset had a combined sample size of 2,050 cases and 1,836 controls. Results We identified the top 57 SNPs showing the strongest evidence of association in the AJ dataset (p < 9.9 × 10-5). Six SNPs located within gene regions had positive signals in at least one other independent dbGaP dataset: LOC100505836 (Chr3p24), LOC153328/SLC25A48 (Chr5q31.1), UNC13B (9p13.3), SLCO3A1(15q26.1), WNT3(17q21.3) and NSF (17q21.3). We also replicated published associations for the gene regions SNCA (Chr4q21; rs3775442, p = 0.037), PARK16 (Chr1q32.1; rs823114 (NUCKS1), p = 6.12 × 10-4), BST1 (Chr4p15; rs12502586, p = 0.027), STK39 (Chr2q24.3; rs3754775, p = 0

  12. De novo Transcriptome Analysis of Miscanthus lutarioriparius Identifies Candidate Genes in Rhizome Development

    PubMed Central

    Hu, Ruibo; Yu, Changjiang; Wang, Xiaoyu; Jia, Chunlin; Pei, Shengqiang; He, Kang; He, Guo; Kong, Yingzhen; Zhou, Gongke

    2017-01-01

    HIGHLIGHT De novo transcriptome profiling of five tissues reveals candidate genes putatively involved in rhizome development in M. lutarioriparius. Miscanthus lutarioriparius is a promising lignocellulosic feedstock for second-generation bioethanol production. However, the genomic resource for this species is relatively limited thus hampers our understanding of the molecular mechanisms underlying many important biological processes. In this study, we performed the first de novo transcriptome analysis of five tissues (leaf, stem, root, lateral bud and rhizome bud) of M. lutarioriparius with an emphasis to identify putative genes involved in rhizome development. Approximately 66 gigabase (GB) paired-end clean reads were obtained and assembled into 169,064 unigenes with an average length of 759 bp. Among these unigenes, 103,899 (61.5%) were annotated in seven public protein databases. Differential gene expression profiling analysis revealed that 4,609, 3,188, 1,679, 1,218, and 1,077 genes were predominantly expressed in root, leaf, stem, lateral bud, and rhizome bud, respectively. Their expression patterns were further classified into 12 distinct clusters. Pathway enrichment analysis revealed that genes predominantly expressed in rhizome bud were mainly involved in primary metabolism and hormone signaling and transduction pathways. Noteworthy, 19 transcription factors (TFs) and 16 hormone signaling pathway-related genes were identified to be predominantly expressed in rhizome bud compared with the other tissues, suggesting putative roles in rhizome formation and development. In addition, a predictive regulatory network was constructed between four TFs and six auxin and abscisic acid (ABA) -related genes. Furthermore, the expression of 24 rhizome-specific genes was further validated by quantitative real-time RT-PCR (qRT-PCR) analysis. Taken together, this study provide a global portrait of gene expression across five different tissues and reveal preliminary insights

  13. Blood pressure loci identified with a gene-centric array.

    PubMed

    Johnson, Toby; Gaunt, Tom R; Newhouse, Stephen J; Padmanabhan, Sandosh; Tomaszewski, Maciej; Kumari, Meena; Morris, Richard W; Tzoulaki, Ioanna; O'Brien, Eoin T; Poulter, Neil R; Sever, Peter; Shields, Denis C; Thom, Simon; Wannamethee, Sasiwarang G; Whincup, Peter H; Brown, Morris J; Connell, John M; Dobson, Richard J; Howard, Philip J; Mein, Charles A; Onipinla, Abiodun; Shaw-Hawkins, Sue; Zhang, Yun; Davey Smith, George; Day, Ian N M; Lawlor, Debbie A; Goodall, Alison H; Fowkes, F Gerald; Abecasis, Gonçalo R; Elliott, Paul; Gateva, Vesela; Braund, Peter S; Burton, Paul R; Nelson, Christopher P; Tobin, Martin D; van der Harst, Pim; Glorioso, Nicola; Neuvrith, Hani; Salvi, Erika; Staessen, Jan A; Stucchi, Andrea; Devos, Nabila; Jeunemaitre, Xavier; Plouin, Pierre-François; Tichet, Jean; Juhanson, Peeter; Org, Elin; Putku, Margus; Sõber, Siim; Veldre, Gudrun; Viigimaa, Margus; Levinsson, Anna; Rosengren, Annika; Thelle, Dag S; Hastie, Claire E; Hedner, Thomas; Lee, Wai K; Melander, Olle; Wahlstrand, Björn; Hardy, Rebecca; Wong, Andrew; Cooper, Jackie A; Palmen, Jutta; Chen, Li; Stewart, Alexandre F R; Wells, George A; Westra, Harm-Jan; Wolfs, Marcel G M; Clarke, Robert; Franzosi, Maria Grazia; Goel, Anuj; Hamsten, Anders; Lathrop, Mark; Peden, John F; Seedorf, Udo; Watkins, Hugh; Ouwehand, Willem H; Sambrook, Jennifer; Stephens, Jonathan; Casas, Juan-Pablo; Drenos, Fotios; Holmes, Michael V; Kivimaki, Mika; Shah, Sonia; Shah, Tina; Talmud, Philippa J; Whittaker, John; Wallace, Chris; Delles, Christian; Laan, Maris; Kuh, Diana; Humphries, Steve E; Nyberg, Fredrik; Cusi, Daniele; Roberts, Robert; Newton-Cheh, Christopher; Franke, Lude; Stanton, Alice V; Dominiczak, Anna F; Farrall, Martin; Hingorani, Aroon D; Samani, Nilesh J; Caulfield, Mark J; Munroe, Patricia B

    2011-12-09

    Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10(-7) study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r(2) = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10(-7) at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  14. Methods to identify and analyze gene products involved in neuronal intracellular transport using Drosophila

    PubMed Central

    Neisch, Amanda L.; Avery, Adam W.; Machame, James B.; Li, Min-gang; Hays, Thomas S.

    2017-01-01

    Proper neuronal function critically depends on efficient intracellular transport and disruption of transport leads to neurodegeneration. Molecular pathways that support or regulate neuronal transport are not fully understood. A greater understanding of these pathways will help reveal the pathological mechanisms underlying disease. Drosophila melanogaster is the premier model system for performing large-scale genetic functional screens. Here we describe methods to carry out primary and secondary genetic screens in Drosophila aimed at identifying novel gene products and pathways that impact neuronal intracellular transport. These screens are performed using whole animal or live cell imaging of intact neural tissue to ensure integrity of neurons and their cellular environment. The primary screen is used to identify gross defects in neuronal function indicative of a disruption in microtubule-based transport. The secondary screens, conducted in both motoneurons and dendritic arborization neurons, will confirm the function of candidate gene products in intracellular transport. Together, the methodologies described here will support labs interested in identifying and characterizing gene products that alter intracellular transport in Drosophila. PMID:26794520

  15. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  16. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  17. Gene Expression Analysis of Forskolin Treated Basilar Papillae Identifies MicroRNA181a as a Mediator of Proliferation

    PubMed Central

    Frucht, Corey S.; Uduman, Mohamed; Duke, Jamie L.; Kleinstein, Steven H.; Santos-Sacchi, Joseph; Navaratnam, Dhasakumar S.

    2010-01-01

    Background Auditory hair cells spontaneously regenerate following injury in birds but not mammals. A better understanding of the molecular events underlying hair cell regeneration in birds may allow for identification and eventually manipulation of relevant pathways in mammals to stimulate regeneration and restore hearing in deaf patients. Methodology/Principal Findings Gene expression was profiled in forskolin treated (i.e., proliferating) and quiescent control auditory epithelia of post-hatch chicks using an Affymetrix whole-genome chicken array after 24 (n = 6), 48 (n = 6), and 72 (n = 12) hours in culture. In the forskolin-treated epithelia there was significant (p<0.05; >two-fold change) upregulation of many genes thought to be relevant to cell cycle control and inner ear development. Gene set enrichment analysis was performed on the data and identified myriad microRNAs that are likely to be upregulated in the regenerating tissue, including microRNA181a (miR181a), which is known to mediate proliferation in other systems. Functional experiments showed that miR181a overexpression is sufficient to stimulate proliferation within the basilar papilla, as assayed by BrdU incorporation. Further, some of the newly produced cells express the early hair cell marker myosin VI, suggesting that miR181a transfection can result in the production of new hair cells. Conclusions/Significance These studies have identified a single microRNA, miR181a, that can cause proliferation in the chicken auditory epithelium with production of new hair cells. PMID:20634979

  18. Multiple Testing of Gene Sets from Gene Ontology: Possibilities and Pitfalls.

    PubMed

    Meijer, Rosa J; Goeman, Jelle J

    2016-09-01

    The use of multiple testing procedures in the context of gene-set testing is an important but relatively underexposed topic. If a multiple testing method is used, this is usually a standard familywise error rate (FWER) or false discovery rate (FDR) controlling procedure in which the logical relationships that exist between the different (self-contained) hypotheses are not taken into account. Taking those relationships into account, however, can lead to more powerful variants of existing multiple testing procedures and can make summarizing and interpreting the final results easier. We will show that, from the perspective of interpretation as well as from the perspective of power improvement, FWER controlling methods are more suitable than FDR controlling methods. As an example of a possible power improvement, we suggest a modified version of the popular method by Holm, which we also implemented in the R package cherry. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  19. Identifying pathogenic processes by integrating microarray data with prior knowledge

    PubMed Central

    2014-01-01

    Background It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens show a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections are used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping. Results Simulation results showed that the method improved the ability of identifying correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation. Conclusion Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways. PMID:24758699

  20. A gene expression biomarker identifies in vitro and in vivo ERα modulators in a human gene expression compendium

    EPA Science Inventory

    We propose the use of gene expression profiling to complement the chemical characterization currently based on HTS assay data and present a case study relevant to the Endocrine Disruptor Screening Program. We have developed computational methods to identify estrogen receptor &alp...

  1. Integration of QTL and bioinformatic tools to identify candidate genes for triglycerides in mice[S

    PubMed Central

    Leduc, Magalie S.; Hageman, Rachael S.; Verdugo, Ricardo A.; Tsaih, Shirng-Wern; Walsh, Kenneth; Churchill, Gary A.; Paigen, Beverly

    2011-01-01

    To identify genetic loci influencing lipid levels, we performed quantitative trait loci (QTL) analysis between inbred mouse strains MRL/MpJ and SM/J, measuring triglyceride levels at 8 weeks of age in F2 mice fed a chow diet. We identified one significant QTL on chromosome (Chr) 15 and three suggestive QTL on Chrs 2, 7, and 17. We also carried out microarray analysis on the livers of parental strains of 282 F2 mice and used these data to find cis-regulated expression QTL. We then narrowed the list of candidate genes under significant QTL using a “toolbox” of bioinformatic resources, including haplotype analysis; parental strain comparison for gene expression differences and nonsynonymous coding single nucleotide polymorphisms (SNP); cis-regulated eQTL in livers of F2 mice; correlation between gene expression and phenotype; and conditioning of expression on the phenotype. We suggest Slc25a7 as a candidate gene for the Chr 7 QTL and, based on expression differences, five genes (Polr3 h, Cyp2d22, Cyp2d26, Tspo, and Ttll12) as candidate genes for Chr 15 QTL. This study shows how bioinformatics can be used effectively to reduce candidate gene lists for QTL related to complex traits. PMID:21622629

  2. Transcriptome and metabolite analysis identifies nitrogen utilization genes in tea plant (Camellia sinensis).

    PubMed

    Li, Wei; Xiang, Fen; Zhong, Micai; Zhou, Lingyun; Liu, Hongyan; Li, Saijun; Wang, Xuewen

    2017-05-10

    Applied nitrogen (N) fertilizer significantly increases the leaf yield. However, most N is not utilized by the plant, negatively impacting the environment. To date, little is known regarding N utilization genes and mechanisms in the leaf production. To understand this, we investigated transcriptomes using RNA-seq and amino acid levels with N treatment in tea (Camellia sinensis), the most popular beverage crop. We identified 196 and 29 common differentially expressed genes in roots and leaves, respectively, in response to ammonium in two tea varieties. Among those genes, AMT, NRT and AQP for N uptake and GOGAT and GS for N assimilation were the key genes, validated by RT-qPCR, which expressed in a network manner with tissue specificity. Importantly, only AQP and three novel DEGs associated with stress, manganese binding, and gibberellin-regulated transcription factor were common in N responses across all tissues and varieties. A hypothesized gene regulatory network for N was proposed. A strong statistical correlation between key genes' expression and amino acid content was revealed. The key genes and regulatory network improve our understanding of the molecular mechanism of N usage and offer gene targets for plant improvement.

  3. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  4. Identifying resistance gene analogs associated with resistances to different pathogens in common bean.

    PubMed

    López, Camilo E; Acosta, Iván F; Jara, Carlos; Pedraza, Fabio; Gaitán-Solís, Eliana; Gallego, Gerardo; Beebe, Steve; Tohme, Joe

    2003-01-01

    ABSTRACT A polymerase chain reaction approach using degenerate primers that targeted the conserved domains of cloned plant disease resistance genes (R genes) was used to isolate a set of 15 resistance gene analogs (RGAs) from common bean (Phaseolus vulgaris). Eight different classes of RGAs were obtained from nucleotide binding site (NBS)-based primers and seven from not previously described Toll/Interleukin-1 receptor-like (TIR)-based primers. Putative amino acid sequences of RGAs were significantly similar to R genes and contained additional conserved motifs. The NBS-type RGAs were classified in two subgroups according to the expected final residue in the kinase-2 motif. Eleven RGAs were mapped at 19 loci on eight linkage groups of the common bean genetic map constructed at Centro Internacional de Agricultura Tropical. Genetic linkage was shown for eight RGAs with partial resistance to anthracnose, angular leaf spot (ALS) and Bean golden yellow mosaic virus (BGYMV). RGA1 and RGA2 were associated with resistance loci to anthracnose and BGYMV and were part of two clusters of R genes previously described. A new major cluster was detected by RGA7 and explained up to 63.9% of resistance to ALS and has a putative contribution to anthracnose resistance. These results show the usefulness of RGAs as candidate genes to detect and eventually isolate numerous R genes in common bean.

  5. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, R. L., Hamaguchi, L., Busch, M. A., and Weigel, D.

    2003-06-01

    OAK-B135 In Arabidopsis thaliana, cis-regulatory sequences of the floral homeotic gene AGAMOUS (AG) are located in the second intron. This 3 kb intron contains binding sites for two direct activators of AG, LEAFY (LFY) and WUSCHEL (WUS), along with other putative regulatory elements. We have used phylogenetic footprinting and the related technique of phylogenetic shadowing to identify putative cis-regulatory elements in this intron. Among 29 Brassicaceae, several other motifs, but not the LFY and WUS binding sites previously identified, are largely invariant. Using reporter gene analyses, we tested six of these motifs and found that they are all functionally importantmore » for activity of AG regulatory sequences in A. thaliana. Although there is little obvious sequence similarity outside the Brassicaceae, the intron from cucumber AG has at least partial activity in A. thaliana. Our studies underscore the value of the comparative approach as a tool that complements gene-by-gene promoter dissection, but also highlight that sequence-based studies alone are insufficient for a complete identification of cis-regulatory sites.« less

  6. Genetic Susceptibility to Vitiligo: GWAS Approaches for Identifying Vitiligo Susceptibility Genes and Loci

    PubMed Central

    Shen, Changbing; Gao, Jing; Sheng, Yujun; Dou, Jinfa; Zhou, Fusheng; Zheng, Xiaodong; Ko, Randy; Tang, Xianfa; Zhu, Caihong; Yin, Xianyong; Sun, Liangdan; Cui, Yong; Zhang, Xuejun

    2016-01-01

    Vitiligo is an autoimmune disease with a strong genetic component, characterized by areas of depigmented skin resulting from loss of epidermal melanocytes. Genetic factors are known to play key roles in vitiligo through discoveries in association studies and family studies. Previously, vitiligo susceptibility genes were mainly revealed through linkage analysis and candidate gene studies. Recently, our understanding of the genetic basis of vitiligo has been rapidly advancing through genome-wide association study (GWAS). More than 40 robust susceptible loci have been identified and confirmed to be associated with vitiligo by using GWAS. Most of these associated genes participate in important pathways involved in the pathogenesis of vitiligo. Many susceptible loci with unknown functions in the pathogenesis of vitiligo have also been identified, indicating that additional molecular mechanisms may contribute to the risk of developing vitiligo. In this review, we summarize the key loci that are of genome-wide significance, which have been shown to influence vitiligo risk. These genetic loci may help build the foundation for genetic diagnosis and personalize treatment for patients with vitiligo in the future. However, substantial additional studies, including gene-targeted and functional studies, are required to confirm the causality of the genetic variants and their biological relevance in the development of vitiligo. PMID:26870082

  7. Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods

    PubMed Central

    2012-01-01

    High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods. Reviewers This article was reviewed by Arcady Mushegian, Byung-Soo Kim and Joel Bader. PMID:23227854

  8. A novel gammaretroviral shuttle vector insertional mutagenesis screen identifies SHARPIN as a breast cancer metastasis gene and prognostic biomarker.

    PubMed

    Bii, Victor M; Rae, Dustin T; Trobridge, Grant D

    2015-11-24

    Breast cancer (BC) is the second leading cause of malignancy among U.S. women. Metastasis results in a poor prognosis and increased mortality, but the molecular mechanisms by which metastatic tumors occur are not well understood. Identifying the genes that drive the metastatic process could provide targets for improved therapy and biomarkers to improve BC patient outcomes. Using a forward mutagenesis screen, BC cells mutagenized with a replication-incompetent gammaretroviral vector (γRV) were xenotransplanted into the mammary fat pad of immunodeficient mice. In this approach the vector provirus dysregulates nearby genes, providing a selective advantage to transduced cells to form metastases. Metastatic tumors were analyzed for proviral integration sites to identify nearby candidate metastasis genes. The γRV has a transgene cassette that allows for rescue in bacteria and rapid identification of vector integration sites. Using this approach, we identified the previously described metastasis gene WWTR1 (TAZ), and three other novel candidate metastasis genes including SHARPIN. SHARPIN was independently validated in vivo as a BC metastasis gene. Analysis of patient data showed that SHARPIN expression predicts metastasis-free survival after adjuvant therapy. Our approach has broad potential to identify genes involved in oncogenic processes for BC and other cancers. We show here it can identify both known (WWTR1) and novel (SHARPIN) BC metastasis genes.

  9. A comparative gene analysis with rice identified orthologous group II HKT genes and their association with Na(+) concentration in bread wheat.

    PubMed

    Ariyarathna, H A Chandima K; Oldach, Klaus H; Francki, Michael G

    2016-01-19

    Although the HKT transporter genes ascertain some of the key determinants of crop salt tolerance mechanisms, the diversity and functional role of group II HKT genes are not clearly understood in bread wheat. The advanced knowledge on rice HKT and whole genome sequence was, therefore, used in comparative gene analysis to identify orthologous wheat group II HKT genes and their role in trait variation under different saline environments. The four group II HKTs in rice identified two orthologous gene families from bread wheat, including the known TaHKT2;1 gene family and a new distinctly different gene family designated as TaHKT2;2. A single copy of TaHKT2;2 was found on each homeologous chromosome arm 7AL, 7BL and 7DL and each gene was expressed in leaf blade, sheath and root tissues under non-stressed and at 200 mM salt stressed conditions. The proteins encoded by genes of the TaHKT2;2 family revealed more than 93% amino acid sequence identity but ≤52% amino acid identity compared to the proteins encoded by TaHKT2;1 family. Specifically, variations in known critical domains predicted functional differences between the two protein families. Similar to orthologous rice genes on chromosome 6L, TaHKT2;1 and TaHKT2;2 genes were located approximately 3 kb apart on wheat chromosomes 7AL, 7BL and 7DL, forming a static syntenic block in the two species. The chromosomal region on 7AL containing TaHKT2;1 7AL-1 co-located with QTL for shoot Na(+) concentration and yield in some saline environments. The differences in copy number, genes sequences and encoded proteins between TaHKT2;2 homeologous genes and other group II HKT gene families within and across species likely reflect functional diversity for ion selectivity and transport in plants. Evidence indicated that neither TaHKT2;2 nor TaHKT2;1 were associated with primary root Na(+) uptake but TaHKT2;1 may be associated with trait variation for Na(+) exclusion and yield in some but not all saline environments.

  10. Identifying Loci Under Selection Against Gene Flow in Isolation-with-Migration Models

    PubMed Central

    Sousa, Vitor C.; Carneiro, Miguel; Ferrand, Nuno; Hey, Jody

    2013-01-01

    When divergence occurs in the presence of gene flow, there can arise an interesting dynamic in which selection against gene flow, at sites associated with population-specific adaptations or genetic incompatibilities, can cause net gene flow to vary across the genome. Loci linked to sites under selection may experience reduced gene flow and may experience genetic bottlenecks by the action of nearby selective sweeps. Data from histories such as these may be poorly fitted by conventional neutral model approaches to demographic inference, which treat all loci as equally subject to forces of genetic drift and gene flow. To allow for demographic inference in the face of such histories, as well as the identification of loci affected by selection, we developed an isolation-with-migration model that explicitly provides for variation among genomic regions in migration rates and/or rates of genetic drift. The method allows for loci to fall into any of multiple groups, each characterized by a different set of parameters, thus relaxing the assumption that all loci share the same demography. By grouping loci, the method can be applied to data with multiple loci and still have tractable dimensionality and statistical power. We studied the performance of the method using simulated data, and we applied the method to study the divergence of two subspecies of European rabbits (Oryctolagus cuniculus). PMID:23457232

  11. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

    PubMed Central

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya

    2017-01-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390

  12. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

    PubMed

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

    2017-02-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

  13. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes

    PubMed Central

    Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.

    2005-01-01

    Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209

  14. Genome-wide screen in Saccharomyces cerevisiae identifies vacuolar protein sorting, autophagy, biosynthetic, and tRNA methylation genes involved in life span regulation.

    PubMed

    Fabrizio, Paola; Hoon, Shawn; Shamalnasab, Mehrnaz; Galbani, Abdulaye; Wei, Min; Giaever, Guri; Nislow, Corey; Longo, Valter D

    2010-07-15

    The study of the chronological life span of Saccharomyces cerevisiae, which measures the survival of populations of non-dividing yeast, has resulted in the identification of homologous genes and pathways that promote aging in organisms ranging from yeast to mammals. Using a competitive genome-wide approach, we performed a screen of a complete set of approximately 4,800 viable deletion mutants to identify genes that either increase or decrease chronological life span. Half of the putative short-/long-lived mutants retested from the primary screen were confirmed, demonstrating the utility of our approach. Deletion of genes involved in vacuolar protein sorting, autophagy, and mitochondrial function shortened life span, confirming that respiration and degradation processes are essential for long-term survival. Among the genes whose deletion significantly extended life span are ACB1, CKA2, and TRM9, implicated in fatty acid transport and biosynthesis, cell signaling, and tRNA methylation, respectively. Deletion of these genes conferred heat-shock resistance, supporting the link between life span extension and cellular protection observed in several model organisms. The high degree of conservation of these novel yeast longevity determinants in other species raises the possibility that their role in senescence might be conserved.

  15. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    PubMed

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.

  16. Novel mutations in the homogentisate 1,2 dioxygenase gene identified in Jordanian patients with alkaptonuria.

    PubMed

    Al-sbou, Mohammed

    2012-06-01

    This study was conducted to identify mutations in the homogentisate 1,2 dioxygenase gene (HGD) in alkaptonuria patients among Jordanian population. Blood samples were collected from four alkaptonuria patients, four carriers, and two healthy volunteers. DNA was isolated from peripheral blood. All 14 exons of the HGD gene were amplified using the polymerase chain reaction (PCR) technique. The PCR products were then purified and analyzed by sequencing. Five mutations were identified in our samples. Four of them were novel C1273A, T1046G, 551-552insG, T533G and had not been previously reported, and one mutation T847C has been described before. The types of mutations identified were two missense mutations, one splice site mutation, one frameshift mutation, and one polymorphism. We present the first molecular study of the HGD gene in Jordanian alkaptonuria patients. This study provides valuable information about the molecular basis of alkaptonuria in Jordanian population.

  17. Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS.

    PubMed

    Kwon, Ji-Sun; Kim, Jihye; Nam, Dougu; Kim, Sangsoo

    2012-06-01

    Gene set analysis (GSA) is useful in interpreting a genome-wide association study (GWAS) result in terms of biological mechanism. We compared the performance of two different GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS study: 259,188 and 1,152,947 SNPs of the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. On the other hand, GSA-SNP reported 94 and 38 hits, respectively, for both datasets. Similar, but to a lesser degree, trends were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets as well. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact due to the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be due to the fact that it relies on Z-statistics, which is sensitive to variations in the background level of associations. Judicious evaluation of the GSA outcomes, perhaps based on multiple programs, is recommended.

  18. A systems-wide comparison of red rice (Oryza longistaminata) tissues identifies rhizome specific genes and proteins that are targets for cultivated rice improvement

    PubMed Central

    2014-01-01

    Background The rhizome, the original stem of land plants, enables species to invade new territory and is a critical component of perenniality, especially in grasses. Red rice (Oryza longistaminata) is a perennial wild rice species with many valuable traits that could be used to improve cultivated rice cultivars, including rhizomatousness, disease resistance and drought tolerance. Despite these features, little is known about the molecular mechanisms that contribute to rhizome growth, development and function in this plant. Results We used an integrated approach to compare the transcriptome, proteome and metabolome of the rhizome to other tissues of red rice. 116 Gb of transcriptome sequence was obtained from various tissues and used to identify rhizome-specific and preferentially expressed genes, including transcription factors and hormone metabolism and stress response-related genes. Proteomics and metabolomics approaches identified 41 proteins and more than 100 primary metabolites and plant hormones with rhizome preferential accumulation. Of particular interest was the identification of a large number of gene transcripts from Magnaportha oryzae, the fungus that causes rice blast disease in cultivated rice, even though the red rice plants showed no sign of disease. Conclusions A significant set of genes, proteins and metabolites appear to be specifically or preferentially expressed in the rhizome of O. longistaminata. The presence of M. oryzae gene transcripts at a high level in apparently healthy plants suggests that red rice is resistant to this pathogen, and may be able to provide genes to cultivated rice that will enable resistance to rice blast disease. PMID:24521476

  19. Gene interactions in the DNA damage-response pathway identified by genome-wide RNA-interference analysis of synthetic lethality

    PubMed Central

    van Haaften, Gijs; Vastenhouw, Nadine L.; Nollen, Ellen A. A.; Plasterk, Ronald H. A.; Tijsterman, Marcel

    2004-01-01

    Here, we describe a systematic search for synthetic gene interactions in a multicellular organism, the nematode Caenorhabditis elegans. We established a high-throughput method to determine synthetic gene interactions by genome-wide RNA interference and identified genes that are required to protect the germ line against DNA double-strand breaks. Besides known DNA-repair proteins such as the C. elegans orthologs of TopBP1, RPA2, and RAD51, eight genes previously unassociated with a double-strand-break response were identified. Knockdown of these genes increased sensitivity to ionizing radiation and camptothecin and resulted in increased chromosomal nondisjunction. All genes have human orthologs that may play a role in human carcinogenesis. PMID:15326288

  20. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  1. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence.

    PubMed

    Sniekers, Suzanne; Stringer, Sven; Watanabe, Kyoko; Jansen, Philip R; Coleman, Jonathan R I; Krapohl, Eva; Taskesen, Erdogan; Hammerschlag, Anke R; Okbay, Aysu; Zabaneh, Delilah; Amin, Najaf; Breen, Gerome; Cesarini, David; Chabris, Christopher F; Iacono, William G; Ikram, M Arfan; Johannesson, Magnus; Koellinger, Philipp; Lee, James J; Magnusson, Patrik K E; McGue, Matt; Miller, Mike B; Ollier, William E R; Payton, Antony; Pendleton, Neil; Plomin, Robert; Rietveld, Cornelius A; Tiemeier, Henning; van Duijn, Cornelia M; Posthuma, Danielle

    2017-07-01

    Intelligence is associated with important economic and health-related life outcomes. Despite intelligence having substantial heritability (0.54) and a confirmed polygenic nature, initial genetic studies were mostly underpowered. Here we report a meta-analysis for intelligence of 78,308 individuals. We identify 336 associated SNPs (METAL P < 5 × 10 -8 ) in 18 genomic loci, of which 15 are new. Around half of the SNPs are located inside a gene, implicating 22 genes, of which 11 are new findings. Gene-based analyses identified an additional 30 genes (MAGMA P < 2.73 × 10 -6 ), of which all but one had not been implicated previously. We show that the identified genes are predominantly expressed in brain tissue, and pathway analysis indicates the involvement of genes regulating cell development (MAGMA competitive P = 3.5 × 10 -6 ). Despite the well-known difference in twin-based heritability for intelligence in childhood (0.45) and adulthood (0.80), we show substantial genetic correlation (r g = 0.89, LD score regression P = 5.4 × 10 -29 ). These findings provide new insight into the genetic architecture of intelligence.

  2. Transcriptional profiling identifies differentially expressed genes in developing turkey skeletal muscle

    PubMed Central

    2011-01-01

    involved in extracellular matrix regulation, cell death/apoptosis, and calcium signaling/muscle function, as well as genes with miscellaneous function was confirmed by qPCR. Conclusions The current study identified gene pathways and uncovered novel genes important in turkey muscle growth and development. Future experiments will focus further on several of these candidate genes and the expression and mechanism of action of their protein products. PMID:21385442

  3. Novel application of the published kinase inhibitor set to identify therapeutic targets and pathways in triple negative breast cancer subtypes

    PubMed Central

    Phamduy, Theresa B.; Chrisey, Douglas B.

    2017-01-01

    Triple negative breast cancers (TNBCs) have high recurrence and metastasis rates. Acquisition of a mesenchymal morphology and phenotype in addition to driving migration is a consequential process that promotes metastasis. Although some kinases are known to regulate a mesenchymal phenotype, the role for a substantial portion of the human kinome remains uncharacterized. Here we evaluated the Published Kinase Inhibitor Set (PKIS) and screened a panel of TNBC cell lines to evaluate the compounds’ effects on a mesenchymal phenotype. Our screen identified 36 hits representative of twelve kinase inhibitor chemotypes based on reversal of the mesenchymal cell morphology, which was then prioritized to twelve compounds based on gene expression and migratory behavior analyses. We selected the most active compound and confirmed mesenchymal reversal on transcript and protein levels with qRT-PCR and Western Blot. Finally, we utilized a kinase array to identify candidate kinases responsible for the EMT reversal. This investigation shows the novel application to identify previously unrecognized kinase pathways and targets in acquisition of a mesenchymal TNBC phenotype that warrant further investigation. Future studies will examine specific roles of the kinases in mechanisms responsible for acquisition of the mesenchymal and/or migratory phenotype. PMID:28771473

  4. Consistency of gene starts among Burkholderia genomes

    PubMed Central

    2011-01-01

    Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528

  5. Computational modeling identifies key gene regulatory interactions underlying phenobarbital-mediated tumor promotion

    PubMed Central

    Luisier, Raphaëlle; Unterberger, Elif B.; Goodman, Jay I.; Schwarz, Michael; Moggs, Jonathan; Terranova, Rémi; van Nimwegen, Erik

    2014-01-01

    Gene regulatory interactions underlying the early stages of non-genotoxic carcinogenesis are poorly understood. Here, we have identified key candidate regulators of phenobarbital (PB)-mediated mouse liver tumorigenesis, a well-characterized model of non-genotoxic carcinogenesis, by applying a new computational modeling approach to a comprehensive collection of in vivo gene expression studies. We have combined our previously developed motif activity response analysis (MARA), which models gene expression patterns in terms of computationally predicted transcription factor binding sites with singular value decomposition (SVD) of the inferred motif activities, to disentangle the roles that different transcriptional regulators play in specific biological pathways of tumor promotion. Furthermore, transgenic mouse models enabled us to identify which of these regulatory activities was downstream of constitutive androstane receptor and β-catenin signaling, both crucial components of PB-mediated liver tumorigenesis. We propose novel roles for E2F and ZFP161 in PB-mediated hepatocyte proliferation and suggest that PB-mediated suppression of ESR1 activity contributes to the development of a tumor-prone environment. Our study shows that combining MARA with SVD allows for automated identification of independent transcription regulatory programs within a complex in vivo tissue environment and provides novel mechanistic insights into PB-mediated hepatocarcinogenesis. PMID:24464994

  6. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing.

    PubMed

    Mandelker, Diana; Schmidt, Ryan J; Ankala, Arunkanth; McDonald Gibson, Kristin; Bowser, Mark; Sharma, Himanshu; Duffy, Elizabeth; Hegde, Madhuri; Santani, Avni; Lebo, Matthew; Funke, Birgit

    2016-12-01

    Next-generation sequencing (NGS) is now routinely used to interrogate large sets of genes in a diagnostic setting. Regions of high sequence homology continue to be a major challenge for short-read technologies and can lead to false-positive and false-negative diagnostic errors. At the scale of whole-exome sequencing (WES), laboratories may be limited in their knowledge of genes and regions that pose technical hurdles due to high homology. We have created an exome-wide resource that catalogs highly homologous regions that is tailored toward diagnostic applications. This resource was developed using a mappability-based approach tailored to current Sanger and NGS protocols. Gene-level and exon-level lists delineate regions that are difficult or impossible to analyze via standard NGS. These regions are ranked by degree of affectedness, annotated for medical relevance, and classified by the type of homology (within-gene, different functional gene, known pseudogene, uncharacterized noncoding region). Additionally, we provide a list of exons that cannot be analyzed by short-amplicon Sanger sequencing. This resource can help guide clinical test design, supplemental assay implementation, and results interpretation in the context of high homology.Genet Med 18 12, 1282-1289.

  7. A fast and high performance multiple data integration algorithm for identifying human disease genes

    PubMed Central

    2015-01-01

    Background Integrating multiple data sources is indispensable in improving disease gene identification. It is not only due to the fact that disease genes associated with similar genetic diseases tend to lie close with each other in various biological networks, but also due to the fact that gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performances and the computational time still should be further improved. Results In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. Conclusions The proposed algorithm is not only generated predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 by using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment takes only about 12.54 seconds. It is better than many existing algorithms. PMID:26399620

  8. Epigenomic Elements Analyses for Promoters Identify ESRRG as a New Susceptibility Gene for Obesity-related Traits

    PubMed Central

    Dong, Shan-Shan; Guo, Yan; Zhu, Dong-Li; Chen, Xiao-Feng; Wu, Xiao-Ming; Shen, Hui; Chen, Xiang-Ding; Tan, Li-Jun; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin

    2016-01-01

    OBJECTIVES With ENCODE epigenomic data and results from published genome-wide association studies (GWASs), we aimed to find regulatory signatures of obesity genes and discover novel susceptibility genes. METHODS Obesity genes were obtained from public GWASs databases and their promoters were annotated based on the regulatory elements information. Significantly enriched or depleted epigenomic elements in the promoters of obesity genes were evaluated and all human genes were then prioritized according to the existence of the selected elements to predict new candidate genes. Top ranked genes were subsequently applied to validate their associations with obesity-related traits in three independent in-house GWASs samples. RESULTS We identified RAD21 and EZH2 as over-represented, STAT2 and IRF3 as depleted transcription factors. Histone modification of H3K9me3 and chromatin state segmentation of “poised promoter” and “repressed” were overrepresented. All genes were prioritized and we selected the top five genes for validation at population level. Combined results from the three GWASs samples, rs7522101 in ESRRG remained significantly associated with BMI after multiple testing corrections (P = 7.25 × 10−5). It was also associated with β-cell function (P = 1.99 × 10−3) and fasting glucose level (P < 0.05) in the meta-analyses of glucose and insulin-related traits consortium (MAGIC) dataset. CONCLUSIONS In summary, we identified epigenomic characteristics for obesity genes and suggested ESRRG as a novel obesity susceptibility gene. PMID:27113491

  9. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.

    PubMed

    Lee, Hyeonjeong; Shin, Miyoung

    2017-01-01

    The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) shown each in case and control samples which can be observed from gene expression and/or methylation data. The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into

  10. In-Silico Integration Approach to Identify a Key miRNA Regulating a Gene Network in Aggressive Prostate Cancer

    PubMed Central

    Colaprico, Antonio; Bontempi, Gianluca; Castiglioni, Isabella

    2018-01-01

    Like other cancer diseases, prostate cancer (PC) is caused by the accumulation of genetic alterations in the cells that drives malignant growth. These alterations are revealed by gene profiling and copy number alteration (CNA) analysis. Moreover, recent evidence suggests that also microRNAs have an important role in PC development. Despite efforts to profile PC, the alterations (gene, CNA, and miRNA) and biological processes that correlate with disease development and progression remain partially elusive. Many gene signatures proposed as diagnostic or prognostic tools in cancer poorly overlap. The identification of co-expressed genes, that are functionally related, can identify a core network of genes associated with PC with a better reproducibility. By combining different approaches, including the integration of mRNA expression profiles, CNAs, and miRNA expression levels, we identified a gene signature of four genes overlapping with other published gene signatures and able to distinguish, in silico, high Gleason-scored PC from normal human tissue, which was further enriched to 19 genes by gene co-expression analysis. From the analysis of miRNAs possibly regulating this network, we found that hsa-miR-153 was highly connected to the genes in the network. Our results identify a four-gene signature with diagnostic and prognostic value in PC and suggest an interesting gene network that could play a key regulatory role in PC development and progression. Furthermore, hsa-miR-153, controlling this network, could be a potential biomarker for theranostics in high Gleason-scored PC. PMID:29562723

  11. Genome-wide significant localization for working and spatial memory: Identifying genes for psychosis using models of cognition.

    PubMed

    Knowles, Emma E M; Carless, Melanie A; de Almeida, Marcio A A; Curran, Joanne E; McKay, D Reese; Sprooten, Emma; Dyer, Thomas D; Göring, Harald H; Olvera, Rene; Fox, Peter; Almasy, Laura; Duggirala, Ravi; Kent, Jack W; Blangero, John; Glahn, David C

    2014-01-01

    It is well established that risk for developing psychosis is largely mediated by the influence of genes, but identifying precisely which genes underlie that risk has been problematic. Focusing on endophenotypes, rather than illness risk, is one solution to this problem. Impaired cognition is a well-established endophenotype of psychosis. Here we aimed to characterize the genetic architecture of cognition using phenotypically detailed models as opposed to relying on general IQ or individual neuropsychological measures. In so doing we hoped to identify genes that mediate cognitive ability, which might also contribute to psychosis risk. Hierarchical factor models of genetically clustered cognitive traits were subjected to linkage analysis followed by QTL region-specific association analyses in a sample of 1,269 Mexican American individuals from extended pedigrees. We identified four genome wide significant QTLs, two for working and two for spatial memory, and a number of plausible and interesting candidate genes. The creation of detailed models of cognition seemingly enhanced the power to detect genetic effects on cognition and provided a number of possible candidate genes for psychosis. © 2013 Wiley Periodicals, Inc.

  12. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.)

    PubMed Central

    2011-01-01

    Background Global transcriptional analysis of loblolly pine (Pinus taeda L.) is challenging due to limited molecular tools. PtGen2, a 26,496 feature cDNA microarray, was fabricated and used to assess drought-induced gene expression in loblolly pine propagule roots. Statistical analysis of differential expression and weighted gene correlation network analysis were used to identify drought-responsive genes and further characterize the molecular basis of drought tolerance in loblolly pine. Results Microarrays were used to interrogate root cDNA populations obtained from 12 genotype × treatment combinations (four genotypes, three watering regimes). Comparison of drought-stressed roots with roots from the control treatment identified 2445 genes displaying at least a 1.5-fold expression difference (false discovery rate = 0.01). Genes commonly associated with drought response in pine and other plant species, as well as a number of abiotic and biotic stress-related genes, were up-regulated in drought-stressed roots. Only 76 genes were identified as differentially expressed in drought-recovered roots, indicating that the transcript population can return to the pre-drought state within 48 hours. Gene correlation analysis predicts a scale-free network topology and identifies eleven co-expression modules that ranged in size from 34 to 938 members. Network topological parameters identified a number of central nodes (hubs) including those with significant homology (E-values ≤ 2 × 10-30) to 9-cis-epoxycarotenoid dioxygenase, zeatin O-glucosyltransferase, and ABA-responsive protein. Identified hubs also include genes that have been associated previously with osmotic stress, phytohormones, enzymes that detoxify reactive oxygen species, and several genes of unknown function. Conclusion PtGen2 was used to evaluate transcriptome responses in loblolly pine and was leveraged to identify 2445 differentially expressed genes responding to severe drought stress in roots. Many of the

  13. In Silico Analysis Identifies a Novel Role for Androgens in the Regulation of Human Endometrial Apoptosis

    PubMed Central

    Marshall, Elaine; Lowrey, Jacqueline; MacPherson, Sheila; Maybin, Jacqueline A.; Collins, Frances; Critchley, Hilary O. D.

    2011-01-01

    Context: The endometrium is a multicellular, steroid-responsive tissue that undergoes dynamic remodeling every menstrual cycle in preparation for implantation and, in absence of pregnancy, menstruation. Androgen receptors are present in the endometrium. Objective: The objective of the study was to investigate the impact of androgens on human endometrial stromal cells (hESC). Design: Bioinformatics was used to identify an androgen-regulated gene set and processes associated with their function. Regulation of target genes and impact of androgens on cell function were validated using primary hESC. Setting: The study was conducted at the University Research Institute. Patients: Endometrium was collected from women with regular menses; tissues were used for recovery of cells, total mRNA, or protein and for immunohistochemistry. Results: A new endometrial androgen target gene set (n = 15) was identified. Bioinformatics revealed 12 of these genes interacted in one pathway and identified an association with control of cell survival. Dynamic androgen-dependent changes in expression of the gene set were detected in hESC with nine significantly down-regulated at 2 and/or 8 h. Treatment of hESC with dihydrotestosterone reduced staurosporine-induced apoptosis and cell migration/proliferation. Conclusions: Rigorous in silico analysis resulted in identification of a group of androgen-regulated genes expressed in human endometrium. Pathway analysis and functional assays suggest androgen-dependent changes in gene expression may have a significant impact on stromal cell proliferation, migration, and survival. These data provide the platform for further studies on the role of circulatory or local androgens in the regulation of endometrial function and identify androgens as candidates in the pathogenesis of common endometrial disorders including polycystic ovarian syndrome, cancer, and endometriosis. PMID:21865353

  14. Transcriptome analysis identifies genes involved in ethanol response of Saccharomyces cerevisiae in Agave tequilana juice.

    PubMed

    Ramirez-Córdova, Jesús; Drnevich, Jenny; Madrigal-Pulido, Jaime Alberto; Arrizon, Javier; Allen, Kirk; Martínez-Velázquez, Moisés; Alvarez-Maya, Ikuri

    2012-08-01

    During ethanol fermentation, yeast cells are exposed to stress due to the accumulation of ethanol, cell growth is altered and the output of the target product is reduced. For Agave beverages, like tequila, no reports have been published on the global gene expression under ethanol stress. In this work, we used microarray analysis to identify Saccharomyces cerevisiae genes involved in the ethanol response. Gene expression of a tequila yeast strain of S. cerevisiae (AR5) was explored by comparing global gene expression with that of laboratory strain S288C, both after ethanol exposure. Additionally, we used two different culture conditions, cells grown in Agave tequilana juice as a natural fermentation media or grown in yeast-extract peptone dextrose as artificial media. Of the 6368 S. cerevisiae genes in the microarray, 657 genes were identified that had different expression responses to ethanol stress due to strain and/or media. A cluster of 28 genes was found over-expressed specifically in the AR5 tequila strain that could be involved in the adaptation to tequila yeast fermentation, 14 of which are unknown such as yor343c, ylr162w, ygr182c, ymr265c, yer053c-a or ydr415c. These could be the most suitable genes for transforming tequila yeast to increase ethanol tolerance in the tequila fermentation process. Other genes involved in response to stress (RFC4, TSA1, MLH1, PAU3, RAD53) or transport (CYB2, TIP20, QCR9) were expressed in the same cluster. Unknown genes could be good candidates for the development of recombinant yeasts with ethanol tolerance for use in industrial tequila fermentation.

  15. Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.

    PubMed

    Jiang, Xue; Zhang, Han; Quan, Xiongwen

    2016-01-01

    Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods always focus on identifying differentially expressed gene between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during the disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We firstly constructed phase-specific gene coexpression network using time-series gene expression data and defined the conception of differential coexpression of genes in coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. Finally, we conducted meta-analysis of gene differential coexpression based on the rank-product method. Experimental results demonstrated the feasibility and effectiveness of DCGN and the superior performance of DCGN over other popular disease-related gene selection methods through real-world gene expression data sets.

  16. Ecology of antibiotic resistance genes: characterization of enterococci from houseflies collected in food settings.

    PubMed

    Macovei, Lilia; Zurek, Ludek

    2006-06-01

    In this project, enterococci from the digestive tracts of 260 houseflies (Musca domestica L.) collected from five restaurants were characterized. Houseflies frequently (97% of the flies were positive) carried enterococci (mean, 3.1 x 10(3) CFU/fly). Using multiplex PCR, 205 of 355 randomly selected enterococcal isolates were identified and characterized. The majority of these isolates were Enterococcus faecalis (88.2%); in addition, 6.8% were E. faecium, and 4.9% were E. casseliflavus. E. faecalis isolates were phenotypically resistant to tetracycline (66.3%), erythromycin (23.8%), streptomycin (11.6%), ciprofloxacin (9.9%), and kanamycin (8.3%). Tetracycline resistance in E. faecalis was encoded by tet(M) (65.8%), tet(O) (1.7%), and tet(W) (0.8%). The majority (78.3%) of the erythromycin-resistant E. faecalis isolates carried erm(B). The conjugative transposon Tn916 and members of the Tn916/Tn1545 family were detected in 30.2% and 34.6% of the identified isolates, respectively. E. faecalis carried virulence genes, including a gelatinase gene (gelE; 70.7%), an aggregation substance gene (asa1; 33.2%), an enterococcus surface protein gene (esp; 8.8%), and a cytolysin gene (cylA; 8.8%). Phenotypic assays showed that 91.4% of the isolates with the gelE gene were gelatinolytic and that 46.7% of the isolates with the asa1 gene aggregated. All isolates with the cylA gene were hemolytic on human blood. This study showed that houseflies in food-handling and -serving facilities carry antibiotic-resistant and potentially virulent enterococci that have the capacity for horizontal transfer of antibiotic resistance genes to other bacteria.

  17. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment.

    PubMed

    Chen, Xiaowei Sylvia; Reader, Rose H; Hoischen, Alexander; Veltman, Joris A; Simpson, Nuala H; Francks, Clyde; Newbury, Dianne F; Fisher, Simon E

    2017-04-25

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential "multiple-hit" cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation.

  18. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment

    PubMed Central

    Chen, Xiaowei Sylvia; Reader, Rose H.; Hoischen, Alexander; Veltman, Joris A.; Simpson, Nuala H.; Francks, Clyde; Newbury, Dianne F.; Fisher, Simon E.

    2017-01-01

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential “multiple-hit” cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation. PMID:28440294

  19. Genome-wide association study identifies the SERPINB gene cluster as a susceptibility locus for food allergy.

    PubMed

    Marenholz, Ingo; Grosche, Sarah; Kalb, Birgit; Rüschendorf, Franz; Blümchen, Katharina; Schlags, Rupert; Harandi, Neda; Price, Mareike; Hansen, Gesine; Seidenberg, Jürgen; Röblitz, Holger; Yürek, Songül; Tschirner, Sebastian; Hong, Xiumei; Wang, Xiaobin; Homuth, Georg; Schmidt, Carsten O; Nöthen, Markus M; Hübner, Norbert; Niggemann, Bodo; Beyer, Kirsten; Lee, Young-Ae

    2017-10-20

    Genetic factors and mechanisms underlying food allergy are largely unknown. Due to heterogeneity of symptoms a reliable diagnosis is often difficult to make. Here, we report a genome-wide association study on food allergy diagnosed by oral food challenge in 497 cases and 2387 controls. We identify five loci at genome-wide significance, the clade B serpin (SERPINB) gene cluster at 18q21.3, the cytokine gene cluster at 5q31.1, the filaggrin gene, the C11orf30/LRRC32 locus, and the human leukocyte antigen (HLA) region. Stratifying the results for the causative food demonstrates that association of the HLA locus is peanut allergy-specific whereas the other four loci increase the risk for any food allergy. Variants in the SERPINB gene cluster are associated with SERPINB10 expression in leukocytes. Moreover, SERPINB genes are highly expressed in the esophagus. All identified loci are involved in immunological regulation or epithelial barrier function, emphasizing the role of both mechanisms in food allergy.

  20. Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies

    PubMed Central

    2014-01-01

    Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods. PMID:24708878