Sample records for identifies gene networks

  1. Identifying gene networks underlying the neurobiology of ethanol and alcoholism.

    PubMed

    Wolen, Aaron R; Miles, Michael F

    2012-01-01

    For complex disorders such as alcoholism, identifying the genes linked to these diseases and their specific roles is difficult. Traditional genetic approaches, such as genetic association studies (including genome-wide association studies) and analyses of quantitative trait loci (QTLs) in both humans and laboratory animals already have helped identify some candidate genes. However, because of technical obstacles, such as the small impact of any individual gene, these approaches only have limited effectiveness in identifying specific genes that contribute to complex diseases. The emerging field of systems biology, which allows for analyses of entire gene networks, may help researchers better elucidate the genetic basis of alcoholism, both in humans and in animal models. Such networks can be identified using approaches such as high-throughput molecular profiling (e.g., through microarray-based gene expression analyses) or strategies referred to as genetical genomics, such as the mapping of expression QTLs (eQTLs). Characterization of gene networks can shed light on the biological pathways underlying complex traits and provide the functional context for identifying those genes that contribute to disease development.

  2. Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.

    PubMed

    Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin

    2017-08-01

    This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets, GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls), were downloaded from the Gene Expression Omnibus database. Characteristic genes were identified using the metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance, defined as the average gene significance of all genes in a module, was used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subject to functional annotation and pathway enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in a total of 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs, sanguinarine and papaverine, were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
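
    The module significance described in this abstract (the average gene significance over a module's genes) can be illustrated with a minimal sketch. This is not the authors' pipeline or the WGCNA R package: gene significance is assumed here to be the absolute Pearson correlation between a gene's expression and disease status, the soft-threshold power `beta=6` is an arbitrary illustrative choice, and the gene names and expression values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** beta (beta is assumed)."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

def gene_significance(expr, trait):
    """Assumed definition: |correlation| between each gene and the trait."""
    return {g: abs(pearson(values, trait)) for g, values in expr.items()}

def module_significance(module, gs):
    """Average gene significance over the genes assigned to one module."""
    return sum(gs[g] for g in module) / len(module)

# Hypothetical toy data: three genes measured in six samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.0, 4.1, 6.2, 7.9, 10.0, 12.1],
    "geneC": [5.0, 3.0, 6.0, 2.0, 4.0, 1.0],
}
status = [0, 0, 0, 1, 1, 1]  # 0 = control, 1 = disease (hypothetical)

adj = adjacency(expr)
gs = gene_significance(expr, status)
ms_ab = module_significance(["geneA", "geneB"], gs)
ms_c = module_significance(["geneC"], gs)
```

    In this toy example the module containing geneA and geneB scores higher than the single-gene module, because both genes track the status vector closely. Module detection itself (clustering on the adjacency or a topological overlap matrix) is not shown.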

  3. ICan: an integrated co-alteration network to identify ovarian cancer-related genes.

    PubMed

    Zhou, Yuanshuai; Liu, Yongjing; Li, Kening; Zhang, Rui; Qiu, Fujun; Zhao, Ning; Xu, Yan

    2015-01-01

    Over the last decade, an increasing number of integrative studies on cancer-related genes have been published. Integrative analyses aim to overcome the limitation of a single data type, and provide a more complete view of carcinogenesis. The vast majority of these studies used sample-matched data of gene expression and copy number to investigate the impact of copy number alteration on gene expression, and to predict and prioritize candidate oncogenes and tumor suppressor genes. However, correlations between genes were neglected in these studies. Our work aimed to evaluate the co-alteration of copy number, methylation and expression, allowing us to identify cancer-related genes and essential functional modules in cancer. We built the Integrated Co-alteration network (ICan) based on multi-omics data, and analyzed the network to uncover cancer-related genes. After comparison with random networks, we identified 155 ovarian cancer-related genes, including well-known (TP53, BRCA1, RB1 and PTEN) and also novel cancer-related genes, such as PDPN and EphA2. We compared the results with a conventional method: CNAmet, and obtained a significantly better area under the curve value (ICan: 0.8179, CNAmet: 0.5183). In this paper, we describe a framework to find cancer-related genes based on an Integrated Co-alteration network. Our results proved that ICan could precisely identify candidate cancer genes and provide increased mechanistic understanding of carcinogenesis. This work suggested a new research direction for biological network analyses involving multi-omics data.
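
    The area under the curve (AUC) comparison reported above can be illustrated with a short sketch. It computes the AUC as the probability that a randomly chosen known cancer gene is scored above a randomly chosen non-cancer gene, with ties counted as one half. The scores and labels are hypothetical toy values and are not taken from ICan or CNAmet.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = known cancer gene, 0 = other gene.
labels = [1, 1, 1, 0, 0, 0]
method_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
method_b_scores = [0.5, 0.2, 0.6, 0.4, 0.7, 0.3]

auc_a = auc(method_a_scores, labels)
auc_b = auc(method_b_scores, labels)
```

    An AUC near 0.5 means the ranking is no better than chance, which is the sense in which the value reported for CNAmet (0.5183) is much weaker than the value reported for ICan (0.8179).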

  4. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    PubMed

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore the molecular mechanisms of bladder cancer (BC), a network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using the empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed from differentially co-expressed genes and links. The regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through the molecular complex detection (MCODE) algorithm. Centrality analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were the top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deeper studies are needed to elucidate the pathogenesis of BC. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. ICan: An Integrated Co-Alteration Network to Identify Ovarian Cancer-Related Genes

    PubMed Central

    Zhou, Yuanshuai; Liu, Yongjing; Li, Kening; Zhang, Rui; Qiu, Fujun; Zhao, Ning; Xu, Yan

    2015-01-01

    Background: Over the last decade, an increasing number of integrative studies on cancer-related genes have been published. Integrative analyses aim to overcome the limitation of a single data type, and provide a more complete view of carcinogenesis. The vast majority of these studies used sample-matched data of gene expression and copy number to investigate the impact of copy number alteration on gene expression, and to predict and prioritize candidate oncogenes and tumor suppressor genes. However, correlations between genes were neglected in these studies. Our work aimed to evaluate the co-alteration of copy number, methylation and expression, allowing us to identify cancer-related genes and essential functional modules in cancer. Results: We built the Integrated Co-alteration network (ICan) based on multi-omics data, and analyzed the network to uncover cancer-related genes. After comparison with random networks, we identified 155 ovarian cancer-related genes, including well-known (TP53, BRCA1, RB1 and PTEN) and also novel cancer-related genes, such as PDPN and EphA2. We compared the results with a conventional method: CNAmet, and obtained a significantly better area under the curve value (ICan: 0.8179, CNAmet: 0.5183). Conclusion: In this paper, we describe a framework to find cancer-related genes based on an Integrated Co-alteration network. Our results proved that ICan could precisely identify candidate cancer genes and provide increased mechanistic understanding of carcinogenesis. This work suggested a new research direction for biological network analyses involving multi-omics data. PMID:25803614

  6. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    PubMed

    Wei, Pi-Jing; Zhang, Di; Xia, Junfeng; Zheng, Chun-Hou

    2016-12-23

    Cancer is a complex disease characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data, can be measured from each individual. Correspondingly, one of the key challenges is to pinpoint functional driver mutations or pathways, which contribute to tumorigenesis, from millions of functionally neutral passenger mutations. In this paper, in order to identify driver genes effectively, we applied a generalized additive model to mutation profiles to filter genes with long length and constructed a new gene-gene interaction network. Then we integrated the mutation data and expression data into the gene-gene interaction network. Lastly, a greedy algorithm was used to prioritize candidate driver genes from the integrated data. We named the proposed method Length-Net-Driver (LNDriver). Experiments on three TCGA datasets, i.e., head and neck squamous cell carcinoma, kidney renal clear cell carcinoma and thyroid carcinoma, demonstrated that the proposed method was effective. Also, it can identify not only frequently mutated drivers, but also rare candidate driver genes.

  7. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment

    PubMed Central

    Uddin, Raihan; Singh, Shiva M.

    2017-01-01

    As humans age, many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in “learning and memory” related functions and pathways. Subsequent differential network analysis of this “learning and memory” module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known functions of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken

  8. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment.

    PubMed

    Uddin, Raihan; Singh, Shiva M

    2017-01-01

    As humans age, many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in "learning and memory" related functions and pathways. Subsequent differential network analysis of this "learning and memory" module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known functions of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they

  9. Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.

    PubMed

    Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui

    2017-10-01

    The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially-expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3), were evaluated using receiver operating characteristic (ROC) and precision and recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers to use in the diagnosis of glaucoma and aid in identifying the molecular mechanism of this disease.
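
    Two of the topological centrality measures named in this abstract, degree and closeness, can be sketched as follows. This is an illustrative sketch only, assuming an undirected, unweighted network; the study's own GRN was inferred with Genie3 and may be directed and weighted. The node names `g1` to `g5` and the edge list are hypothetical, and betweenness is omitted to keep the example short.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical toy network: g1 is a hub, g5 hangs off g4.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g4", "g5")]
graph = build_graph(edges)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
top_by_degree = max(degree, key=degree.get)
```

    Ranking nodes by each measure and keeping those that score highly on several of them is one common way to nominate candidate key genes; the exact selection rule used in the study is not stated in the abstract.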

  10. Network-Based Integration of GWAS and Gene Expression Identifies a HOX-Centric Network Associated with Serous Ovarian Cancer Risk.

    PubMed

    Kar, Siddhartha P; Tyrer, Jonathan P; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S; Cramer, Daniel; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K; Kelemen, Linda E; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Iain A; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B; Narod, Steven A; Nedergaard, Lotte; Ness, Roberta B; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; Runnebaum, Ingo B; Rzepecka, Iwona K; Salvesen, Helga B; Schildkraut, Joellen M; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Sucheston-Campbell, Lara E; Tangen, Ingvild L; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S; van Altena, Anne M; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A; Monteiro, Alvaro N A; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P

    2015-10-01

    Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by coexpression may also be enriched for additional EOC risk associations. We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly coexpressed with each selected TF gene in the unified microarray dataset of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this dataset were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P < 0.05 and FDR < 0.05). These results were replicated (P < 0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Network analysis integrating large, context-specific datasets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. ©2015 American Association for Cancer Research.
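
    The abstract uses mutual information to link each transcription factor gene to co-expressed genes. A minimal sketch of a histogram-based mutual information estimate between two expression vectors is given below. The equal-width binning, the choice of three bins, and the toy vectors are all assumptions for illustration; the estimator actually used in the study is not described in the abstract.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant vectors
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Plug-in estimate of I(X; Y) in bits from binned co-occurrence counts."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    # sum over observed cells of p(a, b) * log2(p(a, b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

# Hypothetical toy expression vectors across six samples.
tf_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
coexpressed = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]   # tracks the TF exactly
unrelated = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]       # independent of the TF bins

mi_high = mutual_information(tf_expr, coexpressed)
mi_zero = mutual_information(tf_expr, unrelated)
```

    Unlike Pearson correlation, mutual information also captures non-linear dependence, but plug-in estimates like this one are sensitive to the number of bins and to small sample sizes.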

  11. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y.; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B.; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J.; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background: Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods: We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results: Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion: We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact: Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509

  12. A gene co-expression network model identifies yield-related vicinity networks in Jatropha curcas shoot system.

    PubMed

    Govender, Nisha; Senan, Siju; Mohamed-Hussein, Zeti-Azura; Wickneswari, Ratnam

    2018-06-15

    The plant shoot system consists of reproductive organs such as inflorescences, buds and fruits, and the vegetative leaves and stems. In this study, the reproductive part of the Jatropha curcas shoot system, which includes the aerial shoots, shoots bearing the inflorescence and the inflorescence, was investigated with regard to gene-to-gene interactions underpinning yield-related biological processes. RNA-seq based sequencing of shoot tissues performed on an Illumina HiSeq 2500 platform generated 18 transcriptomes. Using the reference genome-based mapping approach, a total of 64 361 genes was identified in all samples and the data were annotated against the non-redundant database by the BLAST2GO Pro suite. After removing the outlier genes and samples, a total of 12 734 genes across 17 samples were subjected to gene co-expression network construction using petal, an R library. A gene co-expression network model built with scale-free and small-world properties extracted four vicinity networks (VNs) with putative involvement in yield-related biological processes as follows: heat stress tolerance, floral and shoot meristem differentiation, biosynthesis of chlorophyll molecules and laticifers, cell wall metabolism and epigenetic regulations. Our VNs revealed putative key players that could be adapted in breeding strategies for J. curcas shoot system improvements.

  13. A computational approach to identify cellular heterogeneity and tissue-specific gene regulatory networks.

    PubMed

    Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees

    2018-06-07

    The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, namely gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA) and our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.

  14. Gene Network for Identifying the Entropy Changes of Different Modules in Pediatric Sepsis.

    PubMed

    Yang, Jing; Zhang, Pingli; Wang, Lumin

    2016-01-01

    Pediatric sepsis is a disease that threatens the lives of children. The incidence of pediatric sepsis is higher in developing countries due to various reasons, such as insufficient immunization and nutrition, and water and air pollution. Exploring the potential genes via different methods is of significance for the prevention and treatment of pediatric sepsis. This study aimed to identify potential genes associated with pediatric sepsis utilizing analysis of gene network and entropy. The mRNA expression in the blood samples collected from 20 septic children and 30 healthy controls was quantified by using the Affymetrix HG-U133A microarray. Two condition-specific protein-protein interaction networks (PINs), one for the healthy controls and the other for the children with sepsis, were deduced by combining the fundamental human PINs with gene expression profiles in the two phenotypes. Subsequently, distinct modules from the two conditional networks were extracted by adopting a maximal clique-merging approach. Delta entropy (ΔS) was calculated between sepsis and control modules. Then, key genes displaying changes in gene composition were identified by matching the control and sepsis modules. Two objective modules were obtained, in which the ribosomal proteins RPL4 and RPL9 as well as TOP2A were considered the likely key genes differentiating sepsis from healthy controls. According to previous reports and this work, TOP2A is a potential gene therapy target for pediatric sepsis. The relationship between pediatric sepsis and RPL4 and RPL9 needs further investigation. © 2016 The Author(s) Published by S. Karger AG, Basel.

  15. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network

    PubMed Central

    Ruan, Xiyun; Li, Hongyun; Liu, Bo; Chen, Jie; Zhang, Shibao; Sun, Zeqiang; Liu, Shuangqing; Sun, Fahai; Liu, Qingyong

    2015-01-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842, 371, 2,883 and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient in identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425

  16. Co-expression network analysis identified six hub genes in association with metastasis risk and prognosis in hepatocellular carcinoma

    PubMed Central

    Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu

    2017-01-01

    Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663
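
    As a rough illustration of how hub genes can be ranked inside a weighted co-expression module, the sketch below applies the soft-threshold adjacency commonly used in unsigned weighted networks, a_ij = |r_ij|^β, and sums it per gene. The correlation values, the gene labels and β = 6 are assumptions for illustration; the abstract does not state the parameters or the exact hub criterion used in the study.

```python
def intramodular_connectivity(corr, beta=6):
    """Sum soft-thresholded adjacencies |r|**beta for every gene.

    corr: dict mapping frozenset({gene_a, gene_b}) -> Pearson correlation
    between two genes of the same module.
    beta: illustrative soft-threshold power (an assumption, not from the study).
    """
    connectivity = {}
    for pair, r in corr.items():
        weight = abs(r) ** beta
        for gene in pair:
            connectivity[gene] = connectivity.get(gene, 0.0) + weight
    return connectivity


# Invented correlations for a hypothetical four-gene module.
corr = {
    frozenset({"G1", "G2"}): 0.9,
    frozenset({"G1", "G3"}): -0.8,
    frozenset({"G1", "G4"}): 0.85,
    frozenset({"G2", "G3"}): 0.3,
    frozenset({"G2", "G4"}): 0.2,
    frozenset({"G3", "G4"}): 0.1,
}
k = intramodular_connectivity(corr)
hubs = sorted(k, key=k.get, reverse=True)  # most connected gene first
```

    Raising the correlation to a power suppresses weak links, so G1, which is strongly correlated with every other gene, ranks first.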

  17. A Systems Approach Identifies Networks and Genes Linking Sleep and Stress: Implications for Neuropsychiatric Disorders

    PubMed Central

    Jiang, Peng; Scarpa, Joseph R.; Fitzpatrick, Karrie; Losic, Bojan; Gao, Vance D.; Hao, Ke; Summa, Keith C.; Yang, He S.; Zhang, Bin; Allada, Ravi; Vitaterna, Martha H.; Turek, Fred W.; Kasarskis, Andrew

    2016-01-01

    SUMMARY Sleep dysfunction and stress susceptibility are co-morbid complex traits, which often precede and predispose patients to a variety of neuropsychiatric diseases. Here, we demonstrate multi-level organizations of genetic landscape, candidate genes, and molecular networks associated with 328 stress and sleep traits in a chronically stressed population of 338 (C57BL/6J×A/J) F2 mice. We constructed striatal gene co-expression networks, revealing functionally and cell-type specific gene co-regulations important for stress and sleep. Using a composite ranking system, we identified network modules most relevant for 15 independent phenotypic categories, highlighting a mitochondria/synaptic module that links sleep and stress. The key network regulators of this module are overrepresented with genes implicated in neuropsychiatric diseases. Our work suggests that the interplay between sleep, stress, and neuropathology emerges from genetic influences on gene expression and their collective organization through complex molecular networks, providing a framework to interrogate the mechanisms underlying sleep, stress susceptibility, and related neuropsychiatric disorders. PMID:25921536

  18. Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network.

    PubMed

    Li, Zhandong; An, Lifeng; Li, Hao; Wang, ShaoPeng; Zhou, You; Yuan, Fei; Li, Lin

    2016-05-05

    Nasopharyngeal cancer or nasopharyngeal carcinoma (NPC) is the most common cancer originating in the nasopharynx. The factors that induce nasopharyngeal cancer are still not clear. Additional information about the chemicals or genes related to nasopharyngeal cancer will promote a better understanding of the pathogenesis of this cancer and the factors that induce it. Thus, a computational method NPC-RGCP was proposed in this study to identify the possible relevant chemicals and genes based on the presently known chemicals and genes related to nasopharyngeal cancer. To extensively utilize the functional associations between proteins and chemicals, a heterogeneous network was constructed based on interactions of proteins and chemicals. The NPC-RGCP included two stages: the searching stage and the screening stage. The former stage is for finding new possible genes and chemicals in the heterogeneous network, while the latter stage is for screening and removing false discoveries and selecting the core genes and chemicals. As a result, five putative genes, CXCR3, IRF1, CDK1, GSTP1, and CDH2, and seven putative chemicals, iron, propionic acid, dimethyl sulfoxide, isopropanol, erythrose 4-phosphate, β-D-Fructose 6-phosphate, and flavin adenine dinucleotide, were identified by NPC-RGCP. Extensive analyses provided confirmation that the putative genes and chemicals have significant associations with nasopharyngeal cancer.

  19. Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network

    PubMed Central

    Li, Zhandong; An, Lifeng; Li, Hao; Wang, ShaoPeng; Zhou, You; Yuan, Fei; Li, Lin

    2016-01-01

    Nasopharyngeal cancer or nasopharyngeal carcinoma (NPC) is the most common cancer originating in the nasopharynx. The factors that induce nasopharyngeal cancer are still not clear. Additional information about the chemicals or genes related to nasopharyngeal cancer will promote a better understanding of the pathogenesis of this cancer and the factors that induce it. Thus, a computational method NPC-RGCP was proposed in this study to identify the possible relevant chemicals and genes based on the presently known chemicals and genes related to nasopharyngeal cancer. To extensively utilize the functional associations between proteins and chemicals, a heterogeneous network was constructed based on interactions of proteins and chemicals. The NPC-RGCP included two stages: the searching stage and the screening stage. The former stage is for finding new possible genes and chemicals in the heterogeneous network, while the latter stage is for screening and removing false discoveries and selecting the core genes and chemicals. As a result, five putative genes, CXCR3, IRF1, CDK1, GSTP1, and CDH2, and seven putative chemicals, iron, propionic acid, dimethyl sulfoxide, isopropanol, erythrose 4-phosphate, β-D-Fructose 6-phosphate, and flavin adenine dinucleotide, were identified by NPC-RGCP. Extensive analyses provided confirmation that the putative genes and chemicals have significant associations with nasopharyngeal cancer. PMID:27149165

  20. In-Silico Integration Approach to Identify a Key miRNA Regulating a Gene Network in Aggressive Prostate Cancer

    PubMed Central

    Colaprico, Antonio; Bontempi, Gianluca; Castiglioni, Isabella

    2018-01-01

    Like other cancers, prostate cancer (PC) is caused by the accumulation of genetic alterations in the cells that drives malignant growth. These alterations are revealed by gene profiling and copy number alteration (CNA) analysis. Moreover, recent evidence suggests that microRNAs also have an important role in PC development. Despite efforts to profile PC, the alterations (gene, CNA, and miRNA) and biological processes that correlate with disease development and progression remain partially elusive. Many gene signatures proposed as diagnostic or prognostic tools in cancer overlap poorly. The identification of co-expressed genes that are functionally related can identify a core network of genes associated with PC with better reproducibility. By combining different approaches, including the integration of mRNA expression profiles, CNAs, and miRNA expression levels, we identified a gene signature of four genes overlapping with other published gene signatures and able to distinguish, in silico, high Gleason-scored PC from normal human tissue, which was further enriched to 19 genes by gene co-expression analysis. From the analysis of miRNAs possibly regulating this network, we found that hsa-miR-153 was highly connected to the genes in the network. Our results identify a four-gene signature with diagnostic and prognostic value in PC and suggest an interesting gene network that could play a key regulatory role in PC development and progression. Furthermore, hsa-miR-153, controlling this network, could be a potential biomarker for theranostics in high Gleason-scored PC. PMID:29562723

  1. A systems approach identifies networks and genes linking sleep and stress: implications for neuropsychiatric disorders.

    PubMed

    Jiang, Peng; Scarpa, Joseph R; Fitzpatrick, Karrie; Losic, Bojan; Gao, Vance D; Hao, Ke; Summa, Keith C; Yang, He S; Zhang, Bin; Allada, Ravi; Vitaterna, Martha H; Turek, Fred W; Kasarskis, Andrew

    2015-05-05

    Sleep dysfunction and stress susceptibility are comorbid complex traits that often precede and predispose patients to a variety of neuropsychiatric diseases. Here, we demonstrate multilevel organizations of genetic landscape, candidate genes, and molecular networks associated with 328 stress and sleep traits in a chronically stressed population of 338 (C57BL/6J × A/J) F2 mice. We constructed striatal gene co-expression networks, revealing functionally and cell-type-specific gene co-regulations important for stress and sleep. Using a composite ranking system, we identified network modules most relevant for 15 independent phenotypic categories, highlighting a mitochondria/synaptic module that links sleep and stress. The key network regulators of this module are overrepresented with genes implicated in neuropsychiatric diseases. Our work suggests that the interplay among sleep, stress, and neuropathology emerges from genetic influences on gene expression and their collective organization through complex molecular networks, providing a framework for interrogating the mechanisms underlying sleep, stress susceptibility, and related neuropsychiatric disorders. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  2. Genexpi: a toolset for identifying regulons and validating gene regulatory networks using time-course expression data.

    PubMed

    Modrák, Martin; Vohradský, Jiří

    2018-04-13

    Identifying regulons of sigma factors is a vital subtask of gene network inference. Integrating multiple sources of data is essential for correct identification of regulons and complete gene regulatory networks. Time series of expression data measured with microarrays or RNA-seq combined with static binding experiments (e.g., ChIP-seq) or literature mining may be used for inference of sigma factor regulatory networks. We introduce Genexpi: a tool to identify sigma factors by combining candidates obtained from ChIP experiments or literature mining with time-course gene expression data. While Genexpi can be used to infer other types of regulatory interactions, it was designed and validated on real biological data from bacterial regulons. In this paper, we put primary focus on CyGenexpi: a plugin integrating Genexpi with the Cytoscape software for ease of use. As a part of this effort, a plugin for handling time series data in Cytoscape called CyDataseries has been developed and made available. Genexpi is also available as a standalone command line tool and an R package. Genexpi is a useful part of the gene network inference toolbox. It provides meaningful information about the composition of regulons and delivers biologically interpretable results.

  3. Dissecting the Gene Network of Dietary Restriction to Identify Evolutionarily Conserved Pathways and New Functional Genes

    PubMed Central

    Wuttke, Daniel; Connor, Richard; Vora, Chintan; Craig, Thomas; Li, Yang; Wood, Shona; Vasieva, Olga; Shmookler Reis, Robert; Tang, Fusheng; de Magalhães, João Pedro

    2012-01-01

    Dietary restriction (DR), limiting nutrient intake from diet without causing malnutrition, delays the aging process and extends lifespan in multiple organisms. The conserved life-extending effect of DR suggests the involvement of fundamental mechanisms, although these remain a subject of debate. To help decipher the life-extending mechanisms of DR, we first compiled a list of genes that if genetically altered disrupt or prevent the life-extending effects of DR. We called these DR–essential genes and identified more than 100 in model organisms such as yeast, worms, flies, and mice. In order for other researchers to benefit from this first curated list of genes essential for DR, we established an online database called GenDR (http://genomics.senescence.info/diet/). To dissect the interactions of DR–essential genes and discover the underlying lifespan-extending mechanisms, we then used a variety of network and systems biology approaches to analyze the gene network of DR. We show that DR–essential genes are more conserved at the molecular level and have more molecular interactions than expected by chance. Furthermore, we employed a guilt-by-association method to predict novel DR–essential genes. In budding yeast, we predicted nine genes related to vacuolar functions; we show experimentally that mutations deleting eight of those genes prevent the life-extending effects of DR. Three of these mutants (OPT2, FRE6, and RCR2) had extended lifespan under ad libitum, indicating that the lack of further longevity under DR is not caused by a general compromise of fitness. These results demonstrate how network analyses of DR using GenDR can be used to make phenotypically relevant predictions. Moreover, gene-regulatory circuits reveal that the DR–induced transcriptional signature in yeast involves nutrient-sensing, stress responses and meiotic transcription factors. Finally, comparing the influence of gene expression changes during DR on the interactomes of multiple

  4. A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data.

    PubMed

    Chen Peng; Ao Li

    2017-01-01

    The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and therefore for improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network based method by integrating multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method is that multi-dimensional GBM data from the TCGA dataset, which provide comprehensive information on genes, are combined with protein-protein interactions to construct a weighted heterogeneous network, which reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of comprehensive performance evaluation show that the proposed method significantly outperforms the network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .

  5. Microarray analysis and scale-free gene networks identify candidate regulators in drought-stressed roots of loblolly pine (P. taeda L.)

    PubMed Central

    2011-01-01

    Background Global transcriptional analysis of loblolly pine (Pinus taeda L.) is challenging due to limited molecular tools. PtGen2, a 26,496 feature cDNA microarray, was fabricated and used to assess drought-induced gene expression in loblolly pine propagule roots. Statistical analysis of differential expression and weighted gene correlation network analysis were used to identify drought-responsive genes and further characterize the molecular basis of drought tolerance in loblolly pine. Results Microarrays were used to interrogate root cDNA populations obtained from 12 genotype × treatment combinations (four genotypes, three watering regimes). Comparison of drought-stressed roots with roots from the control treatment identified 2445 genes displaying at least a 1.5-fold expression difference (false discovery rate = 0.01). Genes commonly associated with drought response in pine and other plant species, as well as a number of abiotic and biotic stress-related genes, were up-regulated in drought-stressed roots. Only 76 genes were identified as differentially expressed in drought-recovered roots, indicating that the transcript population can return to the pre-drought state within 48 hours. Gene correlation analysis predicts a scale-free network topology and identifies eleven co-expression modules that ranged in size from 34 to 938 members. Network topological parameters identified a number of central nodes (hubs) including those with significant homology (E-values ≤ 2 × 10⁻³⁰) to 9-cis-epoxycarotenoid dioxygenase, zeatin O-glucosyltransferase, and ABA-responsive protein. Identified hubs also include genes that have been associated previously with osmotic stress, phytohormones, enzymes that detoxify reactive oxygen species, and several genes of unknown function. Conclusion PtGen2 was used to evaluate transcriptome responses in loblolly pine and was leveraged to identify 2445 differentially expressed genes responding to severe drought stress in roots. Many of the

  6. Gene networks associated with conditional fear in mice identified using a systems genetics approach

    PubMed Central

    2011-01-01

    Background Our understanding of the genetic basis of learning and memory remains shrouded in mystery. To explore the genetic networks governing the biology of conditional fear, we used a systems genetics approach to analyze a hybrid mouse diversity panel (HMDP) with high mapping resolution. Results A total of 27 behavioral quantitative trait loci were mapped with a false discovery rate of 5%. By integrating fear phenotypes, transcript profiling data from hippocampus and striatum and also genotype information, two gene co-expression networks correlated with context-dependent immobility were identified. We prioritized the key markers and genes in these pathways using intramodular connectivity measures and structural equation modeling. Highly connected genes in the context fear modules included Psmd6, Ube2a and Usp33, suggesting an important role for ubiquitination in learning and memory. In addition, we surveyed the architecture of brain transcript regulation and demonstrated preservation of gene co-expression modules in hippocampus and striatum, while also highlighting important differences. Rps15a, Kif3a, Stard7, 6330503K22RIK, and Plvap were among the individual genes whose transcript abundance was strongly associated with fear phenotypes. Conclusion Application of our multi-faceted mapping strategy permits an increasingly detailed characterization of the genetic networks underlying behavior. PMID:21410935

  7. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues

    PubMed Central

    Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong

    2017-01-01

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
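
    The searching procedure described above can be sketched in Python as a breadth-first search over an interaction graph. This is a simplified illustration: it treats the network as unweighted, uses invented node names, and leaves out the permutation test and the two screening rules. The function names are hypothetical and are not taken from the study.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for one shortest path in an unweighted graph.

    graph: dict mapping node -> iterable of neighbouring nodes.
    Returns the path as a list of nodes, or None if target is unreachable.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def candidate_genes(graph, seeds_a, seeds_b):
    """Collect non-seed nodes lying inside shortest paths between two seed sets."""
    seeds = set(seeds_a) | set(seeds_b)
    candidates = set()
    for a in seeds_a:
        for b in seeds_b:
            path = shortest_path(graph, a, b)
            if path:
                candidates.update(n for n in path[1:-1] if n not in seeds)
    return candidates


# Invented toy network: B1 is a "bone" seed, N1 a "nerve" seed.
graph = {
    "B1": ["X", "Y"],
    "X": ["B1", "N1"],
    "Y": ["B1", "Z"],
    "Z": ["Y", "N1"],
    "N1": ["X", "Z"],
}
found = candidate_genes(graph, ["B1"], ["N1"])
```

    Here X lies on the two-step route between the seeds and is returned, whereas Y and Z sit on a longer route and are not.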

  8. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues.

    PubMed

    Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong

    2017-10-02

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.

  9. A Sparse Reconstruction Approach for Identifying Gene Regulatory Networks Using Steady-State Experiment Data

    PubMed Central

    Zhang, Wanhong; Zhou, Tong

    2015-01-01

    Motivation Identifying gene regulatory networks (GRNs), which consist of a large number of interacting units, has become a problem of paramount importance in systems biology. Situations exist extensively in which causal interacting relationships among these units are required to be reconstructed from measured expression data and other a priori information. Though numerous classical methods have been developed to unravel the interactions of GRNs, these methods either have higher computing complexities or have lower estimation accuracies. Note that great similarities exist between identification of genes that directly regulate a specific gene and a sparse vector reconstruction, which often relates to the determination of the number, location and magnitude of nonzero entries of an unknown vector by solving an underdetermined system of linear equations y = Φx. Based on these similarities, we propose a novel framework of sparse reconstruction to identify the structure of a GRN, so as to increase accuracy of causal regulation estimations, as well as to reduce their computational complexity. Results In this paper, a sparse reconstruction framework is proposed on the basis of steady-state experiment data to identify GRN structure. Unlike traditional methods, this approach is well suited to large-scale underdetermined problems in which a sparse vector must be inferred. We investigate how to combine the noisy steady-state experiment data and a sparse reconstruction algorithm to identify causal relationships. Efficiency of this method is tested by an artificial linear network, a mitogen-activated protein kinase (MAPK) pathway network and the in silico networks of the DREAM challenges. The performance of the suggested approach is compared with two state-of-the-art algorithms, the widely adopted total least-squares (TLS) method and those available results on the DREAM project. Actual results show that, with a lower computational cost, the proposed method can

  10. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets

    PubMed Central

    Vinayagam, Arunachalam; Gibson, Travis E.; Lee, Ho-Joon; Yilmazel, Bahar; Roesel, Charles; Hu, Yanhui; Kwon, Young; Sharma, Amitabh; Liu, Yang-Yu; Perrimon, Norbert; Barabási, Albert-László

    2016-01-01

    The protein–protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as “indispensable,” “neutral,” or “dispensable,” which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network’s control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets. PMID:27091990
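
    The node classification described above can be illustrated with a small Python sketch. It assumes the standard structural controllability result that the minimum number of driver nodes in a directed network equals max(N − |M|, 1), where M is a maximum matching between the "out" and "in" copies of the nodes. The toy edges and function names are invented for illustration, and the simple augmenting-path matching used here is only practical for small networks.

```python
def max_matching(nodes, edges):
    """Size of a maximum matching in the bipartite graph that links the
    'out' copy of u to the 'in' copy of v for every directed edge u -> v."""
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_to = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, visited):
        # Try to match u, re-routing earlier matches where necessary.
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in matched_to or augment(matched_to[v], visited):
                matched_to[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def driver_count(nodes, edges):
    """Minimum number of driver nodes: unmatched nodes, but at least one."""
    return max(len(nodes) - max_matching(nodes, edges), 1)


def classify(nodes, edges):
    """Label each node by how its removal changes the driver-node count."""
    baseline = driver_count(nodes, edges)
    labels = {}
    for node in nodes:
        remaining = [n for n in nodes if n != node]
        kept = [(u, v) for u, v in edges if node not in (u, v)]
        after = driver_count(remaining, kept)
        if after > baseline:
            labels[node] = "indispensable"
        elif after < baseline:
            labels[node] = "dispensable"
        else:
            labels[node] = "neutral"
    return labels


# Invented toy network: A -> B, then B fans out to C and D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "D")]
labels = classify(nodes, edges)
```

    In this toy network two driver nodes are needed. Removing B raises the count (indispensable), removing C or D lowers it (dispensable), and removing A leaves it unchanged (neutral).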

  11. Filtering Gene Ontology semantic similarity for identifying protein complexes in large protein interaction networks.

    PubMed

    Wang, Jian; Xie, Dong; Lin, Hongfei; Yang, Zhihao; Zhang, Yijia

    2012-06-21

    Protein complexes play an important role in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPIs makes identification challenging. A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations, to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on the filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics. The method detects protein complexes from large-scale PPI networks by filtering on GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective in identifying attachment proteins of complexes.

  12. Weighted gene co‑expression network analysis in identification of key genes and networks for ischemic‑reperfusion remodeling myocardium.

    PubMed

    Guo, Nan; Zhang, Nan; Yan, Liqiu; Lian, Zheng; Wang, Jiawang; Lv, Fengfeng; Wang, Yunfei; Cao, Xufen

    2018-06-14

    Acute myocardial infarction induces ventricular remodeling, which is implicated in dilated heart and heart failure. The pathogenic mechanism of myocardium remodeling remains to be elucidated. The aim of the present study was to identify key genes and networks for myocardium remodeling following ischemia‑reperfusion (IR). First, the mRNA expression data from the National Center for Biotechnology Information database were downloaded to identify differences in mRNA expression of the IR heart at days 2 and 7. Then, weighted gene co‑expression network analysis, hierarchical clustering, protein‑protein interaction (PPI) network, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were used to identify key genes and networks for the heart remodeling process following IR. A total of 3,321 differentially expressed genes were identified during the heart remodeling process. A total of 6 modules were identified through gene co‑expression network analysis. GO and KEGG analysis results suggested that each module represented a different biological function and was associated with different pathways. Finally, hub genes of each module were identified by PPI network construction. The present study revealed that heart remodeling following IR is a complicated process, involving extracellular matrix organization, neural development, apoptosis and energy metabolism. The dysregulated genes, including SRC proto‑oncogene, non‑receptor tyrosine kinase; discs large MAGUK scaffold protein 1; ATP citrate lyase; RAN, member RAS oncogene family; tumor protein p53; and polo like kinase 2, may be essential for heart remodeling following IR and may be used as potential targets for the inhibition of heart remodeling following acute myocardial infarction.

  13. Systems genetics identifies a convergent gene network for cognition and neurodevelopmental disease.

    PubMed

    Johnson, Michael R; Shkura, Kirill; Langley, Sarah R; Delahaye-Duriez, Andree; Srivastava, Prashant; Hill, W David; Rackham, Owen J L; Davies, Gail; Harris, Sarah E; Moreno-Moral, Aida; Rotival, Maxime; Speed, Doug; Petrovski, Slavé; Katz, Anaïs; Hayward, Caroline; Porteous, David J; Smith, Blair H; Padmanabhan, Sandosh; Hocking, Lynne J; Starr, John M; Liewald, David C; Visconti, Alessia; Falchi, Mario; Bottolo, Leonardo; Rossetti, Tiziana; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Grote, Alexander; Helmstaedter, Christoph; Becker, Albert J; Kaminski, Rafal M; Deary, Ian J; Petretto, Enrico

    2016-02-01

    Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease-associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.

  14. Analysis of bHLH coding genes using gene co-expression network approach.

    PubMed

    Srivastava, Swati; Sanchita; Singh, Garima; Singh, Noopur; Srivastava, Gaurava; Sharma, Ashok

    2016-07-01

    Network analysis provides a powerful framework for the interpretation of data. It uses novel reference network-based metrics for module evolution. These could be used to identify modules of highly connected genes showing variation in the co-expression network. In this study, a co-expression network-based approach was used for analyzing the genes from microarray data. Our approach consists of a simple but robust rank-based network construction. The publicly available gene expression data of Solanum tuberosum under cold and heat stresses were considered to create and analyze a gene co-expression network. The analysis provides a highly co-expressed module of bHLH coding genes based on correlation values. Our approach was to analyze the variation of gene expression according to the time period of stress through a co-expression network approach. As a result, seed genes were identified showing multiple connections with other genes in the same cluster. Seed genes were found to vary across different time periods of stress. These seed genes may be used further as marker genes for developing stress-tolerant plant species.
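The abstract does not spell out its rank-based construction, so the following is only a minimal sketch of one common reading: compute a rank (Spearman) correlation between expression profiles and link each gene to its k most correlated partners. The gene names, expression values, and the choice of k are illustrative assumptions, not data from the study.

```python
def ranks(values):
    """Return 1-based average ranks of the values, sharing ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_based_network(expr, k=1):
    """Link every gene to its k most rank-correlated partners (undirected edges)."""
    edges = set()
    for gene in expr:
        scored = sorted(
            ((spearman(expr[gene], expr[other]), other) for other in expr if other != gene),
            reverse=True,
        )
        for _, other in scored[:k]:
            edges.add(frozenset((gene, other)))
    return edges


# Hypothetical expression profiles over four time points.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.3, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.5, 3.0, 2.0],
}
network = rank_based_network(expr, k=1)
```

Because only the ordering of values matters, a construction like this is insensitive to monotone rescaling of the expression data, which is one plausible sense of "robust" here.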

  15. Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.

    PubMed

    Jiang, Xue; Zhang, Han; Quan, Xiongwen

    2016-01-01

    Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods always focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
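The two DCGN metrics are not defined in the abstract, so only the final rank-product step is sketched here, under the usual definition: a gene's rank product is the geometric mean of its ranks across the individual analyses. The gene names and ranks below are hypothetical.

```python
import math


def rank_product(rank_lists):
    """Geometric mean of each gene's rank across several ranked lists.

    rank_lists: list of dicts mapping gene -> rank (1 = strongest signal).
    Only genes present in every list are scored.
    """
    genes = set(rank_lists[0])
    for ranking in rank_lists[1:]:
        genes &= set(ranking)
    k = len(rank_lists)
    return {g: math.prod(r[g] for r in rank_lists) ** (1.0 / k) for g in genes}


# Hypothetical ranks of four genes under two differential-coexpression metrics.
metric_1 = {"geneA": 1, "geneB": 2, "geneC": 3, "geneD": 4}
metric_2 = {"geneA": 1, "geneB": 2, "geneC": 4, "geneD": 3}

scores = rank_product([metric_1, metric_2])
best_gene = min(scores, key=scores.get)  # the smallest rank product ranks first
```

A small rank product means the gene ranked near the top consistently, which is why the statistic is often used to combine evidence from several rankings.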

  16. Identifying Functional Mechanisms of Gene and Protein Regulatory Networks in Response to a Broader Range of Environmental Stresses

    PubMed Central

    Li, Cheng-Wei; Chen, Bor-Sen

    2010-01-01

    Cellular responses to sudden environmental stresses or physiological changes provide living organisms with the opportunity for final survival and further development. Therefore, it is an important topic to understand protective mechanisms against environmental stresses from the viewpoint of gene and protein networks. We propose two coupled nonlinear stochastic dynamic models to reconstruct stress-activated gene and protein regulatory networks via microarray data in response to environmental stresses. According to the reconstructed gene/protein networks, some possible mutual interactions, feedforward and feedback loops are found for accelerating response and filtering noises in these signaling pathways. A bow-tie core network is also identified to coordinate mutual interactions and feedforward loops, feedback inhibitions, feedback activations, and cross talks to cope efficiently with a broader range of environmental stresses with limited proteins and pathways. PMID:20454442

  17. Discovering Implicit Entity Relation with the Gene-Citation-Gene Network

    PubMed Central

    Song, Min; Han, Nam-Gi; Kim, Yong-Hwan; Ding, Ying; Chambers, Tamy

    2013-01-01

    In this paper, we apply the entitymetrics model to our constructed Gene-Citation-Gene (GCG) network. Based on the premise that there is a hidden, but plausible, relationship between an entity in one article and an entity in its citing article, we constructed a GCG network of gene pairs implicitly connected through citation. We compare the performance of this GCG network to a gene-gene (GG) network constructed over the same corpus but which uses gene pairs explicitly connected through traditional co-occurrence. Using 331,411 MEDLINE abstracts collected from 18,323 seed articles and their references, we identify 25 gene pairs. A comparison of these pairs with interactions found in BioGRID reveals that 96% of the gene pairs in the GCG network have known interactions. We measure network performance using degree, weighted degree, closeness, betweenness centrality and PageRank. Combining all measures, we find the GCG network has more gene pairs, but a lower matching rate than the GG network. However, combining top ranked genes in both networks produces a matching rate of 35.53%. By visualizing both the GG and GCG networks, we find that cancer is the most dominant disease associated with the genes in both networks. Overall, the study indicates that the GCG network can be useful for detecting gene interaction in an implicit manner. PMID:24358368
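Two of the measures named above, degree and PageRank, can be illustrated on a toy undirected gene network. This is a minimal sketch with hypothetical node names and a conventional damping factor of 0.85; it assumes every node has at least one neighbour and is not the tooling used in the study.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an adjacency-list graph.

    Assumes every node has at least one neighbour (no dangling nodes).
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleport share, then shares from its neighbours.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(adj[v])
            for w in adj[v]:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical gene network: G1 is linked to three other genes.
adj = {
    "G1": ["G2", "G3", "G4"],
    "G2": ["G1"],
    "G3": ["G1"],
    "G4": ["G1"],
}

degree = {v: len(neighbours) for v, neighbours in adj.items()}
scores = pagerank(adj)
top_gene = max(scores, key=scores.get)
```

On this toy graph both measures single out the same hub; on real networks they can disagree, which is one reason to combine several centrality measures.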

  18. Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks

    PubMed Central

    Li, Min; Chen, Weijie; Wang, Jianxin; Pan, Yi

    2014-01-01

    Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the Core-Attachment assumption, proteins that are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. The proposed algorithm DPC was applied to the data of Saccharomyces cerevisiae, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures. PMID:24963481

  19. Integrating Genetic and Gene Co-expression Analysis Identifies Gene Networks Involved in Alcohol and Stress Responses

    PubMed Central

    Luo, Jie; Xu, Pei; Cao, Peijian; Wan, Hongjian; Lv, Xiaonan; Xu, Shengchun; Wang, Gangjun; Cook, Melloni N.; Jones, Byron C.; Lu, Lu; Wang, Xusheng

    2018-01-01

    Although the link between stress and alcohol is well recognized, the underlying mechanisms of how they interplay at the molecular level remain unclear. The purpose of this study is to identify molecular networks underlying the effects of alcohol and stress responses, as well as their interaction on anxiety behaviors in the hippocampus of mice using a systems genetics approach. Here, we applied a gene co-expression network approach to transcriptomes of 41 BXD mouse strains under four conditions: stress, alcohol, stress-induced alcohol and control. The co-expression analysis identified 14 modules and characterized four expression patterns across the four conditions. The four expression patterns include up-regulation in no restraint stress and given an ethanol injection (NOE) but restoration in restraint stress followed by an ethanol injection (RSE; pattern 1), down-regulation in NOE but rescue in RSE (pattern 2), up-regulation in both restraint stress followed by a saline injection (RSS) and NOE, and further amplification in RSE (pattern 3), and up-regulation in RSS but reduction in both NOE and RSE (pattern 4). We further identified four functional subnetworks by superimposing protein-protein interactions (PPIs) onto the 14 co-expression modules, including γ-aminobutyric acid receptor (GABA) signaling, glutamate signaling, neuropeptide signaling, and cAMP-dependent signaling. We further performed module specificity analysis to identify modules that are specific to stress, alcohol, or stress-induced alcohol responses. Finally, we conducted causality analysis to link genetic variation to these identified modules, and anxiety behaviors after stress and alcohol treatments. This study underscores the importance of integrative analysis and offers new insights into the molecular networks underlying stress and alcohol responses. PMID:29674951

  20. Reconstructing directed gene regulatory network by only gene expression data.

    PubMed

    Zhang, Lu; Feng, Xi Kang; Ng, Yen Kaow; Li, Shuai Cheng

    2016-08-18

    Accurately identifying gene regulatory networks is an important task in understanding in vivo biological activities. The inference of such networks is often accomplished through the use of gene expression data. Many methods have been developed to evaluate gene expression dependencies between a transcription factor and its target genes, and some methods also eliminate transitive interactions. The regulatory (or edge) direction is undetermined if the target gene is also a transcription factor. Some methods predict the regulatory directions in the gene regulatory networks by locating the eQTL single nucleotide polymorphism, or by observing the gene expression changes when knocking out/down the candidate transcription factors; regrettably, these additional data are usually unavailable, especially for samples derived from human tissues. In this study, we propose the Context Based Dependency Network (CBDN), a method that is able to infer gene regulatory networks with the regulatory directions from gene expression data only. To determine the regulatory direction, CBDN computes the influence of source to target by evaluating the magnitude changes of expression dependencies between the target gene and the others with conditioning on the source gene. CBDN extends the data processing inequality by involving the dependency direction to distinguish between direct and transitive relationships between genes. We also define two types of important regulators which can influence a majority of the genes in the network directly or indirectly. CBDN can detect both of these two types of important regulators by averaging the influence functions of a candidate regulator to the other genes. In our experiments with simulated and real data, even with the regulatory direction taken into account, CBDN outperforms the state-of-the-art approaches for inferring gene regulatory networks. CBDN identifies the important regulators in the predicted network: 1. TYROBP influences a batch of genes that are
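As background for the data processing inequality (DPI) mentioned above, here is a minimal sketch of the classic undirected DPI filter popularized by ARACNE: in every fully connected gene triplet, the weakest dependency is treated as transitive and dropped. This is not CBDN's directed extension, whose influence function is not given in the abstract, and the dependency scores are hypothetical.

```python
from itertools import combinations


def dpi_prune(score):
    """Drop the weakest edge of every fully connected triplet.

    score: dict mapping frozenset({gene_a, gene_b}) -> dependency strength
    (for example, mutual information). Decisions use the original scores,
    so the result does not depend on the order in which triplets are visited.
    """
    genes = sorted({g for pair in score for g in pair})
    removed = set()
    for a, b, c in combinations(genes, 3):
        trio = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in score for edge in trio):
            removed.add(min(trio, key=lambda edge: score[edge]))
    return {edge: s for edge, s in score.items() if edge not in removed}


# Hypothetical chain X -> Y -> Z: the X-Z dependency is only transitive.
score = {
    frozenset(("X", "Y")): 0.9,
    frozenset(("Y", "Z")): 0.8,
    frozenset(("X", "Z")): 0.5,
}
pruned = dpi_prune(score)
```

The filter removes the indirect X-Z edge but says nothing about which way regulation flows along the remaining edges, which is the gap a directed method is meant to fill.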

  1. Omics of Brucella: Species-Specific sRNA-Mediated Gene Ontology Regulatory Networks Identified by Computational Biology.

    PubMed

    Vishnu, Udayakumar S; Sankarasubramanian, Jagadesan; Gunasekaran, Paramasamy; Sridhar, Jayavel; Rajendhran, Jeyaprakash

    2016-06-01

    Brucella is an intracellular bacterium that causes the zoonotic infectious disease, brucellosis. Brucella species are currently intensively studied with a view to developing novel global health diagnostics and therapeutics. In this context, small RNAs (sRNAs) are one of the emerging topical areas; they play significant roles in regulating gene expression and cellular processes in bacteria. In the present study, we forecast sRNAs in three Brucella species that infect humans, namely Brucella melitensis, Brucella abortus, and Brucella suis, using a computational biology analysis. We combined two bioinformatic algorithms, SIPHT and sRNAscanner. In B. melitensis 16M, 21 sRNA candidates were identified, of which 14 were novel. Similarly, 14 sRNAs were identified in B. abortus, of which four were novel. In B. suis, 16 sRNAs were identified, and five of them were novel. TargetRNA2 software predicted the putative target genes that could be regulated by the identified sRNAs. The identified mRNA targets are involved in carbohydrate, amino acid, lipid, nucleotide, and coenzyme metabolism and transport, energy production and conversion, replication, recombination, repair, and transcription. Additionally, the Gene Ontology (GO) network analysis revealed the species-specific, sRNA-based regulatory networks in B. melitensis, B. abortus, and B. suis. Taken together, although sRNAs are veritable modulators of gene expression in prokaryotes, there are few reports on the significance of sRNAs in Brucella. This report begins to address this literature gap by offering a series of initial observations based on computational biology to pave the way for future experimental analysis of sRNAs and their targets to explain the complex pathogenesis of Brucella.

  2. m6A-Driver: Identifying Context-Specific mRNA m6A Methylation-Driven Gene Interaction Networks

    PubMed Central

    Zhang, Song-Yao; Zhang, Shao-Wu; Liu, Lian; Huang, Yufei

    2016-01-01

    As the most prevalent mammalian mRNA epigenetic modification, N6-methyladenosine (m6A) has been shown to possess important post-transcriptional regulatory functions. However, the regulatory mechanisms and functional circuits of m6A are still largely elusive. To help unveil the regulatory circuitry mediated by mRNA m6A methylation, we develop here m6A-Driver, an algorithm for predicting m6A-driven genes and associated networks, whose functional interactions are likely to be actively modulated by m6A methylation under a specific condition. Specifically, m6A-Driver integrates the PPI network and the predicted differential m6A methylation sites from methylated RNA immunoprecipitation sequencing (MeRIP-Seq) data using a Random Walk with Restart (RWR) algorithm and then builds a consensus m6A-driven network of m6A-driven genes. To evaluate the performance, we applied m6A-Driver to build the context-specific m6A-driven networks for 4 known m6A (de)methylases, i.e., FTO, METTL3, METTL14 and WTAP. Our results suggest that m6A-Driver can robustly and efficiently identify m6A-driven genes that are functionally more enriched and associated with higher degree of differential expression than differential m6A methylated genes. Pathway analysis of the constructed context-specific m6A-driven gene networks further revealed the regulatory circuitry underlying the dynamic interplays between the methyltransferases and demethylase at the epitranscriptomic layer of gene regulation. PMID:28027310
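The Random Walk with Restart (RWR) step can be illustrated generically. The sketch below iterates p ← (1 − r)·W·p + r·p0 on a small undirected network, where W spreads each node's probability evenly over its neighbours and p0 is uniform over the seed nodes. The network, seed choice, and restart probability r = 0.3 are assumptions for illustration; how m6A-Driver seeds and weights its walk is not specified in the abstract.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walk restarting at the seeds."""
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds; the rest flows along the edges.
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            if not adj[v]:
                continue
            share = (1.0 - restart) * p[v] / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical chain-shaped PPI network g1 - g2 - g3 - g4, seeded at g1.
adj = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
proximity = random_walk_with_restart(adj, seeds={"g1"})
```

Nodes closer to the seed end up with higher scores, so the result can be read as a network proximity ranking relative to the seed genes.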

  3. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.

    PubMed

    Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi

    2007-10-04

    In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.

  4. Regulatory network analysis of Epstein-Barr virus identifies functional modules and hub genes involved in infectious mononucleosis.

    PubMed

    Poorebrahim, Mansour; Salarian, Ali; Najafi, Saeideh; Abazari, Mohammad Foad; Aleagha, Maryam Nouri; Dadras, Mohammad Nasr; Jazayeri, Seyed Mohammad; Ataei, Atousa; Poortahmasebi, Vahdat

    2017-05-01

    Epstein-Barr virus (EBV) is the most common cause of infectious mononucleosis (IM) and establishes lifetime infection associated with a variety of cancers and autoimmune diseases. The aim of this study was to develop an integrative gene regulatory network (GRN) approach and overlying gene expression data to identify the representative subnetworks for IM and EBV latent infection (LI). After identifying differentially expressed genes (DEGs) in both IM and LI gene expression profiles, functional annotations were applied using gene ontology (GO) and BiNGO tools, and construction of GRNs, topological analysis and identification of modules were carried out using several plugins of Cytoscape. In parallel, a human-EBV GRN was generated using the Hu-Vir database for further analyses. Our analysis revealed that the majority of DEGs in both IM and LI were involved in cell-cycle and DNA repair processes. However, these genes showed a significant negative correlation in the IM and LI states. Furthermore, cyclin-dependent kinase 2 (CDK2) - a hub gene with the highest centrality score - appeared to be the key player in cell cycle regulation in IM disease. The most significant functional modules in the IM and LI states were involved in the regulation of the cell cycle and apoptosis, respectively. Human-EBV network analysis revealed several direct targets of EBV proteins during IM disease. Our study provides an important first report on the response to IM/LI EBV infection in humans. An important aspect of our data was the upregulation of genes associated with cell cycle progression and proliferation.

  5. Identifying osteosarcoma metastasis associated genes by weighted gene co-expression network analysis (WGCNA).

    PubMed

    Tian, Honglai; Guan, Donghui; Li, Jianmin

    2018-06-01

    Osteosarcoma (OS), the most common malignant bone tumor, poses a serious health threat to children and adolescents. OS usually correlates with early metastasis and a high death rate. This study aimed to better understand the mechanism of OS metastasis. Based on the Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis and selected differentially expressed genes. The weighted gene co-expression network analysis (WGCNA) approach allowed us to investigate the module most correlated with OS metastasis. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to annotate the selected OS metastasis-associated genes. We selected 897 differentially expressed genes from the OS metastasis and OS non-metastasis groups. Based on these selected genes, WGCNA further identified 142 genes included in the module most correlated with OS metastasis. Gene Ontology functional and KEGG pathway enrichment analyses showed that genes significantly associated with OS metastasis were involved in a pathway correlated with insulin-like growth factor binding. Our research identified several potential molecules participating in the metastasis process and factors acting as biomarkers. With this study, we could better explore the mechanism of OS metastasis and further discover more therapy targets.

  6. Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1.

    PubMed

    Gutiérrez, Rodrigo A; Stokes, Trevor L; Thum, Karen; Xu, Xiaodong; Obertello, Mariana; Katari, Manpreet S; Tanurdzic, Milos; Dean, Alexis; Nero, Damion C; McClung, C Robertson; Coruzzi, Gloria M

    2008-03-25

    Understanding how nutrients affect gene expression will help us to understand the mechanisms controlling plant growth and development as a function of nutrient availability. Nitrate has been shown to serve as a signal for the control of gene expression in Arabidopsis. There is also evidence, on a gene-by-gene basis, that downstream products of nitrogen (N) assimilation such as glutamate (Glu) or glutamine (Gln) might serve as signals of organic N status that in turn regulate gene expression. To identify genome-wide responses to such organic N signals, Arabidopsis seedlings were transiently treated with ammonium nitrate in the presence or absence of MSX, an inhibitor of glutamine synthetase, resulting in a block of Glu/Gln synthesis. Genes that responded to organic N were identified as those whose response to ammonium nitrate treatment was blocked in the presence of MSX. We showed that some genes previously identified to be regulated by nitrate are under the control of an organic N-metabolite. Using an integrated network model of molecular interactions, we uncovered a subnetwork regulated by organic N that included CCA1 and target genes involved in N-assimilation. We validated some of the predicted interactions and showed that regulation of the master clock control gene CCA1 by Glu or a Glu-derived metabolite in turn regulates the expression of key N-assimilatory genes. Phase response curve analysis shows that distinct N-metabolites can advance or delay the CCA1 phase. Regulation of CCA1 by organic N signals may represent a novel input mechanism for N-nutrients to affect plant circadian clock function.

  7. Leveraging network analytics to infer patient syndrome and identify causal genes in rare disease cases.

    PubMed

    Krämer, Andreas; Shah, Sohela; Rebres, Robert Anthony; Tang, Susan; Richards, Daniel Rene

    2017-08-11

    Next-generation sequencing is widely used to identify disease-causing variants in patients with rare genetic disorders. Identifying those variants from whole-genome or exome data can be both scientifically challenging and time consuming. A significant amount of time is spent on variant annotation, and interpretation. Fully or partly automated solutions are therefore needed to streamline and scale this process. We describe Phenotype Driven Ranking (PDR), an algorithm integrated into Ingenuity Variant Analysis, that uses observed patient phenotypes to prioritize diseases and genes in order to expedite causal-variant discovery. Our method is based on a network of phenotype-disease-gene relationships derived from the QIAGEN Knowledge Base, which allows for efficient computational association of phenotypes to implicated diseases, and also enables scoring and ranking. We have demonstrated the utility and performance of PDR by applying it to a number of clinical rare-disease cases, where the true causal gene was known beforehand. It is also shown that PDR compares favorably to a representative alternative tool.

  8. Gene network analysis identifies rumen epithelial cell proliferation, differentiation and metabolic pathways perturbed by diet and correlated with methane production

    PubMed Central

    Xiang, Ruidong; McNally, Jody; Rowe, Suzanne; Jonker, Arjan; Pinares-Patino, Cesar S.; Oddy, V. Hutton; Vercoe, Phil E.; McEwan, John C.; Dalrymple, Brian P.

    2016-01-01

    Ruminants obtain nutrients from microbial fermentation of plant material, primarily in their rumen, a multilayered forestomach. How the different layers of the rumen wall respond to diet and influence microbial fermentation, and how these processes are regulated, is not well understood. Gene expression correlation networks were constructed from full thickness rumen wall transcriptomes of 24 sheep fed two different amounts and qualities of a forage and measured for methane production. The network contained two major negatively correlated gene sub-networks predominantly representing the epithelial and muscle layers of the rumen wall. Within the epithelium sub-network gene clusters representing lipid/oxo-acid metabolism, general metabolism and proliferating and differentiating cells were identified. The expression of cell cycle and metabolic genes was positively correlated with dry matter intake, ruminal short chain fatty acid concentrations and methane production. A weak correlation between lipid/oxo-acid metabolism genes and methane yield was observed. Feed consumption level explained the majority of gene expression variation, particularly for the cell cycle genes. Many known stratified epithelium transcription factors had significantly enriched targets in the epithelial gene clusters. The expression patterns of the transcription factors and their targets in proliferating and differentiating skin are mirrored in the rumen, suggesting conservation of regulatory systems. PMID:27966600

  9. Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach

    PubMed Central

    Wang, Meng; Wu, Kai; Lu, Changhong; Kong, Xiangyin

    2015-01-01

    Prostate cancer is a type of cancer that occurs in the male prostate, a gland in the male reproductive system. Because prostate cancer cells may spread to other parts of the body and can influence human reproduction, understanding the mechanisms underlying this disease is critical for designing effective treatments. The identification of as many genes and chemicals related to prostate cancer as possible will enhance our understanding of this disease. In this study, we proposed a computational method to identify new candidate genes and chemicals based on currently known genes and chemicals related to prostate cancer by applying a shortest path approach in a hybrid network. The hybrid network was constructed according to information concerning chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions. Many of the obtained genes and chemicals are associated with prostate cancer. PMID:26504486
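The shortest path idea can be sketched as follows: find a shortest path between each pair of known disease-related nodes and report the intermediate nodes as new candidates. This is a simplified illustration that assumes an unweighted network and keeps a single shortest path per pair; the study's hybrid network, edge weights, and filtering steps are not described in the abstract, and all node names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adj.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def candidates_between(adj, known):
    """Collect interior nodes of shortest paths between all pairs of known nodes."""
    found = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(adj, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(known)


# Hypothetical hybrid network: K1 and K2 are known; X, Y, Z are unlabeled nodes.
adj = {
    "K1": ["X", "Y"],
    "X": ["K1", "K2"],
    "K2": ["X", "Z"],
    "Y": ["K1", "Z"],
    "Z": ["Y", "K2"],
}
candidates = candidates_between(adj, known={"K1", "K2"})
```

Only X lies on the shortest route between the two known nodes, so it is the sole candidate here; Y and Z sit on a longer detour and are ignored.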

  10. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    PubMed

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10^8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes coordinating functionality in cancer networks, and it will illuminate our understanding of how genes modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  11. Predicting hepatocellular carcinoma through cross-talk genes identified by risk pathways

    PubMed Central

    Shao, Zhuo; Huo, Diwei; Zhang, Denan; Xie, Hongbo; Yang, Jingbo; Liu, Qiuqi; Chen, Xiujie

    2018-01-01

    Hepatocellular carcinoma (HCC) is the most frequent type of liver cancer, with a poor survival rate and high mortality. Despite efforts to understand the mechanism of HCC, new molecular markers are needed for exact diagnosis, evaluation and treatment. Here, we combined the transcriptome of HCC with networks and pathways to identify reliable molecular markers. Through integrating 249 differentially expressed genes with syncretic protein interaction networks, we constructed an HCC-specific network, from which we further extracted 480 pivotal genes. Based on the cross-talk between the enriched pathways of the pivotal genes, we finally identified an HCC signature of 45 genes, which could accurately distinguish HCC patients from normal individuals and reveal the prognosis of HCC patients. Among these 45 genes, 15 showed dysregulated expression patterns and some have been reported to be associated with HCC and/or other cancers. These findings suggested that our identified 45-gene signature could provide potential and valuable molecular markers for diagnosis and evaluation of HCC. PMID:29765536

  12. Network inference analysis identifies an APRR2-like gene linked to pigment accumulation in tomato and pepper fruits.

    PubMed

    Pan, Yu; Bradley, Glyn; Pyke, Kevin; Ball, Graham; Lu, Chungui; Fray, Rupert; Marshall, Alexandra; Jayasuta, Subhalai; Baxter, Charles; van Wijk, Rik; Boyden, Laurie; Cade, Rebecca; Chapman, Natalie H; Fraser, Paul D; Hodgman, Charlie; Seymour, Graham B

    2013-03-01

    Carotenoids represent some of the most important secondary metabolites in the human diet, and tomato (Solanum lycopersicum) is a rich source of these health-promoting compounds. In this work, a novel and fruit-related regulator of pigment accumulation in tomato has been identified by artificial neural network inference analysis and its function validated in transgenic plants. A tomato fruit gene regulatory network was generated using artificial neural network inference analysis and transcription factor gene expression profiles derived from fruits sampled at various points during development and ripening. One of the transcription factor gene expression profiles with a sequence related to an Arabidopsis (Arabidopsis thaliana) ARABIDOPSIS PSEUDO RESPONSE REGULATOR2-LIKE gene (APRR2-Like) was up-regulated at the breaker stage in wild-type tomato fruits and, when overexpressed in transgenic lines, increased plastid number, area, and pigment content, enhancing the levels of chlorophyll in immature unripe fruits and carotenoids in red ripe fruits. Analysis of the transcriptome of transgenic lines overexpressing the tomato APRR2-Like gene revealed up-regulation of several ripening-related genes in the overexpression lines, providing a link between the expression of this tomato gene and the ripening process. A putative ortholog of the tomato APRR2-Like gene in sweet pepper (Capsicum annuum) was associated with pigment accumulation in fruit tissues. We conclude that the function of this gene is conserved across taxa and that it encodes a protein that has an important role in ripening.

  13. Identifying key genes associated with acute myocardial infarction.

    PubMed

    Cheng, Ming; An, Shoukuan; Li, Junquan

    2017-10-01

    This study aimed to identify key genes associated with acute myocardial infarction (AMI) by reanalyzing microarray data. Three gene expression profile datasets GSE66360, GSE34198, and GSE48060 were downloaded from GEO database. After data preprocessing, genes without heterogeneity across different platforms were subjected to differential expression analysis between the AMI group and the control group using metaDE package. P < .05 was used as the cutoff for a differentially expressed gene (DEG). The expression data matrices of DEGs were imported in ReactomeFIViz to construct a gene functional interaction (FI) network. Then, DEGs in each module were subjected to pathway enrichment analysis using DAVID. MiRNAs and transcription factors predicted to regulate target DEGs were identified. Quantitative real-time polymerase chain reaction (RT-PCR) was applied to verify the expression of genes. A total of 913 upregulated genes and 1060 downregulated genes were identified in the AMI group. A FI network consists of 21 modules and DEGs in 12 modules were significantly enriched in pathways. The transcription factor-miRNA-gene network contains 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p. RT-PCR validations showed that expression levels of FOXO3 and MYBL2 were significantly increased in AMI, and expression levels of hsa-miR-21-5p and hsa-miR-30c-5p were obviously decreased in AMI. A total of 41 DEGs, such as SOCS3, VAPA, and COL5A2, are speculated to have roles in the pathogenesis of AMI; 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p may be involved in the regulation of the expression of these DEGs.

  14. Construction and analysis of gene-gene dynamics influence networks based on a Boolean model.

    PubMed

    Mazaya, Maulida; Trinh, Hung-Cuong; Kwon, Yung-Keun

    2017-12-21

    Identification of novel gene-gene relations is a crucial issue to understand system-level biological phenomena. To this end, many methods based on a correlation analysis of gene expressions or structural analysis of molecular interaction networks have been proposed. However, they are limited in identifying more complicated gene-gene dynamical relations. To overcome this limitation, we proposed a measure to quantify a gene-gene dynamical influence (GDI) using a Boolean network model and constructed a GDI network to indicate the existence of a dynamical influence for every ordered pair of genes. It represents how much a state trajectory of a target gene is changed by a knockout mutation subject to a source gene in a gene-gene molecular interaction (GMI) network. Through a topological comparison between GDI and GMI networks, we observed that the former network is denser than the latter network, which implies that there exist many gene pairs of dynamically influencing but molecularly non-interacting relations. In addition, a larger number of hub genes were generated in the GDI network. On the other hand, there was a correlation between these networks such that the degree value of a node was positively correlated to each other. We further investigated the relationships of the GDI value with structural properties and found that there are negative and positive correlations with the length of a shortest path and the number of paths, respectively. In addition, a GDI network could predict a set of genes whose steady-state expression is affected in E. coli gene-knockout experiments. More interestingly, we found that the drug-targets with side-effects have a larger number of outgoing links than the other genes in the GDI network, which implies that they are more likely to influence the dynamics of other genes. Finally, we found biological evidence showing that the gene pairs which are not molecularly interacting but dynamically influential can be considered for novel gene-gene
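
The knockout-based influence idea in this abstract can be illustrated with a toy sketch. The abstract does not give the exact GDI formula, so the measure below (the fraction of trajectory time points at which the target gene differs between wild-type and knockout runs) is an assumption, as are the three-gene network, its update rules, and all function names.

```python
from itertools import product

# Hypothetical 3-gene cascade A -> B -> C (illustrative rules, not from the paper).
RULES = {
    "A": lambda s: s["A"],  # A is an input node that keeps its state
    "B": lambda s: s["A"],  # B copies A
    "C": lambda s: s["B"],  # C copies B; A and C never interact directly
}


def step(state, knocked_out=None):
    """One synchronous update; a knocked-out gene is pinned to False."""
    new_state = {gene: bool(rule(state)) for gene, rule in RULES.items()}
    if knocked_out is not None:
        new_state[knocked_out] = False
    return new_state


def trajectory(initial, steps, knocked_out=None):
    """List of states from t = 0 to t = steps."""
    state = dict(initial)
    if knocked_out is not None:
        state[knocked_out] = False
    states = [state]
    for _ in range(steps):
        state = step(state, knocked_out)
        states.append(state)
    return states


def influence(source, target, steps=4):
    """Assumed measure: share of (initial state, time) points where the
    target differs between the wild-type and source-knockout trajectories."""
    genes = sorted(RULES)
    total = differing = 0
    for bits in product([False, True], repeat=len(genes)):
        initial = dict(zip(genes, bits))
        wild_type = trajectory(initial, steps)
        knockout = trajectory(initial, steps, knocked_out=source)
        for wt_state, ko_state in zip(wild_type, knockout):
            total += 1
            differing += wt_state[target] != ko_state[target]
    return differing / total
```

In this toy network, `influence("A", "C")` is nonzero even though A and C share no direct rule, which mirrors the abstract's point about gene pairs that are dynamically influential but not molecularly interacting.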

  15. Gene expression complex networks: synthesis, identification, and analysis.

    PubMed

    Lopes, Fabrício M; Cesar, Roberto M; Costa, Luciano Da F

    2011-10-01

    Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdös-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference
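
Steps (1) and (3) of the framework described above can be sketched in a simplified form: generate a random ground-truth network, then score an inferred edge set against it. This is a minimal sketch under stated assumptions, not the authors' implementation: edges are treated as undirected, only the Erdős-Rényi model is shown, the expression simulation and inference steps are omitted, and the function names are hypothetical.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, seed=None):
    """Undirected G(n, p) graph: each node pair (i, j), i < j, is kept with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def edge_precision_recall(true_edges, inferred_edges):
    """Compare an inferred edge set with the original (ground-truth) edge set."""
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall
```

A call such as `edge_precision_recall(erdos_renyi_edges(20, 0.1, seed=1), inferred)` would then summarize how well a hypothetical `inferred` edge set recovers the generated network.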

  16. Systems genetics identifies Sestrin 3 as a regulator of a proconvulsant gene network in human epileptic hippocampus

    PubMed Central

    Johnson, Michael R.; Rossetti, Tiziana; Speed, Doug; Srivastava, Prashant K.; Chadeau-Hyam, Marc; Hajji, Nabil; Dabrowska, Aleksandra; Rotival, Maxime; Razzaghi, Banafsheh; Kovac, Stjepana; Wanisch, Klaus; Grillo, Federico W.; Slaviero, Anna; Langley, Sarah R.; Shkura, Kirill; Roncon, Paolo; De, Tisham; Mattheisen, Manuel; Niehusmann, Pitt; O’Brien, Terence J.; Petrovski, Slave; von Lehe, Marec; Hoffmann, Per; Eriksson, Johan; Coffey, Alison J.; Cichon, Sven; Walker, Matthew; Simonato, Michele; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Schoch, Susanne; De Paola, Vincenzo; Kaminski, Rafal M.; Cunliffe, Vincent T.; Becker, Albert J.; Petretto, Enrico

    2015-01-01

    Gene-regulatory network analysis is a powerful approach to elucidate the molecular processes and pathways underlying complex disease. Here we employ systems genetics approaches to characterize the genetic regulation of pathophysiological pathways in human temporal lobe epilepsy (TLE). Using surgically acquired hippocampi from 129 TLE patients, we identify a gene-regulatory network genetically associated with epilepsy that contains a specialized, highly expressed transcriptional module encoding proconvulsive cytokines and Toll-like receptor signalling genes. RNA sequencing analysis in a mouse model of TLE using 100 epileptic and 100 control hippocampi shows the proconvulsive module is preserved across-species, specific to the epileptic hippocampus and upregulated in chronic epilepsy. In the TLE patients, we map the trans-acting genetic control of this proconvulsive module to Sestrin 3 (SESN3), and demonstrate that SESN3 positively regulates the module in macrophages, microglia and neurons. Morpholino-mediated Sesn3 knockdown in zebrafish confirms the regulation of the transcriptional module, and attenuates chemically induced behavioural seizures in vivo. PMID:25615886

  17. Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data

    PubMed Central

    Ping, Yanyan; Deng, Yulan; Wang, Li; Zhang, Hongyi; Zhang, Yong; Xu, Chaohan; Zhao, Hongying; Fan, Huihui; Yu, Fulong; Xiao, Yun; Li, Xia

    2015-01-01

    The driver genetic aberrations collectively regulate core cellular processes underlying cancer development. However, identifying the modules of driver genetic alterations and characterizing their functional mechanisms are still major challenges for cancer studies. Here, we developed an integrative multi-omics method CMDD to identify the driver modules and their affecting dysregulated genes through characterizing genetic alteration-induced dysregulated networks. Applied to glioblastoma (GBM), the CMDD identified a core gene module of 17 genes, including seven known GBM drivers, and their dysregulated genes. The module showed significant association with shorter survival of GBM. When classifying driver genes in the module into two gene sets according to their genetic alteration patterns, we found that one gene set directly participated in the glioma pathway, while the other indirectly regulated the glioma pathway, mostly via their dysregulated genes. Both gene sets were significant contributors to survival and helpful for classifying GBM subtypes, suggesting their critical roles in GBM pathogenesis. Also, by applying the CMDD to six other cancers, we identified some novel core modules associated with overall survival of patients. Together, these results demonstrate that integrative multi-omics data can identify driver modules and uncover their dysregulated genes, which is useful for interpreting the cancer genome. PMID:25653168

  18. Inference of cancer-specific gene regulatory networks using soft computing rules.

    PubMed

    Wang, Xiaosheng; Gotoh, Osamu

    2010-03-24

    Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  19. Computational gene network study on antibiotic resistance genes of Acinetobacter baumannii.

    PubMed

    Anitha, P; Anbarasu, Anand; Ramaiah, Sudha

    2014-05-01

    Multi Drug Resistance (MDR) in Acinetobacter baumannii is one of the major threats for emerging nosocomial infections in hospital environment. Multidrug-resistance in A. baumannii may be due to the implementation of multi-combination resistance mechanisms such as β-lactamase synthesis, Penicillin-Binding Proteins (PBPs) changes, alteration in porin proteins and in efflux pumps against various existing classes of antibiotics. Multiple antibiotic resistance genes are involved in MDR. These resistance genes are transferred through plasmids, which are responsible for the dissemination of antibiotic resistance among Acinetobacter spp. In addition, these resistance genes may also have a tendency to interact with each other or with their gene products. Therefore, it becomes necessary to understand the impact of these interactions in antibiotic resistance mechanism. Hence, our study focuses on protein and gene network analysis on various resistance genes, to elucidate the role of the interacting proteins and to study their functional contribution towards antibiotic resistance. From the search tool for the retrieval of interacting gene/protein (STRING), a total of 168 functional partners for 15 resistance genes were extracted based on the confidence scoring system. The network study was then followed up with functional clustering of associated partners using molecular complex detection (MCODE). Later, we selected eight efficient clusters based on score. Interestingly, the associated protein we identified from the network possessed greater functional similarity with known resistance genes. This network-based approach on resistance genes of A. baumannii could help in identifying new genes/proteins and provide clues on their association in antibiotic resistance. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Genes and gene networks implicated in aggression related behaviour.

    PubMed

    Malki, Karim; Pain, Oliver; Du Rietz, Ebba; Tosto, Maria Grazia; Paya-Cano, Jose; Sandnabba, Kenneth N; de Boer, Sietse; Schalkwyk, Leonard C; Sluyter, Frans

    2014-10-01

    Aggressive behaviour is a major cause of mortality and morbidity. Despite moderate heritability estimates, progress in identifying the genetic factors underlying aggressive behaviour has been limited. There are currently three genetic mouse models of high and low aggression created using selective breeding. This is the first study to offer a global transcriptomic characterization of the prefrontal cortex across all three genetic mouse models of aggression. A systems biology approach has been applied to transcriptomic data across the three pairs of selected inbred mouse strains (Turku Aggressive (TA) and Turku Non-Aggressive (TNA), Short Attack Latency (SAL) and Long Attack Latency (LAL) mice and North Carolina Aggressive (NC900) and North Carolina Non-Aggressive (NC100)), providing novel insight into the neurobiological mechanisms and genetics underlying aggression. First, weighted gene co-expression network analysis (WGCNA) was performed to identify modules of highly correlated genes associated with aggression. Probe sets belonging to gene modules uncovered by WGCNA were carried forward for network analysis using ingenuity pathway analysis (IPA). The RankProd non-parametric algorithm was then used to statistically evaluate expression differences across the genes belonging to modules significantly associated with aggression. IPA uncovered two pathways, involving NF-kB and MAPKs. The secondary RankProd analysis yielded 14 differentially expressed genes, some of which have previously been implicated in pathways associated with aggressive behaviour, such as Adrbk2. The results highlighted plausible candidate genes and gene networks implicated in aggression-related behaviour.

  1. Multiscale Embedded Gene Co-expression Network Analysis

    PubMed Central

    Song, Won-Min; Zhang, Bin

    2015-01-01

    Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma. PMID:26618778

  2. Multiscale Embedded Gene Co-expression Network Analysis.

    PubMed

    Song, Won-Min; Zhang, Bin

    2015-11-01

    Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

  3. Integration of biological networks and gene expression data using Cytoscape

    PubMed Central

    Cline, Melissa S; Smoot, Michael; Cerami, Ethan; Kuchinsky, Allan; Landys, Nerius; Workman, Chris; Christmas, Rowan; Avila-Campilo, Iliana; Creech, Michael; Gross, Benjamin; Hanspers, Kristina; Isserlin, Ruth; Kelley, Ryan; Killcoyne, Sarah; Lotia, Samad; Maere, Steven; Morris, John; Ono, Keiichiro; Pavlovic, Vuk; Pico, Alexander R; Vailaya, Aditya; Wang, Peng-Liang; Adler, Annette; Conklin, Bruce R; Hood, Leroy; Kuiper, Martin; Sander, Chris; Schmulevich, Ilya; Schwikowski, Benno; Warner, Guy J; Ideker, Trey; Bader, Gary D

    2013-01-01

    Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape. PMID:17947979

  4. Identification of Human Disease Genes from Interactome Network Using Graphlet Interaction

    PubMed Central

    Yang, Lun; Wei, Dong-Qing; Qi, Ying-Xin; Jiang, Zong-Lai

    2014-01-01

    Identifying genes related to human diseases, such as cancer and cardiovascular disease, is an important task in biomedical research because of its applications in disease diagnosis and treatment. Interactome networks, especially protein-protein interaction networks, have been used for disease gene identification based on the hypothesis that strong candidate genes tend to be closely related to each other by some measure on the network. We proposed a new measure to analyze the relationship between network nodes, called graphlet interaction. The graphlet interaction contained 28 different isomers. The results showed that the numbers of graphlet interaction isomers between disease genes in interactome networks were significantly larger than those between randomly picked genes, while graphlet signatures were not. Then, we designed a new type of score, based on the network properties, to identify disease genes using graphlet interaction. The genes with higher scores were more likely to be disease genes, and all candidate genes were ranked according to their scores. The approach was then evaluated by leave-one-out cross-validation. The precision of the current approach reached 90% at about 10% recall, which was apparently higher than that of three previous predominant algorithms: random walk, Endeavour and the neighborhood-based method. Finally, the approach was applied to predict new disease genes related to 4 common diseases, most of which were identified by other independent experimental studies. In conclusion, we demonstrate that graphlet interaction is an effective tool to analyze the network properties of disease genes, and that the scores calculated by graphlet interaction are more precise in identifying disease genes. PMID:24465923

  5. Identifying key genes associated with acute myocardial infarction

    PubMed Central

    Cheng, Ming; An, Shoukuan; Li, Junquan

    2017-01-01

    Abstract Background: This study aimed to identify key genes associated with acute myocardial infarction (AMI) by reanalyzing microarray data. Methods: Three gene expression profile datasets GSE66360, GSE34198, and GSE48060 were downloaded from GEO database. After data preprocessing, genes without heterogeneity across different platforms were subjected to differential expression analysis between the AMI group and the control group using metaDE package. P < .05 was used as the cutoff for a differentially expressed gene (DEG). The expression data matrices of DEGs were imported in ReactomeFIViz to construct a gene functional interaction (FI) network. Then, DEGs in each module were subjected to pathway enrichment analysis using DAVID. MiRNAs and transcription factors predicted to regulate target DEGs were identified. Quantitative real-time polymerase chain reaction (RT-PCR) was applied to verify the expression of genes. Result: A total of 913 upregulated genes and 1060 downregulated genes were identified in the AMI group. A FI network consists of 21 modules and DEGs in 12 modules were significantly enriched in pathways. The transcription factor-miRNA-gene network contains 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p. RT-PCR validations showed that expression levels of FOXO3 and MYBL2 were significantly increased in AMI, and expression levels of hsa-miR-21-5p and hsa-miR-30c-5p were obviously decreased in AMI. Conclusion: A total of 41 DEGs, such as SOCS3, VAPA, and COL5A2, are speculated to have roles in the pathogenesis of AMI; 2 transcription factors FOXO3 and MYBL2, and 2 miRNAs hsa-miR-21-5p and hsa-miR-30c-5p may be involved in the regulation of the expression of these DEGs. PMID:29049183

  6. Gene expression patterns combined with bioinformatics analysis identify genes associated with cholangiocarcinoma.

    PubMed

    Li, Chen; Shen, Weixing; Shen, Sheng; Ai, Zhilong

    2013-12-01

    To explore the molecular mechanisms of cholangiocarcinoma (CC), microarray technology was used to find biomarkers for early detection and diagnosis. The gene expression profiles from 6 patients with CC and 5 normal controls were downloaded from Gene Expression Omnibus and compared. As a result, 204 differentially co-expressed genes (DCGs) in CC patients compared to normal controls were identified using a computational bioinformatics analysis. These genes were mainly involved in coenzyme metabolic process, peptidase activity and oxidation reduction. A regulatory network was constructed by mapping the DCGs to known regulation data. Four transcription factors, FOXC1, ZIC2, NKX2-2 and GCGR, were hub nodes in the network. In conclusion, this study provides a set of targets useful for future investigations into molecular biomarker studies. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Weighted gene co-expression network analysis of expression data of monozygotic twins identifies specific modules and hub genes related to BMI.

    PubMed

    Wang, Weijing; Jiang, Wenjie; Hou, Lin; Duan, Haiping; Wu, Yili; Xu, Chunsheng; Tan, Qihua; Li, Shuxia; Zhang, Dongfeng

    2017-11-13

    The therapeutic management of obesity is challenging, hence further elucidating the underlying mechanisms of obesity development and identifying new diagnostic biomarkers and therapeutic targets are urgent and necessary. Here, we performed differential gene expression analysis and weighted gene co-expression network analysis (WGCNA) to identify significant genes and specific modules related to BMI based on gene expression profile data of 7 discordant monozygotic twins. In the differential gene expression analysis, it appeared that 32 differentially expressed genes (DEGs) were with a trend of up-regulation in twins with higher BMI when compared to their siblings. Categories of positive regulation of nitric-oxide synthase biosynthetic process, positive regulation of NF-kappa B import into nucleus, and peroxidase activity were significantly enriched within GO database and NF-kappa B signaling pathway within KEGG database. DEGs of NAMPT, TLR9, PTGS2, HBD, and PCSK1N might be associated with obesity. In the WGCNA, among the total 20 distinct co-expression modules identified, coral1 module (68 genes) had the strongest positive correlation with BMI (r = 0.56, P = 0.04) and disease status (r = 0.56, P = 0.04). Categories of positive regulation of phospholipase activity, high-density lipoprotein particle clearance, chylomicron remnant clearance, reverse cholesterol transport, intermediate-density lipoprotein particle, chylomicron, low-density lipoprotein particle, very-low-density lipoprotein particle, voltage-gated potassium channel complex, cholesterol transporter activity, and neuropeptide hormone activity were significantly enriched within GO database for this module. And alcoholism and cell adhesion molecules pathways were significantly enriched within KEGG database. Several hub genes, such as GAL, ASB9, NPPB, TBX2, IL17C, APOE, ABCG4, and APOC2 were also identified. The module eigengene of saddlebrown module (212 genes) was also significantly

  8. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    PubMed

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory network, most of them, when applied in the sample-based data, can not reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, while former researches either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely gene dependency network. By this way, we have constructed gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network scale free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specially related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for
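
The conditional mutual information mentioned in this abstract can be estimated from discrete samples with plug-in frequencies. The sketch below is an illustration under assumptions: the abstract does not state how expression values are discretized or exactly which variables are conditioned on, so the function name, the binary encoding, and the reading of a dependency pair (A,B) as a large I(A; phenotype | B) are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from three equal-length
    sequences of discrete values."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), count in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with raw counts
        ratio = (c_z[zi] * count) / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        total += (count / n) * log2(ratio)
    return total
```

For example, if a binary phenotype equals the XOR of two binarized genes, gene A alone carries no information about the phenotype, but conditioning on gene B reveals one full bit, which is the kind of B-dependent influence the abstract describes.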

  9. Robust gene network analysis reveals alteration of the STAT5a network as a hallmark of prostate cancer.

    PubMed

    Reddy, Anupama; Huang, C Chris; Liu, Huiqing; Delisi, Charles; Nevalainen, Marja T; Szalma, Sandor; Bhanot, Gyan

    2010-01-01

    We develop a general method to identify gene networks from pair-wise correlations between genes in a microarray data set and apply it to a public prostate cancer gene expression data set from 69 primary prostate tumors. We define the degree of a node as the number of genes significantly associated with the node and identify hub genes as those with the highest degree. The correlation network was pruned using transcription factor binding information in VisANT (http://visant.bu.edu/) as a biological filter. The reliability of hub genes was determined using a strict permutation test. Separate networks for normal prostate samples, and prostate cancer samples from African Americans (AA) and European Americans (EA) were generated and compared. We found that the same hubs control disease progression in AA and EA networks. Combining AA and EA samples, we generated networks for low (<7) and high (≥7) Gleason grade tumors. A comparison of their major hubs with those of the network for normal samples identified two types of changes associated with disease: (i) Some hub genes increased their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with gain of regulatory control in cancer (e.g. possible turning on of oncogenes). (ii) Some hubs reduced their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with loss of regulatory control in cancer (e.g. possible loss of tumor suppressor genes). A striking result was that for both AA and EA tumor samples, STAT5a, CEBPB and EGR1 are major hubs that gain neighbors compared to the normal prostate network. Conversely, HIF-1α is a major hub that loses connections in the prostate cancer network compared to the normal prostate network. We also find that the degree of these hubs changes progressively from normal to low grade to high grade disease, suggesting that these hubs are master regulators of
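
The degree and hub definitions in this abstract can be sketched as follows. This is a simplified illustration rather than the authors' method: a fixed cutoff on the absolute Pearson correlation stands in for their significance test, the transcription factor filter and permutation test are omitted, and the function names and gene labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def node_degrees(expression, threshold):
    """Degree = number of other genes with |r| >= threshold (assumed cutoff rule)."""
    degree = {gene: 0 for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def hubs(degree):
    """Genes sharing the highest degree, sorted by name."""
    top = max(degree.values())
    return sorted(gene for gene, d in degree.items() if d == top)
```

Comparing `node_degrees` computed separately on tumor and normal samples would give the gain or loss of neighbors per gene that the abstract uses to characterize hubs.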

  10. Functional Module Analysis for Gene Coexpression Networks with Network Integration.

    PubMed

    Zhang, Shuqin; Zhao, Hongyu; Ng, Michael K

    2015-01-01

    Networks have been a general tool for studying the complex interactions between different genes, proteins, and other small molecules. Modules, as a fundamental property of many biological networks, have been widely studied, and many computational methods have been proposed to identify the modules in an individual network. However, in many cases, a single network is insufficient for module analysis due to the noise in the data or the tuning of parameters when building the biological network. The availability of a large number of biological networks makes network integration study possible. By integrating such networks, more informative modules for some specific disease can be derived from the networks constructed from different tissues, and consistent factors for different diseases can be inferred. In this paper, we have developed an effective method for module identification from multiple networks under different conditions. The problem is formulated as an optimization model, which combines the module identification in each individual network and alignment of the modules from different networks together. An approximation algorithm based on eigenvector computation is proposed. In simulation studies, our method outperforms the existing methods, especially when the underlying modules in multiple networks are different. We also applied our method to two groups of gene coexpression networks for humans, which include one for three different cancers, and one for three tissues from the morbidly obese patients. We identified 13 modules with three complete subgraphs, and 11 modules with two complete subgraphs, respectively. The modules were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also showed that the main functions of most modules for the corresponding disease have been addressed by other researchers, which may provide the theoretical basis for further studying the modules experimentally.

  11. Fyn-Dependent Gene Networks in Acute Ethanol Sensitivity

    PubMed Central

    Farris, Sean P.; Miles, Michael F.

    2013-01-01

    Studies in humans and animal models document that acute behavioral responses to ethanol are a predisposing factor for the risk of long-term drinking behavior. Prior microarray data from our laboratory document strain- and brain region-specific variation in gene expression profile responses to acute ethanol that may be underlying regulators of ethanol behavioral phenotypes. The non-receptor tyrosine kinase Fyn has previously been mechanistically implicated in the sedative-hypnotic response to acute ethanol. To further understand how Fyn may modulate ethanol behaviors, we used whole-genome expression profiling. We characterized basal and acute ethanol-evoked (3 g/kg) gene expression patterns in nucleus accumbens (NAC), prefrontal cortex (PFC), and ventral midbrain (VMB) of control and Fyn knockout mice. Bioinformatics analysis identified a set of Fyn-related gene networks differently regulated by acute ethanol across the three brain regions. In particular, our analysis suggested a coordinate basal decrease in myelin-associated gene expression within NAC and PFC as an underlying factor in sensitivity of Fyn null animals to ethanol sedation. An in silico analysis across the BXD recombinant inbred (RI) strains of mice identified a significant correlation between Fyn expression and a previously published ethanol loss-of-righting-reflex (LORR) phenotype. By combining PFC gene expression correlates to Fyn and LORR across multiple genomic datasets, we identified robust Fyn-centric gene networks related to LORR. Our results thus suggest that multiple system-wide changes exist within specific brain regions of Fyn knockout mice, and that distinct Fyn-dependent expression networks within PFC may be important determinants of the LORR due to acute ethanol. These results add to the interpretation of acute ethanol behavioral sensitivity in Fyn kinase null animals, and identify Fyn-centric gene networks influencing variance in ethanol LORR. Such networks may also inform future design

  12. A meta-analysis of public microarray data identifies biological regulatory networks in Parkinson's disease.

    PubMed

    Su, Lining; Wang, Chunjie; Zheng, Chenqing; Wei, Huiping; Song, Xiaoqing

    2018-04-13

    Parkinson's disease (PD) is a long-term degenerative disease that is caused by environmental and genetic factors. The networks of genes and their regulators that control the progression and development of PD require further elucidation. We examine common differentially expressed genes (DEGs) from several PD blood and substantia nigra (SN) microarray datasets by meta-analysis. Further we screen the PD-specific genes from common DEGs using GCBI. Next, we used a series of bioinformatics software to analyze the miRNAs, lncRNAs and SNPs associated with the common PD-specific genes, and then identify the mTF-miRNA-gene-gTF network. Our results identified 36 common DEGs in PD blood studies and 17 common DEGs in PD SN studies, and five of the genes were previously known to be associated with PD. Further study of the regulatory miRNAs associated with the common PD-specific genes revealed 14 PD-specific miRNAs in our study. Analysis of the mTF-miRNA-gene-gTF network about PD-specific genes revealed two feed-forward loops: one involving the SPRK2 gene, hsa-miR-19a-3p and SPI1, and the second involving the SPRK2 gene, hsa-miR-17-3p and SPI. The long non-coding RNA (lncRNA)-mediated regulatory network identified lncRNAs associated with PD-specific genes and PD-specific miRNAs. Moreover, single nucleotide polymorphism (SNP) analysis of the PD-specific genes identified two significant SNPs, and SNP analysis of the neurodegenerative disease-specific genes identified seven significant SNPs. Most of these SNPs are present in the 3'-untranslated region of genes and are controlled by several miRNAs. Our study identified a total of 53 common DEGs in PD patients compared with healthy controls in blood and brain datasets and five of these genes were previously linked with PD. Regulatory network analysis identified PD-specific miRNAs, associated long non-coding RNA and feed-forward loops, which contribute to our understanding of the mechanisms underlying PD. The SNPs identified in our

  13. Integrated network analysis identifies fight-club nodes as a class of hubs encompassing key putative switch genes that induce major transcriptome reprogramming during grapevine development.

    PubMed

    Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola

    2014-12-01

    We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named "fight-club hubs" characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named "switch genes" was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. © 2014 American Society of Plant Biologists. All rights reserved.

  14. Identification of interactive gene networks: a novel approach in gene array profiling of myometrial events during guinea pig pregnancy.

    PubMed

    Mason, Clifford W; Swaan, Peter W; Weiner, Carl P

    2006-06-01

    The transition from myometrial quiescence to activation is poorly understood, and the analysis of array data is limited by the available data mining tools. We applied functional analysis and logical operations along regulatory gene networks to identify molecular processes and pathways underlying quiescence and activation. We analyzed some 18,400 transcripts and variants in guinea pig myometrium at stages corresponding to quiescence and activation, and compared them to the nonpregnant (control) counterpart using a functional mapping tool, MetaCore (GeneGo, St Joseph, MI), to identify novel gene networks composed of biological pathways during mid (MP) and late (LP) pregnancy. Genes altered during quiescence and/or activation were identified following gene-specific comparisons with myometrium from nonpregnant animals, and then linked to curated pathways and formulated networks. The MP and LP networks were subtracted from each other to identify unique genomic events during those periods. For example, changes 2-fold or greater in genes mediating protein biosynthesis, programmed cell death, microtubule polymerization, and microtubule-based movement were noted during the transition to LP. We describe a novel approach combining microarrays and genetic data to identify networks associated with normal myometrial events. The resulting insights help identify potential biomarkers and permit future targeted investigations of these pathways or networks to confirm or refute their importance.

  15. BRAIN NETWORKS. Correlated gene expression supports synchronous activity in brain networks.

    PubMed

    Richiardi, Jonas; Altmann, Andre; Milazzo, Anna-Clare; Chang, Catie; Chakravarty, M Mallar; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Flor, Herta; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Lemaître, Hervé; Mann, Karl F; Martinot, Jean-Luc; Nees, Frauke; Paus, Tomáš; Pausova, Zdenka; Rietschel, Marcella; Robbins, Trevor W; Smolka, Michael N; Spanagel, Rainer; Ströhle, Andreas; Schumann, Gunter; Hawrylycz, Mike; Poline, Jean-Baptiste; Greicius, Michael D

    2015-06-12

    During rest, brain activity is synchronized between different regions widely distributed throughout the brain, forming functional networks. However, the molecular mechanisms supporting functional connectivity remain undefined. We show that functional brain networks defined with resting-state functional magnetic resonance imaging can be recapitulated by using measures of correlated gene expression in a post mortem brain tissue data set. The set of 136 genes we identify is significantly enriched for ion channels. Polymorphisms in this set of genes significantly affect resting-state functional connectivity in a large sample of healthy adolescents. Expression levels of these genes are also significantly associated with axonal connectivity in the mouse. The results provide convergent, multimodal evidence that resting-state functional networks correlate with the orchestrated activity of dozens of genes linked to ion channel activity and synaptic function. Copyright © 2015, American Association for the Advancement of Science.

  16. Gene network-based analysis identifies two potential subtypes of small intestinal neuroendocrine tumors.

    PubMed

    Kidd, Mark; Modlin, Irvin M; Drozdov, Ignat

    2014-07-15

    Tumor transcriptomes contain information of critical value to understanding the different capacities of a cell at both a physiological and pathological level. In terms of clinical relevance, they provide information regarding the cellular "toolbox" e.g., pathways associated with malignancy and metastasis or drug dependency. Exploration of this resource can therefore be leveraged as a translational tool to better manage and assess neoplastic behavior. The availability of public genome-wide expression datasets provides an opportunity to reassess neuroendocrine tumors at a more fundamental level. We hypothesized that stringent analysis of expression profiles as well as regulatory networks of the neoplastic cell would provide novel information that facilitates further delineation of the genomic basis of small intestinal neuroendocrine tumors. We re-analyzed two publicly available small intestinal tumor transcriptomes using stringent quality control parameters and network-based approaches and validated expression of core secretory regulatory elements e.g., CPE, PCSK1, secretogranins, including genes involved in depolarization e.g., SCN3A, as well as transcription factors associated with neurodevelopment (NKX2-2, NeuroD1, INSM1) and glucose homeostasis (APLP1). The candidate metastasis-associated transcription factor, ST18, was highly expressed (>14-fold, p < 0.004). Genes previously associated with neoplasia, CEBPA and SDHD, were decreased in expression (-1.5 - -2, p < 0.02). Genomic interrogation indicated that intestinal tumors may consist of two different subtypes, serotonin-producing neoplasms and serotonin/substance P/tachykinin lesions. QPCR validation in an independent dataset (n = 13 neuroendocrine tumors), confirmed up-regulated expression of 87% of genes (13/15). An integrated cellular transcriptomic analysis of small intestinal neuroendocrine tumors identified that they are regulated at a developmental level, have key activation of hypoxic pathways (a known

  17. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
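
    The idea of summarizing a protein complex as a single virtual node via principal component analysis can be sketched as follows. This is only an illustration of taking the first principal component of the member genes' expression profiles; it is not the authors' Bayesian estimation procedure, and the function name `first_pc_scores` and the toy values are assumptions.

```python
import math

def first_pc_scores(rows, iters=500):
    """Project samples onto the first principal component of a gene set.

    rows: one expression vector per complex member, all of equal length.
    Returns one score per sample, usable as the profile of a virtual node.
    Uses plain power iteration on the gene-by-gene covariance matrix.
    """
    m = len(rows[0])  # number of samples
    centered = [[v - sum(r) / m for v in r] for r in rows]
    g = len(centered)  # number of member genes
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (m - 1)
            for j in range(g)] for i in range(g)]
    # Start from the first unit vector; this fails only in the edge case
    # where the leading component has zero loading on the first gene.
    w = [1.0] + [0.0] * (g - 1)
    for _ in range(iters):
        nxt = [sum(cov[i][j] * w[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # degenerate input (e.g. constant profiles)
        w = [x / norm for x in nxt]
    return [sum(w[i] * centered[i][s] for i in range(g)) for s in range(m)]

# Hypothetical complex with two perfectly co-varying members.
scores = first_pc_scores([[1.0, 2.0, 3.0, 4.0],
                          [2.0, 4.0, 6.0, 8.0]])
```

    The sign of a principal component is arbitrary, so downstream use should not depend on whether the scores rise or fall with the member genes.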

  18. A network-based, integrative study to identify core biological pathways that drive breast cancer clinical subtypes

    PubMed Central

    Dutta, B; Pusztai, L; Qi, Y; André, F; Lazar, V; Bianchini, G; Ueno, N; Agarwal, R; Wang, B; Shiang, C Y; Hortobagyi, G N; Mills, G B; Symmans, W F; Balázsi, G

    2012-01-01

    Background: The rapid collection of diverse genome-scale data raises the urgent need to integrate and utilise these resources for biological discovery or biomedical applications. For example, diverse transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few current methods are able to utilise the emerging information. Methods: We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein–protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighborhoods. Results: We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line data sets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network member genes have increased functional specificity to TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone. Conclusion: Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important. PMID:22343619

  19. Coexpression network based on natural variation in human gene expression reveals gene interactions and functions

    PubMed Central

    Nayak, Renuka R.; Kearns, Michael; Spielman, Richard S.; Cheung, Vivian G.

    2009-01-01

    Genes interact in networks to orchestrate cellular processes. Analysis of these networks provides insights into gene interactions and functions. Here, we took advantage of normal variation in human gene expression to infer gene networks, which we constructed using correlations in expression levels of more than 8.5 million gene pairs in immortalized B cells from three independent samples. The resulting networks allowed us to identify biological processes and gene functions. Among the biological pathways, we found processes such as translation and glycolysis that co-occur in the same subnetworks. We predicted the functions of poorly characterized genes, including CHCHD2 and TMEM111, and provided experimental evidence that TMEM111 is part of the endoplasmic reticulum-associated secretory pathway. We also found that IFIH1, a susceptibility gene of type 1 diabetes, interacts with YES1, which plays a role in glucose transport. Furthermore, genes that predispose to the same diseases are clustered nonrandomly in the coexpression network, suggesting that networks can provide candidate genes that influence disease susceptibility. Therefore, our analysis of gene coexpression networks offers information on the role of human genes in normal and disease processes. PMID:19797678

  20. Fatigue-Related Gene Networks Identified in CD14+ Cells Isolated From HIV-Infected Patients—Part I: Research Findings

    PubMed Central

    Voss, Joachim G.; Dobra, Adrian; Morse, Caryn; Kovacs, Joseph A.; Danner, Robert L.; Munson, Peter J.; Logan, Carolea; Rangel, Zoila; Adelsberger, Joseph W.; McLaughlin, Mary; Adams, Larry D.; Raju, Raghavan; Dalakas, Marinos C.

    2016-01-01

    Purpose: Human immunodeficiency virus (HIV)–related fatigue (HRF) is multicausal and potentially related to mitochondrial dysfunction caused by antiretroviral therapy with nucleoside reverse transcriptase inhibitors (NRTIs). Methodology: The authors compared gene expression profiles of CD14+ cells of low versus high fatigued, NRTI-treated HIV patients to healthy controls (n = 5/group). The authors identified 32 genes predictive of low versus high fatigue and 33 genes predictive of healthy versus HIV infection. The authors constructed genetic networks to further elucidate the possible biological pathways in which these genes are involved. Relevance for nursing practice: Genes including the actin cytoskeletal regulatory proteins Prokineticin 2 and Cofilin 2 along with mitochondrial inner membrane proteins are involved in multiple pathways and were predictors of fatigue status. Previously identified inflammatory and signaling genes were predictive of HIV status, clearly confirming our results and suggesting a possible further connection between mitochondrial function and HIV. Isolated CD14+ cells are easily accessible cells that could be used for further study of the connection between fatigue and mitochondrial function of HIV patients. Implication for Practice: The findings from this pilot study take us one step closer to identifying biomarker targets for fatigue status and mitochondrial dysfunction. Specific biomarkers will be pertinent to the development of methodologies to diagnose, monitor, and treat fatigue and mitochondrial dysfunction. PMID:23324479

  1. Construct and Compare Gene Coexpression Networks with DAPfinder and DAPview.

    PubMed

    Skinner, Jeff; Kotliarov, Yuri; Varma, Sudhir; Mine, Karina L; Yambartsev, Anatoly; Simon, Richard; Huyen, Yentram; Morgun, Andrey

    2011-07-14

    DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes. Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools. DAPfinder is a new user-friendly tool for reconstruction and comparison of biological networks.
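
    A differentially associated pair can be illustrated with one standard test for comparing two independent correlations, the Fisher z-transformation. The abstract does not state which tests DAPfinder offers, so this is an assumed stand-in rather than the tool's own method; the function name `fisher_z_diff` and the example numbers are hypothetical.

```python
import math

def fisher_z_diff(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent correlations.

    r1, r2: Pearson correlations of the same gene pair in two phenotypes
            (each must satisfy |r| < 1).
    n1, n2: sample sizes of the two phenotype groups (each must exceed 3).
    Returns (z statistic, two-sided p-value under a normal approximation).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical gene pair: strongly coexpressed in one phenotype only.
z_stat, p_value = fisher_z_diff(0.9, 30, -0.2, 30)
```

    With thousands of genes the number of pairs is very large, so p-values from a test like this would still need a multiple comparison adjustment of the kind the abstract mentions.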

  2. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network

    PubMed Central

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C.; Mohanty, Bidyut K.; Gao, Nan; Tang, Jijun; Lawson, Andrew B.; Hannun, Yusuf A.; Zheng, W. Jim

    2014-01-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes’ Ontology Fingerprints—a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms’ corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. PMID:25063300

  3. Markov State Models of gene regulatory networks.

    PubMed

    Chu, Brian K; Tse, Margaret J; Sato, Royce R; Read, Elizabeth L

    2017-02-06

    Gene regulatory networks with dynamics characterized by multiple stable states underlie cell fate-decisions. Quantitative models that can link molecular-level knowledge of gene regulation to a global understanding of network dynamics have the potential to guide cell-reprogramming strategies. Networks are often modeled by the stochastic Chemical Master Equation, but methods for systematic identification of key properties of the global dynamics are currently lacking. The method identifies the number, phenotypes, and lifetimes of long-lived states for a set of common gene regulatory network models. Application of transition path theory to the constructed Markov State Model decomposes global dynamics into a set of dominant transition paths and associated relative probabilities for stochastic state-switching. In this proof-of-concept study, we found that the Markov State Model provides a general framework for analyzing and visualizing stochastic multistability and state-transitions in gene networks. Our results suggest that this framework, adopted from the field of atomistic Molecular Dynamics, can be a useful tool for quantitative Systems Biology at the network scale.
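
    Once a Markov State Model has been reduced to a small transition matrix over long-lived states, quantities such as state lifetimes and long-run occupancies follow from elementary Markov chain results. The sketch below assumes a hypothetical two-state, discrete-time matrix `P` and does not reproduce how the authors build the model from the Chemical Master Equation; the function names are illustrative.

```python
def mean_lifetimes(P, dt=1.0):
    """Mean dwell time in each state of a discrete-time Markov chain.

    For self-transition probability P[i][i], the dwell time is geometric
    with mean 1 / (1 - P[i][i]) steps; dt converts steps to time units.
    """
    return [dt / (1.0 - P[i][i]) if P[i][i] < 1.0 else float("inf")
            for i in range(len(P))]

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Long-run state occupancies by repeated left-multiplication (pi <- pi P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical two-state switch: rows are "from" states and sum to 1.
P = [[0.99, 0.01],
     [0.02, 0.98]]
lifetimes = mean_lifetimes(P)          # about 100 and 50 steps
occupancy = stationary_distribution(P) # about 2/3 and 1/3
```

    The state with the smaller escape probability is both longer-lived and more populated, which is the kind of summary the abstract refers to as the number and lifetimes of long-lived states.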

  4. A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection.

    PubMed

    Carey, Michelle; Ramírez, Juan Camilo; Wu, Shuang; Wu, Hulin

    2018-07-01

    A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.

  5. Identifying differentially expressed genes in cancer patients using a non-parameter Ising model.

    PubMed

    Li, Xumeng; Feltus, Frank A; Sun, Xiaoqian; Wang, James Z; Luo, Feng

    2011-10-01

    Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than those by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by Ising model can improve classification performance compared with DE genes identified by Markov random field model. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Diurnal Transcriptome and Gene Network Represented through Sparse Modeling in Brachypodium distachyon.

    PubMed

    Koda, Satoru; Onda, Yoshihiko; Matsui, Hidetoshi; Takahagi, Kotaro; Yamaguchi-Uehara, Yukiko; Shimizu, Minami; Inoue, Komaki; Yoshida, Takuhiro; Sakurai, Tetsuya; Honda, Hiroshi; Eguchi, Shinto; Nishii, Ryuei; Mochida, Keiichi

    2017-01-01

    We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX) model with a group smoothly clipped absolute deviation (SCAD) method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are feasible to infer network sparsity based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, and post-transcriptional modification and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factors encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks represent typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, demonstrating the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.

  7. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    PubMed

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of
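
    Two of the four centrality metrics named in this abstract, degree and closeness, are simple enough to sketch directly. The example assumes an undirected, unweighted network stored as an adjacency dictionary; the node labels are hypothetical placeholders and do not represent interactions reported by the study.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Reachable-node count divided by the summed shortest-path distance.

    Distances come from breadth-first search. For a connected graph this
    is the usual closeness; isolated nodes get 0.0.
    """
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical star-shaped network: one hub gene linked to three others.
adj = {
    "hub":   {"geneA", "geneB", "geneC"},
    "geneA": {"hub"},
    "geneB": {"hub"},
    "geneC": {"hub"},
}
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
```

    Eigenvector and betweenness centrality need more machinery (an eigenvector solve and shortest-path counting, respectively), so an established graph library is usually a better choice for those.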

  8. Gene network biological validity based on gene-gene interaction relevance.

    PubMed

    Gómez-Vela, Francisco; Díaz-Díaz, Norberto

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.

  9. Network Inference Analysis Identifies an APRR2-Like Gene Linked to Pigment Accumulation in Tomato and Pepper Fruits

    PubMed Central

    Pan, Yu; Bradley, Glyn; Pyke, Kevin; Ball, Graham; Lu, Chungui; Fray, Rupert; Marshall, Alexandra; Jayasuta, Subhalai; Baxter, Charles; van Wijk, Rik; Boyden, Laurie; Cade, Rebecca; Chapman, Natalie H.; Fraser, Paul D.; Hodgman, Charlie; Seymour, Graham B.

    2013-01-01

    Carotenoids represent some of the most important secondary metabolites in the human diet, and tomato (Solanum lycopersicum) is a rich source of these health-promoting compounds. In this work, a novel and fruit-related regulator of pigment accumulation in tomato has been identified by artificial neural network inference analysis and its function validated in transgenic plants. A tomato fruit gene regulatory network was generated using artificial neural network inference analysis and transcription factor gene expression profiles derived from fruits sampled at various points during development and ripening. One of the transcription factor gene expression profiles with a sequence related to an Arabidopsis (Arabidopsis thaliana) ARABIDOPSIS PSEUDO RESPONSE REGULATOR2-LIKE gene (APRR2-Like) was up-regulated at the breaker stage in wild-type tomato fruits and, when overexpressed in transgenic lines, increased plastid number, area, and pigment content, enhancing the levels of chlorophyll in immature unripe fruits and carotenoids in red ripe fruits. Analysis of the transcriptome of transgenic lines overexpressing the tomato APRR2-Like gene revealed up-regulation of several ripening-related genes in the overexpression lines, providing a link between the expression of this tomato gene and the ripening process. A putative ortholog of the tomato APRR2-Like gene in sweet pepper (Capsicum annuum) was associated with pigment accumulation in fruit tissues. We conclude that the function of this gene is conserved across taxa and that it encodes a protein that has an important role in ripening. PMID:23292788

  10. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network.

    PubMed

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C; Mohanty, Bidyut K; Gao, Nan; Tang, Jijun; Lawson, Andrew B; Hannun, Yusuf A; Zheng, W Jim

    2014-10-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Cross-platform method for identifying candidate network biomarkers for prostate cancer.

    PubMed

    Jin, G; Zhou, X; Cui, K; Zhang, X-S; Chen, L; Wong, S T C

    2009-11-01

    Discovering biomarkers using mass spectrometry (MS) and microarray expression profiles is a promising strategy in molecular diagnosis. Here, the authors proposed a new pipeline for biomarker discovery that integrates disease information for proteins and genes, expression profiles at both the genomic and proteomic levels, and protein-protein interactions (PPIs) to discover high-confidence network biomarkers. Using this pipeline, a total of 474 molecules (genes and proteins) related to prostate cancer were identified and a prostate-cancer-related network (PCRN) was derived from the integrative information. Thus, a set of candidate network biomarkers was identified from multiple expression profiles composed of eight microarray datasets and one proteomics dataset. The network biomarkers with PPIs can accurately distinguish the prostate cancer patients from the normal ones, which potentially provides more reliable biomarker candidates than conventional biomarker discovery methods.

  12. Systematic analysis of microarray datasets to identify Parkinson's disease‑associated pathways and genes.

    PubMed

    Feng, Yinling; Wang, Xuefeng

    2017-03-01

    In order to investigate commonly disturbed genes and pathways in various brain regions of patients with Parkinson's disease (PD), microarray datasets from previous studies were collected and systematically analyzed. Different normalization methods were applied to microarray datasets from different platforms. A strategy combining gene co‑expression networks and clinical information was adopted, using weighted gene co‑expression network analysis (WGCNA) to screen for commonly disturbed genes in different brain regions of patients with PD. Functional enrichment analysis of commonly disturbed genes was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Co‑pathway relationships were identified with Pearson's correlation coefficient tests and a hypergeometric distribution‑based test. Common genes in pathway pairs were selected out and regarded as risk genes. A total of 17 microarray datasets from 7 platforms were retained for further analysis. Five gene coexpression modules were identified, containing 9,745, 736, 233, 101 and 93 genes, respectively. One module was significantly correlated with PD samples and thus the 736 genes it contained were considered to be candidate PD‑associated genes. Functional enrichment analysis demonstrated that these genes were implicated in oxidative phosphorylation and PD. A total of 44 pathway pairs and 52 risk genes were revealed, and a risk gene pathway relationship network was constructed. Eight modules were identified and were revealed to be associated with PD, cancers and metabolism. A number of disturbed pathways and risk genes were unveiled in PD, and these findings may help advance understanding of PD pathogenesis.
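
    The hypergeometric distribution-based test mentioned in this abstract can be illustrated with a small sketch. It is not the authors' code: the function name, the argument names, and the example numbers are hypothetical, and the abstract does not state the exact parameterization used. The sketch computes the probability of seeing at least a given number of shared genes between two pathways by chance.

```python
from math import comb

def hypergeom_overlap_pvalue(overlap, size_a, size_b, universe):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Assumed setup (illustrative only): pathway A holds size_a genes,
    pathway B holds size_b genes, both drawn from `universe` genes,
    and X counts the genes the two pathways share.
    """
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: two 5-gene pathways in a 20-gene universe.
p_all_shared = hypergeom_overlap_pvalue(5, 5, 5, 20)
p_any_overlap_or_none = hypergeom_overlap_pvalue(0, 5, 5, 20)
```

    A small p-value suggests the two pathways share more genes than random gene sets of the same sizes would, which is one plausible reading of how pathway pairs could be screened.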

  13. Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks

    PubMed Central

    Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277

  14. Integrated Network Analysis Identifies Fight-Club Nodes as a Class of Hubs Encompassing Key Putative Switch Genes That Induce Major Transcriptome Reprogramming during Grapevine Development

    PubMed Central

    Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola

    2014-01-01

    We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named “fight-club hubs” characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named “switch genes” was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. PMID:25490918
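
    The idea of a hub that is negatively correlated with its network neighbors can be sketched as follows. This is a simplified illustration, not the published method: the helper names, the toy expression values, and the use of a plain average of Pearson correlations are all assumptions made here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_neighbor_correlation(gene, neighbors, expression):
    """Average correlation between a gene and its network neighbors."""
    values = [pearson(expression[gene], expression[n]) for n in neighbors]
    return sum(values) / len(values)

# Toy expression profiles (hypothetical values across four samples).
expression = {
    "hub": [1.0, 2.0, 3.0, 4.0],
    "n1": [4.0, 3.0, 2.0, 1.0],
    "n2": [8.0, 6.0, 4.0, 2.0],
}
score = mean_neighbor_correlation("hub", ["n1", "n2"], expression)
# A clearly negative score is the behavior the abstract attributes to
# fight-club hubs; any cutoff for "marked" is not given in the abstract.
```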

  15. Extending gene ontology with gene association networks.

    PubMed

    Peng, Jiajie; Wang, Tao; Wang, Jixuan; Wang, Yadong; Chen, Jin

    2016-04-15

    Gene ontology (GO) is a widely used resource to describe the attributes of gene products. However, automatic GO maintenance remains difficult because of the complex logical reasoning and the need for biological knowledge that is not explicitly represented in the GO. The existing studies either construct the whole GO based on network data or only infer the relations between existing GO terms. None is designed to add new terms automatically to the existing GO. We proposed a new algorithm, 'GOExtender', to efficiently identify all the connected gene pairs labeled by the same parent GO terms. GOExtender is used to predict new GO terms with biological network data, and connect them to the existing GO. Evaluation tests on biological process and cellular component categories of different GO releases showed that GOExtender can extend new GO terms automatically based on the biological network. Furthermore, we applied GOExtender to the recent release of GO and discovered new GO terms with strong support from literature. Availability: Software and supplementary document are available at www.msu.edu/%7Ejinchen/GOExtender. Contact: jinchen@msu.edu or ydwang@hit.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Finding gene regulatory network candidates using the gene expression knowledge base.

    PubMed

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  17. TimeXNet Web: Identifying cellular response networks from diverse omics time-course data.

    PubMed

    Tan, Phit Ling; López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2018-05-14

    Condition-specific time-course omics profiles are frequently used to study cellular response to stimuli and identify associated signaling pathways. However, few online tools allow users to analyze multiple types of high-throughput time-course data. TimeXNet Web is a web server that extracts a time-dependent gene/protein response network from time-course transcriptomic, proteomic or phospho-proteomic data, and an input interaction network. It classifies the given genes/proteins into time-dependent groups based on the time of their highest activity and identifies the most probable paths connecting genes/proteins in consecutive groups. The response sub-network is enriched in activated genes/proteins and contains novel regulators that do not show any observable change in the input data. Users can view the resultant response network and analyze it for functional enrichment. TimeXNet Web supports the analysis of high-throughput data from multiple species by providing high quality, weighted protein-protein interaction networks for 12 model organisms. Availability: http://txnet.hgc.jp/. Contact: ashwini@hgc.jp. Supplementary data are available at Bioinformatics online.
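
    The grouping step described in this abstract (assigning each gene or protein to the time point of its highest activity) might look like the sketch below. It does not use the TimeXNet Web interface or its code; the function name, the data layout, and the choice of the largest raw value as "highest activity" are assumptions made for illustration.

```python
from collections import defaultdict

def group_by_peak_time(profiles, time_points):
    """Group genes by the time point at which their activity is largest.

    profiles: dict mapping gene name -> list of activity values,
              one value per entry in time_points (hypothetical layout).
    """
    groups = defaultdict(list)
    for gene, values in profiles.items():
        peak_index = max(range(len(values)), key=values.__getitem__)
        groups[time_points[peak_index]].append(gene)
    return dict(groups)

# Toy time course (hypothetical fold changes at 1, 2 and 4 hours).
profiles = {
    "g1": [0.1, 2.0, 0.5],
    "g2": [1.5, 0.2, 0.1],
    "g3": [0.0, 0.3, 3.0],
}
groups = group_by_peak_time(profiles, [1, 2, 4])
```

    Path finding between consecutive groups, the second step named in the abstract, is not covered by this sketch.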

  18. Identifying biomarkers of papillary renal cell carcinoma associated with pathological stage by weighted gene co-expression network analysis.

    PubMed

    He, Zhongshi; Sun, Min; Ke, Yuan; Lin, Rongjie; Xiao, Youde; Zhou, Shuliang; Zhao, Hong; Wang, Yan; Zhou, Fuxiang; Zhou, Yunfeng

    2017-04-25

    Although papillary renal cell carcinoma (PRCC) accounts for 10%-15% of renal cell carcinoma (RCC), no predictive molecular biomarker is currently applicable to guiding disease stage of PRCC patients. The mRNASeq data of PRCC and adjacent normal tissue in The Cancer Genome Atlas was analyzed to identify 1148 differentially expressed genes, on which weighted gene co-expression network analysis was performed. Then 11 co-expressed gene modules were identified. The highest association was found between blue module and pathological stage (r = 0.45) by Pearson's correlation analysis. Functional enrichment analysis revealed that biological processes of blue module focused on nuclear division, cell cycle phase, and spindle (all P < 1e-10). All 40 hub genes in blue module can distinguish localized (pathological stage I, II) from non-localized (pathological stage III, IV) PRCC (P < 0.01). A good molecular biomarker for pathological stage of RCC must be a prognostic gene in clinical practice. Survival analysis was performed to reversely validate if hub genes were associated with pathological stage. Survival analysis unveiled that all hub genes were associated with patient prognosis (P < 0.01). The validation cohort GSE2748 verified that 30 hub genes can differentiate localized from non-localized PRCC (P < 0.01), and 18 hub genes are prognosis-associated (P < 0.01). ROC curve indicated that the 17 hub genes exhibited excellent diagnostic efficiency for localized and non-localized PRCC (AUC > 0.7). These hub genes may serve as a biomarker and help to distinguish different pathological stages for PRCC patients.
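
    The AUC values quoted in this abstract summarize how well a gene's expression separates two groups. A minimal sketch of that quantity, using the standard pairwise-comparison definition of the area under the ROC curve, is given below; the function name and the toy expression values are hypothetical and not taken from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample scores higher, counting ties as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy expression of one gene (hypothetical values):
# non-localized samples as positives, localized samples as negatives.
auc = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])
```

    An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so the threshold of 0.7 used in the abstract sits between the two.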

  19. Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks

    PubMed Central

    Roy, Sushmita; Lagree, Stephen; Hou, Zhonggang; Thomson, James A.; Stewart, Ron; Gasch, Audrey P.

    2013-01-01

    Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well as or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development. PMID:24146602

  20. Identifying gene coexpression networks underlying the dynamic regulation of wood-forming tissues in Populus under diverse environmental conditions.

    PubMed

    Zinkgraf, Matthew; Liu, Lijun; Groover, Andrew; Filkov, Vladimir

    2017-06-01

    Trees modify wood formation through integration of environmental and developmental signals in complex but poorly defined transcriptional networks, allowing trees to produce woody tissues appropriate to diverse environmental conditions. In order to identify relationships among genes expressed during wood formation, we integrated data from new and publicly available datasets in Populus. These datasets were generated from woody tissue and include transcriptome profiling, transcription factor binding, DNA accessibility and genome-wide association mapping experiments. Coexpression modules were calculated, each of which contains genes showing similar expression patterns across experimental conditions, genotypes and treatments. Conserved gene coexpression modules (four modules totaling 8398 genes) were identified that were highly preserved across diverse environmental conditions and genetic backgrounds. Functional annotations as well as correlations with specific experimental treatments associated individual conserved modules with distinct biological processes underlying wood formation, such as cell-wall biosynthesis, meristem development and epigenetic pathways. Module genes were also enriched for DNase I hypersensitivity footprints and binding from four transcription factors associated with wood formation. The conserved modules are excellent candidates for modeling core developmental pathways common to wood formation in diverse environments and genotypes, and serve as testbeds for hypothesis generation and testing for future studies. No claim to original US government works. New Phytologist © 2017 New Phytologist Trust.

  1. Regulatory network involving miRNAs and genes in serous ovarian carcinoma

    PubMed Central

    Zhao, Haiyan; Xu, Hao; Xue, Luchen

    2017-01-01

    Serous ovarian carcinoma (SOC) is one of the most life-threatening types of gynecological malignancy, but the pathogenesis of SOC remains unknown. Previous studies have indicated that differentially expressed genes and microRNAs (miRNAs) serve important functions in SOC. However, genes and miRNAs are identified in a disperse form, and limited information is known about the regulatory association between miRNAs and genes in SOC. In the present study, three regulatory networks were hierarchically constructed, including a differentially-expressed network, a related network and a global network to reveal associations between each factor. In each network, there were three types of factors, which were genes, miRNAs and transcription factors that interact with each other. Focus was placed on the differentially-expressed network, in which all genes and miRNAs were differentially expressed and therefore may have affected the development of SOC. Following the comparison and analysis between the three networks, a number of signaling pathways which demonstrated differentially expressed elements were highlighted. Subsequently, the upstream and downstream elements of differentially expressed miRNAs and genes were listed, and a number of key elements (differentially expressed miRNAs, genes and TFs predicted using the P-match method) were analyzed. The differentially expressed network partially illuminated the pathogenesis of SOC. It was hypothesized that if there was no differential expression of miRNAs and genes, SOC may be prevented and treatment may be identified. The present study provided a theoretical foundation for gene therapy for SOC. PMID:29113276

  2. Comparative analysis of protein interactome networks prioritizes candidate genes with cancer signatures.

    PubMed

    Li, Yongsheng; Sahni, Nidhi; Yi, Song

    2016-11-29

    Comprehensive understanding of human cancer mechanisms requires the identification of a thorough list of cancer-associated genes, which could serve as biomarkers for diagnoses and therapies in various types of cancer. Although substantial progress has been made in functional studies to uncover genes involved in cancer, these efforts are often time-consuming and costly. Therefore, it remains challenging to comprehensively identify cancer candidate genes. Network-based methods have accelerated this process through the analysis of complex molecular interactions in the cell. However, the extent to which various interactome networks can contribute to prediction of candidate genes responsible for cancer is still enigmatic. In this study, we evaluated different human protein-protein interactome networks and compared their application to cancer gene prioritization. Our results indicate that network analyses can increase the power to identify novel cancer genes. In particular, such predictive power can be enhanced with the use of unbiased systematic protein interaction maps for cancer gene prioritization. Functional analysis reveals that the top ranked genes from network predictions co-occur often with cancer-related terms in literature, and further, these candidate genes are indeed frequently mutated across cancers. Finally, our study suggests that integrating interactome networks with other omics datasets could provide novel insights into cancer-associated genes and underlying molecular mechanisms.

  3. Detection of gene communities in multi-networks reveals cancer drivers

    NASA Astrophysics Data System (ADS)

    Cantini, Laura; Medico, Enzo; Fortunato, Santo; Caselle, Michele

    2015-12-01

    We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify driving cancer genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners and that such a fine-tuned regulation can be obtained only by combining both the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network we studied its partition into communities. To this end we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Even if our procedure is valid in principle for any pathology, in this work we concentrate on gastric, lung, pancreas and colorectal cancer and identify, from the enrichment analysis of the multi-network communities, a set of candidate driver cancer genes. Some of them were already known oncogenes while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications on the regulatory pattern and functional role of both the already known and the new candidate driver genes.

  4. Identification of gene expression profiles and key genes in subchondral bone of osteoarthritis using weighted gene coexpression network analysis.

    PubMed

    Guo, Sheng-Min; Wang, Jian-Xiong; Li, Jin; Xu, Fang-Yuan; Wei, Quan; Wang, Hai-Ming; Huang, Hou-Qiang; Zheng, Si-Lin; Xie, Yu-Jie; Zhang, Chi

    2018-06-15

    Osteoarthritis (OA) significantly influences the quality of life of people around the world. It is urgent to find an effective way to understand the genetic etiology of OA. We used weighted gene coexpression network analysis (WGCNA) to explore the key genes involved in the subchondral bone pathological process of OA. Fifty gene expression profiles of GSE51588 were downloaded from the Gene Expression Omnibus database. The OA-associated genes and gene ontologies were acquired from JuniorDoc. Weighted gene coexpression network analysis was used to find disease-related networks based on 21756 gene expression correlation coefficients, hub-genes with the highest connectivity in each module were selected, and the correlation between module eigengene and clinical traits was calculated. The genes in the traits-related gene coexpression modules were subject to functional annotation and pathway enrichment analysis using ClusterProfiler. A total of 73 gene modules were identified, of which 12 modules were found with high connectivity with clinical traits. Five modules were found with enriched OA-associated genes. Moreover, 310 OA-associated genes were found, and 34 of them were among hub-genes in each module. Consequently, enrichment results indicated some key metabolic pathways, such as extracellular matrix (ECM)-receptor interaction (hsa04512), focal adhesion (hsa04510), the phosphatidylinositol 3'-kinase (PI3K)-Akt signaling pathway (PI3K-AKT) (hsa04151), transforming growth factor beta pathway, and Wnt pathway. We intended to identify some core genes, collagen (COL)6A3, COL6A1, ITGA11, BAMBI, and HCK, which could influence downstream signaling pathways once they were activated. In this study, we identified important genes within key coexpression modules, which associate with a pathological process of subchondral bone in OA. Functional analysis results could provide important information to understand the mechanism of OA. © 2018 Wiley Periodicals, Inc.
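
    Selecting hub genes by connectivity, as described in this abstract, can be sketched with the unsigned soft-threshold adjacency commonly used in WGCNA, where the adjacency of two genes is the absolute correlation raised to a power beta. This is an illustration only: the study itself would have used the WGCNA software, and the function name, the value of beta, and the toy correlations below are assumptions.

```python
def soft_threshold_connectivity(corr, beta=6):
    """Connectivity of each gene: sum over other genes of |cor| ** beta.

    corr: nested dict of pairwise correlations within one module
          (hypothetical layout); beta=6 is an illustrative choice.
    """
    genes = list(corr)
    return {
        g: sum(abs(corr[g][h]) ** beta for h in genes if h != g)
        for g in genes
    }

# Toy module with three genes and made-up correlations.
module_corr = {
    "gene_a": {"gene_a": 1.0, "gene_b": 0.9, "gene_c": 0.8},
    "gene_b": {"gene_a": 0.9, "gene_b": 1.0, "gene_c": 0.3},
    "gene_c": {"gene_a": 0.8, "gene_b": 0.3, "gene_c": 1.0},
}
connectivity = soft_threshold_connectivity(module_corr)
hub_gene = max(connectivity, key=connectivity.get)
```

    Raising correlations to a power suppresses weak links, so the gene with strong correlations to both partners ends up with the highest connectivity.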

  5. Systematic Evaluation of Molecular Networks for Discovery of Disease Genes.

    PubMed

    Huang, Justin K; Carlin, Daniel E; Yu, Michael Ku; Zhang, Wei; Kreisberg, Jason F; Tamayo, Pablo; Ideker, Trey

    2018-04-25

    Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall. A general tendency is that performance scales with network size, suggesting that new interaction discovery currently outweighs the detrimental effects of false positives. Correcting for size, we find that the DIP network provides the highest efficiency (value per interaction). Based on these results, we create a parsimonious composite network with both high efficiency and performance. This work provides a benchmark for selection of molecular networks in human disease research. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Gene expression profiling combined with bioinformatics analysis identify biomarkers for Parkinson disease.

    PubMed

    Diao, Hongyu; Li, Xinxing; Hu, Sheng; Liu, Yunhui

    2012-01-01

    Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years old. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. In addition, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes and calculated the regulatory impact factor of each transcription factor. As a result, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found to have altered expression levels in PD patients. The expression levels of other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered. However, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result.

  7. Gene Expression Profiling Combined with Bioinformatics Analysis Identify Biomarkers for Parkinson Disease

    PubMed Central

    Diao, Hongyu; Li, Xinxing; Hu, Sheng; Liu, Yunhui

    2012-01-01

    Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years old. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. In addition, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes and calculated the regulatory impact factor of each transcription factor. As a result, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found to have altered expression levels in PD patients. The expression levels of other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered. However, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result. PMID:23284986

  8. GENE EXPRESSION NETWORKS

    EPA Science Inventory

    "Gene expression network" is the term used to describe the interplay, simple or complex, between two or more gene products in performing a specific cellular function. Although the delineation of such networks is complicated by the existence of multiple and subtle types of intera...

  9. Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

    PubMed Central

    Hwang, Sohyun; Rhee, Seung Y; Marcotte, Edward M; Lee, Insuk

    2012-01-01

    AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests. PMID:21886106

  10. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built, taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate

  11. Causal network analysis of head and neck keloid tissue identifies potential master regulators.

    PubMed

    Garcia-Rodriguez, Laura; Jones, Lamont; Chen, Kang Mei; Datta, Indrani; Divine, George; Worsham, Maria J

    2016-10-01

    To generate novel insights and hypotheses in keloid development from potential master regulators. Prospective cohort. Six fresh keloid and six normal skin samples from 12 anonymous donors were used in a prospective cohort study. Genome-wide profiling was done previously on the cohort using the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA). The 190 statistically significant CpG islands between keloid and normal tissue mapped to 152 genes (P < .05). The top 10 statistically significant genes (VAMP5, ACTR3C, GALNT3, KCNAB2, LRRC61, SCML4, SYNGR1, TNS1, PLEKHG5, PPP1R13-α, false discovery rate <.015) were uploaded into the Ingenuity Pathway Analysis software's Causal Network Analysis (QIAGEN, Redwood City, CA). To reflect expected gene expression direction in the context of methylation changes, the inverse of the methylation ratio from keloid versus normal tissue was used for the analysis. Causal Network Analysis identified disease-specific master regulator molecules based on downstream differentially expressed keloid-specific genes and expected directionality of expression (hypermethylated vs. hypomethylated). Causal Network Analysis software identified four hierarchical networks that included four master regulators (pyroxamide, tributyrin, PRKG2, and PENK) and 19 intermediate regulators. Causal Network Analysis of differentiated methylated gene data of keloid versus normal skin demonstrated four causal networks with four master regulators. These hierarchical networks suggest potential driver roles for their downstream keloid gene targets in the pathogenesis of the keloid phenotype, likely triggered due to perturbation/injury to normal tissue. NA Laryngoscope, 126:E319-E324, 2016. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.

  12. Applying Multivariate Adaptive Splines to Identify Genes With Expressions Varying After Diagnosis in Microarray Experiments.

    PubMed

    Duan, Fenghai; Xu, Ye

    2017-01-01

    To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray dataset publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 were expressed at lower levels after the turning points and 713 at higher levels after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression dataset and successfully identified genes with expressions varying before and after breast cancer diagnosis.
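
    The grouping step described above (K-means on per-gene turning points) can be illustrated with a minimal sketch of Lloyd's algorithm on scalar values. The values and starting centers below are made up for illustration and are not the study's data; the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, centers, iterations=100):
    """Lloyd's algorithm on scalar values, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = updated
    return centers, groups


# Toy values (illustrative only), clustered into three groups.
toy_values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
final_centers, final_groups = kmeans_1d(toy_values, centers=[0, 10, 25])
print(final_centers)
```

    K-means depends on its starting centers, so a real analysis would normally try several initializations; this sketch uses a single fixed start to stay deterministic.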

  13. Integrative Analysis of GWASs, Human Protein Interaction, and Gene Expression Identified Gene Modules Associated With BMDs

    PubMed Central

    He, Hao; Zhang, Lei; Li, Jian; Wang, Yu-Ping; Zhang, Ji-Gang; Shen, Jie; Guo, Yan-Fang

    2014-01-01

    Context: To date, few systems genetics studies in the bone field have been performed. We designed our study from a systems-level perspective by integrating genome-wide association studies (GWASs), human protein-protein interaction (PPI) network, and gene expression to identify gene modules contributing to osteoporosis risk. Methods: First we searched for modules significantly enriched with bone mineral density (BMD)-associated genes in human PPI network by using 2 large meta-analysis GWAS datasets through a dense module search algorithm. One included 7 individual GWAS samples (Meta7). The other was from the Genetic Factors for Osteoporosis Consortium (GEFOS2). One was assigned as a discovery dataset and the other as an evaluation dataset, and vice versa. Results: In total, 42 modules and 129 modules were identified significantly in both Meta7 and GEFOS2 datasets for femoral neck and spine BMD, respectively. There were 3340 modules identified for hip BMD only in Meta7. As candidate modules, they were assessed for the biological relevance to BMD by gene set enrichment analysis in 2 expression profiles generated from circulating monocytes in subjects with low versus high BMD values. Interestingly, there were 2 modules significantly enriched in monocytes from the low BMD group in both gene expression datasets (nominal P value <.05). Two modules had 16 nonredundant genes. Functional enrichment analysis revealed that both modules were enriched for genes involved in Wnt receptor signaling and osteoblast differentiation. Conclusion: We highlighted 2 modules and novel genes playing important roles in the regulation of bone mass, providing important clues for therapeutic approaches for osteoporosis. PMID:25119315

  14. Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning.

    PubMed

    Wu, Mengmeng; Zeng, Wanwen; Liu, Wenqiang; Lv, Hairong; Chen, Ting; Jiang, Rui

    2018-06-03

    Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering the genetic basis of human inherited diseases. However, it is still a challenging task to extract biological knowledge from the GWAS data, due to such issues as missing heritability and weak interpretability. Indeed, the fact that the majority of discovered loci fall into noncoding regions without clear links to genes has been preventing the characterization of their functions and calls for a sophisticated approach to bridge genetic and genomic studies. To address this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged a technique called network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and showed the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Identifying module biomarkers from gastric cancer by differential correlation network

    PubMed Central

    Liu, Xiaoping; Chang, Xiao

    2016-01-01

    Gastric cancer (stomach cancer) is a severe disease caused by dysregulation of many functionally correlated genes or pathways instead of the mutation of individual genes. Systematic identification of gastric cancer biomarkers can provide insights into the mechanisms underlying this deadly disease and help in the development of new drugs. In this paper, we present a novel network-based approach to predict module biomarkers of gastric cancer that can effectively distinguish the disease from normal samples. Specifically, by assuming that gastric cancer has mainly resulted from dysfunction of biomolecular networks rather than individual genes in an organism, the genes in the module biomarkers are potentially related to gastric cancer. Finally, we identified a module biomarker with 27 genes, and by comparing the module biomarker with known gastric cancer biomarkers, we found that our module biomarker exhibited a greater ability to diagnose the samples with gastric cancer. PMID:27703371
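
    The abstract above does not spell out how its differential correlation network is built. One common reading, assumed here purely for illustration, is to keep a gene pair as an edge when its Pearson correlation changes strongly between disease and normal samples. The gene labels, expression values, and threshold below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def differential_edges(disease, normal, threshold):
    """Gene pairs whose correlation differs between conditions by >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(disease), 2):
        delta = pearson(disease[g1], disease[g2]) - pearson(normal[g1], normal[g2])
        if abs(delta) >= threshold:
            edges[(g1, g2)] = delta
    return edges


# Toy expression profiles (illustrative only): gene B flips direction in disease.
normal_expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
disease_expr = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [4, 3, 2, 1]}
diff_edges = differential_edges(disease_expr, normal_expr, threshold=1.0)
print(sorted(diff_edges))
```

    In this toy case only the pairs involving B change, so the A-C pair is dropped. A real analysis would also need a significance test for each correlation difference, which this sketch omits.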

  16. Network Analysis of Human Genes Influencing Susceptibility to Mycobacterial Infections

    PubMed Central

    Lipner, Ettie M.; Garcia, Benjamin J.; Strong, Michael

    2016-01-01

    Tuberculosis and nontuberculous mycobacterial infections constitute a high burden of pulmonary disease in humans, resulting in over 1.5 million deaths per year. Building on the premise that genetic factors influence the instance, progression, and defense of infectious disease, we undertook a systems biology approach to investigate relationships among genetic factors that may play a role in increased susceptibility or control of mycobacterial infections. We combined literature and database mining with network analysis and pathway enrichment analysis to examine genes, pathways, and networks, involved in the human response to Mycobacterium tuberculosis and nontuberculous mycobacterial infections. This approach allowed us to examine functional relationships among reported genes, and to identify novel genes and enriched pathways that may play a role in mycobacterial susceptibility or control. Our findings suggest that the primary pathways and genes influencing mycobacterial infection control involve an interplay between innate and adaptive immune proteins and pathways. Signaling pathways involved in autoimmune disease were significantly enriched as revealed in our networks. Mycobacterial disease susceptibility networks were also examined within the context of gene-chemical relationships, in order to identify putative drugs and nutrients with potential beneficial immunomodulatory or anti-mycobacterial effects. PMID:26751573

  17. A hybrid network-based method for the detection of disease-related genes

    NASA Astrophysics Data System (ADS)

    Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene

    2018-02-01

    Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.

  18. Differential Network Analyses of Alzheimer’s Disease Identify Early Events in Alzheimer’s Disease Pathology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xia, Jing; Rocke, David M.; Perry, George

    In late-onset Alzheimer’s disease (AD), multiple brain regions are not affected simultaneously. Comparing the gene expression of the affected regions to identify the differences in the biological processes perturbed can lead to greater insight into AD pathogenesis and early characteristics. We identified differentially expressed (DE) genes from single cell microarray data of four AD affected brain regions: entorhinal cortex (EC), hippocampus (HIP), posterior cingulate cortex (PCC), and middle temporal gyrus (MTG). We organized the DE genes in the four brain regions into region-specific gene coexpression networks. Differential neighborhood analyses in the coexpression networks were performed to identify genes with low topological overlap (TO) of their direct neighbors. The low TO genes were used to characterize the biological differences between two regions. Our analyses show that increased oxidative stress, along with alterations in lipid metabolism in neurons, may be some of the very early events occurring in AD pathology. Cellular defense mechanisms try to intervene but fail, finally resulting in AD pathology as the disease progresses. Furthermore, disease annotation of the low TO genes in two independent protein interaction networks has resulted in association between cancer, diabetes, renal diseases, and cardiovascular diseases.

  19. Differential Network Analyses of Alzheimer’s Disease Identify Early Events in Alzheimer’s Disease Pathology

    DOE PAGES

    Xia, Jing; Rocke, David M.; Perry, George; ...

    2014-01-01

    In late-onset Alzheimer’s disease (AD), multiple brain regions are not affected simultaneously. Comparing the gene expression of the affected regions to identify the differences in the biological processes perturbed can lead to greater insight into AD pathogenesis and early characteristics. We identified differentially expressed (DE) genes from single cell microarray data of four AD affected brain regions: entorhinal cortex (EC), hippocampus (HIP), posterior cingulate cortex (PCC), and middle temporal gyrus (MTG). We organized the DE genes in the four brain regions into region-specific gene coexpression networks. Differential neighborhood analyses in the coexpression networks were performed to identify genes with low topological overlap (TO) of their direct neighbors. The low TO genes were used to characterize the biological differences between two regions. Our analyses show that increased oxidative stress, along with alterations in lipid metabolism in neurons, may be some of the very early events occurring in AD pathology. Cellular defense mechanisms try to intervene but fail, finally resulting in AD pathology as the disease progresses. Furthermore, disease annotation of the low TO genes in two independent protein interaction networks has resulted in association between cancer, diabetes, renal diseases, and cardiovascular diseases.
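
    The abstract above compares a gene's direct neighbors across region-specific networks but does not give the overlap formula. As a stand-in, the sketch below uses the Jaccard index of the two neighbor sets; this choice, the cutoff, and all names are assumptions for illustration, not the authors' definition of topological overlap.

```python
def neighbor_overlap(gene, network_a, network_b):
    """Jaccard index of a gene's direct neighbors in two networks (assumed measure)."""
    a = network_a.get(gene, set())
    b = network_b.get(gene, set())
    union = a | b
    # A gene with no neighbors in either network is treated as unchanged.
    return len(a & b) / len(union) if union else 1.0


def low_overlap_genes(network_a, network_b, cutoff):
    """Genes whose neighborhoods differ most between the two networks."""
    genes = sorted(set(network_a) | set(network_b))
    return [g for g in genes if neighbor_overlap(g, network_a, network_b) < cutoff]


# Toy adjacency sets (illustrative only).
# Network A has edges G1-G2 and G1-G3; network B has edges G1-G2 and G1-G4.
toy_network_a = {"G1": {"G2", "G3"}, "G2": {"G1"}, "G3": {"G1"}, "G4": set()}
toy_network_b = {"G1": {"G2", "G4"}, "G2": {"G1"}, "G3": set(), "G4": {"G1"}}
rewired = low_overlap_genes(toy_network_a, toy_network_b, cutoff=0.5)
print(rewired)
```

    Here G2 keeps the same single neighbor in both networks and is not flagged, while G1, G3, and G4 each gain or lose a partner.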

  20. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks.

    PubMed

    Wu, Siqi; Joseph, Antony; Hammonds, Ann S; Celniker, Susan E; Yu, Bin; Frise, Erwin

    2016-04-19

    Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.

  1. Uncovering co-expression gene network modules regulating fruit acidity in diverse apples.

    PubMed

    Bai, Yang; Dougherty, Laura; Cheng, Lailiang; Zhong, Gan-Yuan; Xu, Kenong

    2015-08-16

    Acidity is a major contributor to fruit quality. Several organic acids are present in apple fruit, but malic acid is predominant and determines fruit acidity. The trait is largely controlled by the Malic acid (Ma) locus, underpinning which Ma1 that putatively encodes a vacuolar aluminum-activated malate transporter1 (ALMT1)-like protein is a strong candidate gene. We hypothesize that fruit acidity is governed by a gene network in which Ma1 is a key member. The goal of this study is to identify the gene network and the potential mechanisms through which the network operates. Guided by Ma1, we analyzed the transcriptomes of mature fruit of contrasting acidity from six apple accessions of genotype Ma_ (MaMa or Mama) and four of mama using RNA-seq and identified 1301 fruit acidity associated genes, among which 18 were most significant acidity genes (MSAGs). Network inference using weighted gene co-expression network analysis (WGCNA) revealed five co-expression gene network modules of significant (P < 0.001) correlation with malate. Of these, the Ma1 containing module (Turquoise) of 336 genes showed the highest correlation (0.79). We also identified 12 intramodular hub genes from each of the five modules and 18 enriched gene ontology (GO) terms and MapMan sub-bins, including two GO terms (GO:0015979 and GO:0009765) and two MapMan sub-bins (1.3.4 and 1.1.1.1) related to photosynthesis in module Turquoise. Using Lemon-Tree algorithms, we identified 12 regulator genes of probabilistic scores 35.5-81.0, including MDP0000525602 (a LLR receptor kinase), MDP0000319170 (an IQD2-like CaM binding protein) and MDP0000190273 (an EIN3-like transcription factor) of greater interest for being one of the 18 MSAGs or one of the 12 intramodular hub genes in Turquoise, and/or a regulator to the cluster containing Ma1. The most relevant finding of this study is the identification of the MSAGs, intramodular hub genes, enriched photosynthesis related processes, and regulator genes in a

  2. Network-based analysis of differentially expressed genes in cerebrospinal fluid (CSF) and blood reveals new candidate genes for multiple sclerosis

    PubMed Central

    Safari-Alighiarloo, Nahid; Taghizadeh, Mohammad; Tabatabaei, Seyyed Mohammad; Namaki, Saeed

    2016-01-01

    Background: The involvement of multiple genes and missing heritability, which are dominant in complex diseases such as multiple sclerosis (MS), entail using network biology to better elucidate their molecular basis and genetic factors. We therefore aimed to integrate interactome (protein–protein interaction (PPI)) and transcriptome data to construct and analyze PPI networks for MS disease. Methods: Gene expression profiles in paired cerebrospinal fluid (CSF) and peripheral blood mononuclear cells (PBMCs) samples from MS patients, sampled in relapse or remission and controls, were analyzed. Differentially expressed genes that were determined only in CSF (MS vs. control) and PBMCs (relapse vs. remission) were separately integrated with PPI data to construct the Query-Query PPI (QQPPI) networks. The networks were further analyzed to investigate more central genes, functional modules and complexes involved in MS progression. Results: The networks were analyzed and high centrality genes were identified. Exploration of functional modules and complexes showed that the majority of high centrality genes were incorporated in biological pathways driving MS pathogenesis. Proteasome and spliceosome were also noticeable in enriched pathways in PBMCs (relapse vs. remission) which were identified by both modularity and clique analyses. Finally, STK4, RB1, CDKN1A, CDK1, RAC1, EZH2, SDCBP genes in CSF (MS vs. control) and CDC37, MAP3K3, MYC genes in PBMCs (relapse vs. remission) were identified as potential candidate genes for MS, which were the more central genes involved in biological pathways. Discussion: This study showed that network-based analysis could explicate the complex interplay between biological processes underlying MS. Furthermore, an experimental validation of candidate genes can lead to identification of potential therapeutic targets. PMID:28028462
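
    The abstract above ranks genes by centrality in a PPI network without naming the metric. Degree centrality is the simplest candidate and is assumed here for illustration only; the protein labels and edges are placeholders, not interactions from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Toy undirected interaction list (illustrative only).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
centrality = degree_centrality(toy_edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

    Other measures such as betweenness or closeness can rank nodes differently, so the choice of metric matters when shortlisting candidate genes.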

  3. Gene co-expression networks shed light into diseases of brain iron accumulation

    PubMed Central

    Bettencourt, Conceição; Forabosco, Paola; Wiethoff, Sarah; Heidari, Moones; Johnstone, Daniel M.; Botía, Juan A.; Collingwood, Joanna F.; Hardy, John; Milward, Elizabeth A.; Ryten, Mina; Houlden, Henry

    2016-01-01

    Aberrant brain iron deposition is observed in both common and rare neurodegenerative disorders, including those categorized as Neurodegeneration with Brain Iron Accumulation (NBIA), which are characterized by focal iron accumulation in the basal ganglia. Two NBIA genes are directly involved in iron metabolism, but whether other NBIA-related genes also regulate iron homeostasis in the human brain, and whether aberrant iron deposition contributes to neurodegenerative processes remains largely unknown. This study aims to expand our understanding of these iron overload diseases and identify relationships between known NBIA genes and their main interacting partners by using a systems biology approach. We used whole-transcriptome gene expression data from human brain samples originating from 101 neuropathologically normal individuals (10 brain regions) to generate weighted gene co-expression networks and cluster the 10 known NBIA genes in an unsupervised manner. We investigated NBIA-enriched networks for relevant cell types and pathways, and whether they are disrupted by iron loading in NBIA diseased tissue and in an in vivo mouse model. We identified two basal ganglia gene co-expression modules significantly enriched for NBIA genes, which resemble neuronal and oligodendrocytic signatures. These NBIA gene networks are enriched for iron-related genes, and implicate synapse and lipid metabolism related pathways. Our data also indicates that these networks are disrupted by excessive brain iron loading. We identified multiple cell types in the origin of NBIA disorders. We also found unforeseen links between NBIA networks and iron-related processes, and demonstrate convergent pathways connecting NBIAs and phenotypically overlapping diseases. Our results are of further relevance for these diseases by providing candidates for new causative genes and possible points for therapeutic intervention. PMID:26707700
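
    Weighted gene co-expression networks, as used above, are commonly built by raising the absolute pairwise correlation to a soft-thresholding power, which keeps all gene pairs but strongly down-weights weak correlations. The sketch below shows that step for an unsigned network; the power of 6 and the toy profiles are assumptions for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expression, beta=6):
    """Unsigned weighted adjacency: |correlation| raised to the power beta."""
    genes = sorted(expression)
    adjacency = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            adjacency[(g1, g2)] = abs(r) ** beta
    return adjacency


# Toy expression profiles across four samples (illustrative only).
toy_expression = {"X": [1, 2, 3, 4], "Y": [2, 4, 6, 8], "Z": [1, 3, 2, 4]}
weights = soft_threshold_adjacency(toy_expression)
print(weights)
```

    In this toy case a correlation of 0.8 between X and Z shrinks to a weight of about 0.26, while the perfectly correlated pair X and Y keeps a weight close to 1.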

  4. Patterns of Metabolite Changes Identified from Large-Scale Gene Perturbations in Arabidopsis Using a Genome-Scale Metabolic Network

    PubMed Central

    Kim, Taehyong; Dreher, Kate; Nilo-Poyanco, Ricardo; Lee, Insuk; Fiehn, Oliver; Lange, Bernd Markus; Nikolau, Basil J.; Sumner, Lloyd; Welti, Ruth; Wurtele, Eve S.; Rhee, Seung Y.

    2015-01-01

    Metabolomics enables quantitative evaluation of metabolic changes caused by genetic or environmental perturbations. However, little is known about how perturbing a single gene changes the metabolic system as a whole and which network and functional properties are involved in this response. To answer this question, we investigated the metabolite profiles from 136 mutants with single gene perturbations of functionally diverse Arabidopsis (Arabidopsis thaliana) genes. Fewer than 10 metabolites were changed significantly relative to the wild type in most of the mutants, indicating that the metabolic network was robust to perturbations of single metabolic genes. These changed metabolites were closer to each other in a genome-scale metabolic network than expected by chance, supporting the notion that the genetic perturbations changed the network more locally than globally. Surprisingly, the changed metabolites were close to the perturbed reactions in only 30% of the mutants of the well-characterized genes. To determine the factors that contributed to the distance between the observed metabolic changes and the perturbation site in the network, we examined nine network and functional properties of the perturbed genes. Only the isozyme number affected the distance between the perturbed reactions and changed metabolites. This study revealed patterns of metabolic changes from large-scale gene perturbations and relationships between characteristics of the perturbed genes and metabolic changes. PMID:25670818

  5. Influence maximization in time bounded network identifies transcription factors regulating perturbed pathways

    PubMed Central

    Jo, Kyuri; Jung, Inuk; Moon, Ji Hwan; Kim, Sun

    2016-01-01

    Motivation: To understand the dynamic nature of the biological process, it is crucial to identify perturbed pathways in an altered environment and also to infer regulators that trigger the response. Current time-series analysis methods, however, are not powerful enough to identify perturbed pathways and regulators simultaneously. Widely used methods include methods to determine gene sets such as differentially expressed genes or gene clusters, and these gene sets need to be further interpreted in terms of biological pathways using other tools. Most pathway analysis methods are not designed for time series data and they do not consider gene-gene influence on the time dimension. Results: In this article, we propose a novel time-series analysis method TimeTP for determining transcription factors (TFs) regulating pathway perturbation, which narrows the focus to perturbed sub-pathways and utilizes the gene regulatory network and protein–protein interaction network to locate TFs triggering the perturbation. TimeTP first identifies perturbed sub-pathways that propagate the expression changes along the time. Starting points of the perturbed sub-pathways are mapped into the network and the most influential TFs are determined by influence maximization technique. The analysis result is visually summarized in TF-Pathway map in time clock. TimeTP was applied to PIK3CA knock-in dataset and found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway. Availability and Implementation: TimeTP is implemented in Python and available at http://biohealth.snu.ac.kr/software/TimeTP/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: sunkim.bioinfo@snu.ac.kr PMID:27307609

  6. Gene expression profiles analysis identifies key genes for acute lung injury in patients with sepsis.

    PubMed

    Guo, Zhiqiang; Zhao, Chuncheng; Wang, Zheng

    2014-09-26

    To identify critical genes and biological pathways in acute lung injury (ALI), a comparative analysis of gene expression profiles of patients with ALI + sepsis compared with patients with sepsis alone was performed with bioinformatic tools. GSE10474 was downloaded from Gene Expression Omnibus, including a collective of 13 whole blood samples with ALI + sepsis and 21 whole blood samples with sepsis alone. After pre-treatment with robust multichip averaging (RMA) method, differential analysis was conducted using simpleaffy package based upon t-test and fold change. Hierarchical clustering was also performed using function hclust from package stats. Besides, functional enrichment analysis was conducted using iGepros. Moreover, the gene regulatory network was constructed with information from Kyoto Encyclopedia of Genes and Genomes (KEGG) and then visualized by Cytoscape. A total of 128 differentially expressed genes (DEGs) were identified, including 47 up- and 81 down-regulated genes. The significantly enriched functions included negative regulation of cell proliferation, regulation of response to stimulus and cellular component morphogenesis. A total of 27 DEGs were significantly enriched in 16 KEGG pathways, such as protein digestion and absorption, fatty acid metabolism, amoebiasis, etc. Furthermore, the regulatory network of these 27 DEGs was constructed, which involved several key genes, including protein tyrosine kinase 2 (PTK2), v-src avian sarcoma (SRC) and Caveolin 2 (CAV2). PTK2, SRC and CAV2 may be potential markers for diagnosis and treatment of ALI. The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5865162912987143.
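
    The differential analysis above combines a t-test with a fold-change filter. The sketch below computes both quantities for a single gene using only the Python standard library. It is not the simpleaffy implementation: it assumes values are already on a log2 scale (as RMA-normalized data typically are), uses Welch's unequal-variance form of the t statistic, and the sample values are made up.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(case, control):
    """Difference of group means, for values already on a log2 scale."""
    return mean(case) - mean(control)


def welch_t(case, control):
    """Welch's t statistic (unequal variances); needs >= 2 values per group."""
    standard_error = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / standard_error


# Toy log2 expression values for one gene (illustrative only).
case_values = [8.0, 9.0, 10.0]
control_values = [5.0, 6.0, 7.0]
lfc = log2_fold_change(case_values, control_values)
t_stat = welch_t(case_values, control_values)
print(lfc, round(t_stat, 3))
```

    Turning the statistic into a p-value requires the t distribution, which the standard library does not provide, so that step is left out here. With thousands of genes tested, a multiple-testing correction would also be needed.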

  7. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining

    PubMed Central

    2012-01-01

    Background Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Results Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. 
Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction

  8. Analysis of Gene Regulatory Networks of Maize in Response to Nitrogen.

    PubMed

    Jiang, Lu; Ball, Graham; Hodgman, Charlie; Coules, Anne; Zhao, Han; Lu, Chungui

    2018-03-08

    Nitrogen (N) fertilizer has a major influence on crop yield and quality. Understanding and optimising the response of crop plants to nitrogen fertilizer usage is of central importance in enhancing food security and agricultural sustainability. In this study, the analysis of gene regulatory networks reveals multiple genes and biological processes in response to N. Two microarray studies have been used to infer components of the nitrogen-response network. Since they used different array technologies, a map linking the two probe sets to the maize B73 reference genome has been generated to allow comparison. Putative Arabidopsis homologues of maize genes were used to query the Biological General Repository for Interaction Datasets (BioGRID) network, which yielded the potential involvement of three transcription factors (TFs) (GLK5, MADS64 and bZIP108) and a Calcium-dependent protein kinase. An Artificial Neural Network was used to identify influential genes and retrieved bZIP108 and WRKY36 as significant TFs in both microarray studies, along with genes for Asparagine Synthetase, a dual-specific protein kinase and a protein phosphatase. The output from one study also suggested roles for microRNA (miRNA) 399b and Nin-like Protein 15 (NLP15). Co-expression-network analysis of TFs with closely related profiles to known Nitrate-responsive genes identified GLK5, GLK8 and NLP15 as candidate regulators of genes repressed under low Nitrogen conditions, while bZIP108 might play a role in gene activation.

  9. Systematic Evaluation of Molecular Networks for Discovery of Disease Genes. | Office of Cancer Genomics

    Cancer.gov

    Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall.

  10. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method

    PubMed Central

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-01-01

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs. PMID:29113310

  11. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    PubMed

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  12. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2016-07-15

    Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
blatti
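
    The core operation named in this record, a random walk with restart (RWR), can be sketched on a small homogeneous graph. This is a single-stage illustration only, not the two-stage heterogeneous DRaWR procedure; the graph, node names, restart probability and function name are assumptions chosen for the example, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seeds`.

    adj maps each node to a list of its neighbours (undirected graph,
    every node must have at least one neighbour).
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to the seed set ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise move to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy path graph g1 - g2 - g3 - g4 - g5 with a single query gene g1.
graph = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
scores = random_walk_with_restart(graph, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

    Nodes are ranked by how often the walker visits them, so nodes close to the query set score highest; in a heterogeneous setting the same iteration would run over gene and property nodes joined by typed edges.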

  13. LGscore: A method to identify disease-related genes using biological literature and Google data.

    PubMed

    Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun

    2015-04-01

    Since the genome project in the 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, first, we construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights contain two values: the frequency and the Google search results. The frequency value is extracted from literature data, and the Google search result is obtained using Google. We assign a score to each gene through a network analysis. We assume that genes with a large number of links and numerous Google search results and frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets. The answer sets comprised six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.
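
    The literature half of the weighting described in this record (co-occurrence counting followed by Z-scoring) can be sketched as follows. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the function names, the toy "abstracts", and in particular the gene score, which is simply taken as the sum of the Z-scored weights of a gene's edges. The Google search component is omitted.

```python
from itertools import combinations
from statistics import mean, pstdev

def cooccurrence_counts(abstracts):
    """Count how many abstracts mention each unordered gene pair."""
    counts = {}
    for genes in abstracts:
        for pair in combinations(sorted(genes), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def z_scores(counts):
    """Standardise the edge counts to zero mean and unit variance."""
    mu, sd = mean(counts.values()), pstdev(counts.values())
    return {pair: (c - mu) / sd for pair, c in counts.items()}

def gene_scores(weights):
    """Hypothetical gene score: sum of Z-scored weights of incident edges."""
    scores = {}
    for (a, b), w in weights.items():
        scores[a] = scores.get(a, 0.0) + w
        scores[b] = scores.get(b, 0.0) + w
    return scores

# Toy corpus: each set lists the genes mentioned in one made-up abstract.
abstracts = [
    {"g1", "g2"},
    {"g1", "g2"},
    {"g1", "g2", "g3"},
    {"g2", "g3"},
    {"g3", "g4"},
]
counts = cooccurrence_counts(abstracts)
weights = z_scores(counts)
scores = gene_scores(weights)
top_gene = max(scores, key=scores.get)
print(top_gene)
```

    Z-scoring puts counts from different sources on a common scale, which is what would allow a literature frequency and a search-result count to be combined into one edge weight.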

  14. Net Venn - An integrated network analysis web platform for gene lists

    USDA-ARS's Scientific Manuscript database

    Many lists containing biological identifiers such as gene lists have been generated in various genomics projects. Identifying the overlap among gene lists can enable us to understand the similarities and differences between the datasets. Here, we present an interactome network-based web application...

  15. Gene co-expression networks shed light into diseases of brain iron accumulation.

    PubMed

    Bettencourt, Conceição; Forabosco, Paola; Wiethoff, Sarah; Heidari, Moones; Johnstone, Daniel M; Botía, Juan A; Collingwood, Joanna F; Hardy, John; Milward, Elizabeth A; Ryten, Mina; Houlden, Henry

    2016-03-01

    Aberrant brain iron deposition is observed in both common and rare neurodegenerative disorders, including those categorized as Neurodegeneration with Brain Iron Accumulation (NBIA), which are characterized by focal iron accumulation in the basal ganglia. Two NBIA genes are directly involved in iron metabolism, but whether other NBIA-related genes also regulate iron homeostasis in the human brain, and whether aberrant iron deposition contributes to neurodegenerative processes remains largely unknown. This study aims to expand our understanding of these iron overload diseases and identify relationships between known NBIA genes and their main interacting partners by using a systems biology approach. We used whole-transcriptome gene expression data from human brain samples originating from 101 neuropathologically normal individuals (10 brain regions) to generate weighted gene co-expression networks and cluster the 10 known NBIA genes in an unsupervised manner. We investigated NBIA-enriched networks for relevant cell types and pathways, and whether they are disrupted by iron loading in NBIA diseased tissue and in an in vivo mouse model. We identified two basal ganglia gene co-expression modules significantly enriched for NBIA genes, which resemble neuronal and oligodendrocytic signatures. These NBIA gene networks are enriched for iron-related genes, and implicate synapse and lipid metabolism related pathways. Our data also indicates that these networks are disrupted by excessive brain iron loading. We identified multiple cell types in the origin of NBIA disorders. We also found unforeseen links between NBIA networks and iron-related processes, and demonstrate convergent pathways connecting NBIAs and phenotypically overlapping diseases. Our results are of further relevance for these diseases by providing candidates for new causative genes and possible points for therapeutic intervention. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  16. NetDecoder: a network biology platform that decodes context-specific biological networks and gene activities.

    PubMed

    da Rocha, Edroaldo Lummertz; Ung, Choong Yong; McGehee, Cordelia D; Correia, Cristina; Li, Hu

    2016-06-02

    The sequential chain of interactions altering the binary state of a biomolecule represents the 'information flow' within a cellular network that determines phenotypic properties. Given the lack of computational tools to dissect context-dependent networks and gene activities, we developed NetDecoder, a network biology platform that models context-dependent information flows using pairwise phenotypic comparative analyses of protein-protein interactions. Using breast cancer, dyslipidemia and Alzheimer's disease as case studies, we demonstrate NetDecoder dissects subnetworks to identify key players significantly impacting cell behaviour specific to a given disease context. We further show genes residing in disease-specific subnetworks are enriched in disease-related signalling pathways and information flow profiles, which drive the resulting disease phenotypes. We also devise a novel scoring scheme to quantify key genes: network routers, which influence many genes; key targets, which are influenced by many genes; and high impact genes, which experience a significant change in regulation. We show the robustness of our results against parameter changes. Our network biology platform includes freely available source code (http://www.NetDecoder.org) for researchers to explore genome-wide context-dependent information flow profiles and key genes, given a set of genes of particular interest and transcriptome data. More importantly, NetDecoder will enable researchers to uncover context-dependent drug targets. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Detecting recurrent gene mutation in interaction network context using multi-scale graph diffusion.

    PubMed

    Babaei, Sepideh; Hulsman, Marc; Reinders, Marcel; de Ridder, Jeroen

    2013-01-23

    Delineating the molecular drivers of cancer, i.e. determining cancer genes and the pathways which they deregulate, is an important challenge in cancer research. In this study, we aim to identify pathways of frequently mutated genes by exploiting their network neighborhood encoded in the protein-protein interaction network. To this end, we introduce a multi-scale diffusion kernel and apply it to a large collection of murine retroviral insertional mutagenesis data. The diffusion strength plays the role of scale parameter, determining the size of the network neighborhood that is taken into account. As a result, in addition to detecting genes with frequent mutations in their genomic vicinity, we find genes that harbor frequent mutations in their interaction network context. We identify densely connected components of known and putatively novel cancer genes and demonstrate that they are strongly enriched for cancer related pathways across the diffusion scales. Moreover, the mutations in the clusters exhibit a significant pattern of mutual exclusion, supporting the conjecture that such genes are functionally linked. Using multi-scale diffusion kernel, various infrequently mutated genes are found to harbor significant numbers of mutations in their interaction network neighborhood. Many of them are well-known cancer genes. The results demonstrate the importance of defining recurrent mutations while taking into account the interaction network context. Importantly, the putative cancer genes and networks detected in this study are found to be significant at different diffusion scales, confirming the necessity of a multi-scale analysis.
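
    The role of the diffusion strength as a scale parameter can be illustrated with the standard graph diffusion kernel K = exp(-beta * L), where L is the graph Laplacian. This is a minimal sketch under stated assumptions, not the authors' multi-scale kernel: the three-node toy network, the mutation counts, the beta values and all function names are made up, and the matrix exponential is approximated by a truncated Taylor series to stay within the standard library.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_kernel(L, beta, terms=40):
    """Approximate K = exp(-beta * L) by a truncated Taylor series."""
    n = len(L)
    M = [[-beta * x for x in row] for row in L]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]            # current term M^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K

def smooth(K, counts):
    """Spread per-gene mutation counts over the network neighbourhood."""
    return [sum(K[i][j] * counts[j] for j in range(len(counts)))
            for i in range(len(K))]

# Toy path network 0 - 1 - 2; only gene 0 carries mutations.
L = laplacian(3, [(0, 1), (1, 2)])
counts = [6.0, 0.0, 0.0]
K_small = diffusion_kernel(L, beta=0.1)     # narrow neighbourhood
K_large = diffusion_kernel(L, beta=1.0)     # wide neighbourhood
print(smooth(K_small, counts), smooth(K_large, counts))
```

    With a small beta almost all signal stays on the mutated gene, whereas a larger beta lets an unmutated gene two steps away accumulate signal, which is how a gene can be flagged through its interaction network context.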

  18. Learning contextual gene set interaction networks of cancer with condition specificity

    PubMed Central

    2013-01-01

    Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches to build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, of which each gene set share coherent transcriptional patterns across a subset of samples, termed contextual gene set. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only the results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further

  19. A study of structural properties of gene network graphs for mathematical modeling of integrated mosaic gene networks.

    PubMed

    Petrovskaya, Olga V; Petrovskiy, Evgeny D; Lavrik, Inna N; Ivanisenko, Vladimir A

    2017-04-01

    Gene network modeling is one of the widely used approaches in systems biology. It allows for the study of complex genetic systems function, including so-called mosaic gene networks, which consist of functionally interacting subnetworks. We conducted a study of a mosaic gene networks modeling method based on integration of models of gene subnetworks by linear control functionals. An automatic modeling of 10,000 synthetic mosaic gene regulatory networks was carried out using computer experiments on gene knockdowns/knockouts. Structural analysis of graphs of generated mosaic gene regulatory networks has revealed that the most important factor for building accurate integrated mathematical models, among those analyzed in the study, is data on expression of genes corresponding to the vertices with high properties of centrality.

  20. Enhancing biological relevance of a weighted gene co-expression network for functional module identification.

    PubMed

    Prom-On, Santitham; Chanthaphan, Atthawut; Chan, Jonathan Hoyin; Meechai, Asawin

    2011-02-01

    Relationships among gene expression levels may be associated with the mechanisms of the disease. While identifying a direct association such as a difference in expression levels between case and control groups links genes to disease mechanisms, uncovering an indirect association in the form of a network structure may help reveal the underlying functional module associated with the disease under scrutiny. This paper presents a method to improve the biological relevance in functional module identification from the gene expression microarray data by enhancing the structure of a weighted gene co-expression network using minimum spanning tree. The enhanced network, which is called a backbone network, contains only the essential structural information to represent the gene co-expression network. The entire backbone network is decoupled into a number of coherent sub-networks, and then the functional modules are reconstructed from these sub-networks to ensure minimum redundancy. The method was tested with a simulated gene expression dataset and case-control expression datasets of autism spectrum disorder and colorectal cancer studies. The results indicate that the proposed method can accurately identify clusters in the simulated dataset, and the functional modules of the backbone network are more biologically relevant than those obtained from the original approach.
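
    The idea of reducing a weighted co-expression network to a spanning-tree backbone can be sketched with Kruskal's algorithm. This shows only the backbone extraction step, under assumptions: the edge weights are taken to be co-expression strengths in [0, 1], so a minimum spanning tree on the distance 1 - weight is the same as a maximum spanning tree on the weight; the toy edges and the function name are hypothetical, and the later decoupling into sub-networks is not shown.

```python
def backbone(edges):
    """Keep the spanning-tree edges with the strongest co-expression.

    edges is a list of (gene_u, gene_v, weight) with weight in [0, 1].
    Sorting by descending weight is equivalent to building a minimum
    spanning tree on the distance 1 - weight (Kruskal's algorithm).
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: keep it
            parent[ru] = rv
            kept.append((u, v, w))
    return kept

# Toy co-expression network with made-up genes and weights.
edges = [
    ("a", "b", 0.9),
    ("b", "c", 0.8),
    ("a", "c", 0.3),
    ("c", "d", 0.7),
    ("b", "d", 0.2),
]
tree = backbone(edges)
print(tree)
```

    The resulting tree has one edge fewer than the number of genes and drops the weak, redundant links, which is the sense in which a backbone retains only the essential structure.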

  1. Vasohibin-1 is identified as a master-regulator of endothelial cell apoptosis using gene network analysis

    PubMed Central

    2013-01-01

    Background Apoptosis is a critical process in endothelial cell (EC) biology and pathology, which has been extensively studied at protein level. Numerous gene expression studies of EC apoptosis have also been performed; however, few attempts have been made to use gene expression data to identify the molecular relationships and master regulators that underlie EC apoptosis. Therefore, we sought to understand these relationships by generating a Bayesian gene regulatory network (GRN) model. Results ECs were induced to undergo apoptosis using serum withdrawal and followed over a time course in triplicate, using microarrays. When generating the GRN, this EC time course data was supplemented by a library of microarray data from EC treated with siRNAs targeting over 350 signalling molecules. The GRN model proposed Vasohibin-1 (VASH1) as one of the candidate master-regulators of EC apoptosis with numerous downstream mRNAs. To evaluate the role played by VASH1 in EC, we used siRNA to reduce the expression of VASH1. Of 10 mRNAs downstream of VASH1 in the GRN that were examined, 7 were significantly up- or down-regulated in the direction predicted by the GRN. Further supporting an important biological role of VASH1 in EC, targeted reduction of VASH1 mRNA abundance conferred resistance to serum withdrawal-induced EC death. Conclusion We have utilised Bayesian GRN modelling to identify a novel candidate master regulator of EC apoptosis. This study demonstrates how GRN technology can complement traditional methods to hypothesise the regulatory relationships that underlie important biological processes. PMID:23324451

  2. Systematic identification of an integrative network module during senescence from time-series gene expression.

    PubMed

    Park, Chihyun; Yun, So Jeong; Ryu, Sung Jin; Lee, Soyoung; Lee, Young-Sam; Yoon, Youngmi; Park, Sang Chul

    2017-03-15

    Cellular senescence irreversibly arrests growth of human diploid cells. In addition, recent studies have indicated that senescence is a multi-step evolving process related to important complex biological processes. Most studies analyzed only the genes and their functions representing each senescence phase without considering gene-level interactions and continuously perturbed genes. It is necessary to reveal the genotypic mechanism inferred by affected genes and their interaction underlying the senescence process. We suggested a novel computational approach to identify an integrative network which profiles an underlying genotypic signature from time-series gene expression data. The relatively perturbed genes were selected for each time point based on the proposed scoring measure denominated as perturbation scores. Then, the selected genes were integrated with protein-protein interactions to construct time point specific network. From these constructed networks, the conserved edges across time point were extracted for the common network and statistical test was performed to demonstrate that the network could explain the phenotypic alteration. As a result, it was confirmed that the difference of average perturbation scores of common networks at both two time points could explain the phenotypic alteration. We also performed functional enrichment on the common network and identified high association with phenotypic alteration. Remarkably, we observed that the identified cell cycle specific common network played an important role in replicative senescence as a key regulator. Heretofore, the network analysis from time series gene expression data has been focused on what topological structure was changed over time point. Conversely, we focused on the conserved structure but its context was changed in course of time and showed it was available to explain the phenotypic changes. 
We expect that the proposed method will help to elucidate the biological mechanism unrevealed by

  3. Gene regulation is governed by a core network in hepatocellular carcinoma.

    PubMed

    Gu, Zuguang; Zhang, Chenyu; Wang, Jin

    2012-05-01

    Hepatocellular carcinoma (HCC) is one of the most lethal cancers worldwide, and the mechanisms that lead to the disease are still relatively unclear. However, with the development of high-throughput technologies it is possible to gain a systematic view of biological systems to enhance the understanding of the roles of genes associated with HCC. Thus, analysis of the mechanism of molecule interactions in the context of gene regulatory networks can reveal specific sub-networks that lead to the development of HCC. In this study, we aimed to identify the most important gene regulations that are dysfunctional in HCC generation. Our method for constructing gene regulatory network is based on predicted target interactions, experimentally-supported interactions, and co-expression model. Regulators in the network included both transcription factors and microRNAs to provide a complete view of gene regulation. Analysis of gene regulatory network revealed that gene regulation in HCC is highly modular, in which different sets of regulators take charge of specific biological processes. We found that microRNAs mainly control biological functions related to mitochondria and oxidative reduction, while transcription factors control immune responses, extracellular activity and the cell cycle. On the higher level of gene regulation, there exists a core network that organizes regulations between different modules and maintains the robustness of the whole network. There is direct experimental evidence for most of the regulators in the core gene regulatory network relating to HCC. We infer it is the central controller of gene regulation. Finally, we explored the influence of the core gene regulatory network on biological pathways. Our analysis provides insights into the mechanism of transcriptional and post-transcriptional control in HCC. In particular, we highlight the importance of the core gene regulatory network; we propose that it is highly related to HCC and we believe further

  4. Construction of diagnosis system and gene regulatory networks based on microarray analysis.

    PubMed

    Hong, Chun-Fu; Chen, Ying-Chen; Chen, Wei-Chun; Tu, Keng-Chang; Tsai, Meng-Hsiun; Chan, Yung-Kuan; Yu, Shyr Shen

    2018-05-01

    A microarray analysis generally contains expression data of thousands of genes, but most of them are irrelevant to the disease of interest, which complicates the analysis of genes related to specific diseases. Therefore, filtering out a few essential genes as well as their regulatory networks is critical, and a disease can be easily diagnosed just depending on the expression profiles of a few critical genes. In this study, a target gene screening (TGS) system, which is a microarray-based information system that integrates F-statistics, pattern recognition matching, a two-layer K-means classifier, a Parameter Detection Genetic Algorithm (PDGA), a genetic-based gene selector (GBG selector) and the association rule, was developed to screen out a small subset of genes that can discriminate malignant stages of cancers. During the first stage, F-statistic, pattern recognition matching, and a two-layer K-means classifier were applied in the system to filter out the 20 critical genes most relevant to ovarian cancer from 9600 genes, and the PDGA was used to decide the fittest values of the parameters for these critical genes. Among the 20 critical genes, 15 are associated with cancer progression. In the second stage, we further employed a GBG selector and the association rule to screen out seven target gene sets, each with only four to six genes, and each of which can precisely identify the malignancy stage of ovarian cancer based on their expression profiles. We further deduced the gene regulatory networks of the 20 critical genes by applying the Pearson correlation coefficient to evaluate the correlation between the expression of each gene at the same stages and at different stages. Correlations between gene pairs were calculated, and then, three regulatory networks were deduced. These correlations were further confirmed by the Ingenuity pathway analysis. The prognostic significances of the genes identified via regulatory networks were examined using online

  5. Optimal design of gene knockout experiments for gene regulatory network inference

    PubMed Central

    Ud-Dean, S. M. Minhaz; Gunawan, Rudiyanto

    2016-01-01

    Motivation: We addressed the problem of inferring gene regulatory network (GRN) from gene expression data of knockout (KO) experiments. This inference is known to be underdetermined and the GRN is not identifiable from data. Past studies have shown that suboptimal design of experiments (DOE) contributes significantly to the identifiability issue of biological networks, including GRNs. However, optimizing DOE has received much less attention than developing methods for GRN inference. Results: We developed REDuction of UnCertain Edges (REDUCE) algorithm for finding the optimal gene KO experiment for inferring directed graphs (digraphs) of GRNs. REDUCE employed ensemble inference to define uncertain gene interactions that could not be verified by prior data. The optimal experiment corresponds to the maximum number of uncertain interactions that could be verified by the resulting data. For this purpose, we introduced the concept of edge separatoid which gave a list of nodes (genes) that upon their removal would allow the verification of a particular gene interaction. Finally, we proposed a procedure that iterates over performing KO experiments, ensemble update and optimal DOE. The case studies including the inference of Escherichia coli GRN and DREAM 4 100-gene GRNs, demonstrated the efficacy of the iterative GRN inference. In comparison to systematic KOs, REDUCE could provide much higher information return per gene KO experiment and consequently more accurate GRN estimates. Conclusions: REDUCE represents an enabling tool for tackling the underdetermined GRN inference. Along with advances in gene deletion and automation technology, the iterative procedure brings an efficient and fully automated GRN inference closer to reality. Availability and implementation: MATLAB and Python scripts of REDUCE are available on www.cabsel.ethz.ch/tools/REDUCE. Contact: rudi.gunawan@chem.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. 

  6. Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection.

    PubMed

    Guthke, Reinhard; Möller, Ulrich; Hoffmann, Martin; Thies, Frank; Töpfer, Susanne

    2005-04-15

    The immune response to bacterial infection represents a complex network of dynamic gene and protein interactions. We present an optimized reverse engineering strategy aimed at reconstructing this kind of interaction network. The proposed approach is based on both microarray data and available biological knowledge. The main kinetics of the immune response were identified by fuzzy clustering of gene expression profiles (time series). The number of clusters was optimized using various evaluation criteria. For each cluster, a representative gene with a high fuzzy membership was chosen in accordance with available physiological knowledge. Then hypothetical network structures were identified by seeking systems of ordinary differential equations whose simulated kinetics could fit the gene expression profiles of the cluster-representative genes. For the construction of hypothetical network structures, singular value decomposition (SVD) based methods and a newly introduced heuristic Network Generation Method were compared. It turned out that the proposed novel method could find sparser networks and gave better fits to the experimental data. Contact: Reinhard.Guthke@hki-jena.de.

  7. Applying gene regulatory network logic to the evolution of social behavior.

    PubMed

    Baran, Nicole M; McGrath, Patrick T; Streelman, J Todd

    2017-06-06

    Animal behavior is ultimately the product of gene regulatory networks (GRNs) for brain development and neural networks for brain function. The GRN approach has advanced the fields of genomics and development, and we identify organizational similarities between networks of genes that build the brain and networks of neurons that encode brain function. In this perspective, we engage the analogy between developmental networks and neural networks, exploring the advantages of using GRN logic to study behavior. Applying the GRN approach to the brain and behavior provides a quantitative and manipulative framework for discovery. We illustrate features of this framework using the example of social behavior and the neural circuitry of aggression.

  8. Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weighill, Deborah; Jones, Piet; Shah, Manesh

    Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for

  9. Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery

    DOE PAGES

    Weighill, Deborah; Jones, Piet; Shah, Manesh; ...

    2018-05-11

    Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for

  10. Transcriptome and metabolite analysis identifies nitrogen utilization genes in tea plant (Camellia sinensis).

    PubMed

    Li, Wei; Xiang, Fen; Zhong, Micai; Zhou, Lingyun; Liu, Hongyan; Li, Saijun; Wang, Xuewen

    2017-05-10

    Applied nitrogen (N) fertilizer significantly increases the leaf yield. However, most N is not utilized by the plant, negatively impacting the environment. To date, little is known regarding N utilization genes and mechanisms in the leaf production. To understand this, we investigated transcriptomes using RNA-seq and amino acid levels with N treatment in tea (Camellia sinensis), the most popular beverage crop. We identified 196 and 29 common differentially expressed genes in roots and leaves, respectively, in response to ammonium in two tea varieties. Among those genes, AMT, NRT and AQP for N uptake and GOGAT and GS for N assimilation were the key genes, validated by RT-qPCR, which expressed in a network manner with tissue specificity. Importantly, only AQP and three novel DEGs associated with stress, manganese binding, and gibberellin-regulated transcription factor were common in N responses across all tissues and varieties. A hypothesized gene regulatory network for N was proposed. A strong statistical correlation between key genes' expression and amino acid content was revealed. The key genes and regulatory network improve our understanding of the molecular mechanism of N usage and offer gene targets for plant improvement.

  11. Statistical indicators of collective behavior and functional clusters in gene networks of yeast

    NASA Astrophysics Data System (ADS)

    Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

    2006-03-01

    We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
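
The correlation-network construction described in this record can be illustrated with a minimal sketch. It is not the authors' code: the toy expression profiles, the gene names and the threshold values are all assumptions made for illustration. It links two genes when the absolute Pearson correlation of their time series reaches a threshold and then reports the size of the largest connected (giant) component.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_network(profiles, threshold):
    """Undirected adjacency sets: an edge joins genes with |r| >= threshold."""
    adj = {gene: set() for gene in profiles}
    for g, h in combinations(profiles, 2):
        if abs(pearson(profiles[g], profiles[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def giant_component_size(adj):
    """Size of the largest connected component, found by depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


# Hypothetical toy time series (made-up values, not yeast data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, 1.0, -1.0],
}

for threshold in (0.9, 0.4):
    size = giant_component_size(correlation_network(profiles, threshold))
    print(threshold, size)
```

Lowering the threshold adds edges, so the giant component can only grow; scanning the threshold in this way is one simple way to look for the kind of percolation behaviour the abstract mentions.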

  12. Identifying critical transitions and their leading biomolecular networks in complex diseases.

    PubMed

    Liu, Rui; Li, Meiyi; Liu, Zhi-Ping; Wu, Jiarui; Chen, Luonan; Aihara, Kazuyuki

    2012-01-01

    Identifying a critical transition and its leading biomolecular network during the initiation and progression of a complex disease is a challenging task, but holds the key to early diagnosis and further elucidation of the essential mechanisms of disease deterioration at the network level. In this study, we developed a novel computational method for identifying early-warning signals of the critical transition and its leading network during a disease progression, based on high-throughput data using a small number of samples. The leading network makes the first move from the normal state toward the disease state during a transition, and thus is causally related with disease-driving genes or networks. Specifically, we first define a state-transition-based local network entropy (SNE), and prove that SNE can serve as a general early-warning indicator of any imminent transitions, regardless of specific differences among systems. The effectiveness of this method was validated by functional analysis and experimental data.

  13. Reconstruction of a Functional Human Gene Network, with an Application for Prioritizing Positional Candidate Genes

    PubMed Central

    Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca

    2006-01-01

    Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which

  14. Functional modules by relating protein interaction networks and gene expression.

    PubMed

    Tornow, Sabine; Mewes, H W

    2003-11-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.
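
The idea of scoring a gene group taken from one network by its correlation strength in another, and then estimating how likely that strength is by chance, can be sketched as follows. This sketch uses the mean pairwise Pearson correlation, which is the baseline the abstract compares against, not the superparamagnetic approach the authors propose. The synthetic profiles, the gene names and the function names are assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_pairwise_correlation(profiles, genes):
    """Average Pearson correlation over all gene pairs in the group."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)


def module_p_value(profiles, module, n_perm=1000, seed=0):
    """Empirical probability that a random gene set of the same size
    reaches at least the observed mean correlation."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(profiles, module)
    all_genes = sorted(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(module))
        if mean_pairwise_correlation(profiles, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic data: 27 noise genes plus a 3-gene module sharing one pattern.
data_rng = random.Random(1)
pattern = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
profiles = {f"noise{i}": [data_rng.gauss(0, 1) for _ in pattern] for i in range(27)}
profiles["m1"] = pattern
profiles["m2"] = [2 * v + 1 for v in pattern]
profiles["m3"] = [v + 0.1 * (-1) ** k for k, v in enumerate(pattern)]

p_value = module_p_value(profiles, ["m1", "m2", "m3"])
print(p_value)
```

In practice the module membership would come from a second network, such as a protein interaction network, and the expression profiles from measured data; here both are simulated so that the sketch runs on its own.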

  15. Functional modules by relating protein interaction networks and gene expression

    PubMed Central

    Tornow, Sabine; Mewes, H. W.

    2003-01-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships. PMID:14576317

  16. Identifying influencers from sampled social networks

    NASA Astrophysics Data System (ADS)

    Tsugawa, Sho; Kimura, Kazuma

    2018-10-01

    Identifying influencers who can spread information to many other individuals from a social network is a fundamental research task in the network science research field. Several measures for identifying influencers have been proposed, and the effectiveness of these influence measures has been evaluated for the case where the complete social network structure is known. However, it is difficult in practice to obtain the complete structure of a social network because of missing data, false data, or node/link sampling from the social network. In this paper, we investigate the effects of node sampling from a social network on the effectiveness of influence measures at identifying influencers. Our experimental results show that the negative effect of biased sampling, such as sample edge count, on the identification of influencers is generally small. For social media networks, we can identify influencers whose influence is comparable with that of those identified from the complete social networks by sampling only 10%-30% of the networks. Moreover, our results also suggest the possible benefit of network sampling in the identification of influencers. Our results show that, for some networks, nodes with higher influence can be discovered from sampled social networks than from complete social networks.

  17. Recursive regularization for inferring gene networks from time-course gene expression profiles

    PubMed Central

    Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru

    2009-01-01

    Background: Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net succeeds in increasing the number of true positives while it also results in increasing the number of false positives. Results: By incorporating relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping the high number of true positives in the network inference and achieves two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion: The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091

  18. Sequence-based model of gap gene regulatory network.

    PubMed

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts great interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild-type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3

  19. Constructing an integrated gene similarity network for the identification of disease genes.

    PubMed

    Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin

    2017-09-20

    Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We proposed a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm in prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
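
The random walk with restart step named in this record can be sketched on a single small unweighted graph. This is a generic illustration only: it is not the RWRB software, it does not build the phenotype-gene bilayer network, and the toy graph, the node names and the restart probability of 0.3 are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence,
    where W spreads each node's probability equally over its neighbours
    and p0 is uniform over the seed nodes."""
    nodes = list(adj)
    seeds = list(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            if adj[n]:
                share = walk_mass / len(adj[n])
                for nb in adj[n]:
                    nxt[nb] += share
            else:
                # Isolated node: send its mass back to the seeds so that
                # the probabilities still sum to one.
                for s in seeds:
                    nxt[s] += walk_mass / len(seeds)
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical path graph: seed - g1 - g2 - g3.
adj = {
    "seed": {"g1"},
    "g1": {"seed", "g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2"},
}

scores = random_walk_with_restart(adj, seeds={"seed"})
ranking = sorted((n for n in adj if n != "seed"), key=scores.get, reverse=True)
print(ranking)
```

Candidates are ranked by their steady-state probability, so nodes that are close to the seed set and well connected to it score highest. A bilayer version would run the same iteration on a combined transition matrix over phenotype and gene nodes.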

  20. Identifying key nodes in multilayer networks based on tensor decomposition.

    PubMed

    Wang, Dingjie; Wang, Haitao; Zou, Xiufen

    2017-06-01

    The identification of essential agents in multilayer networks characterized by different types of interactions is a crucial and challenging topic, one that is essential for understanding the topological structure and dynamic processes of multilayer networks. In this paper, we use the fourth-order tensor to represent multilayer networks and propose a novel method to identify essential nodes based on CANDECOMP/PARAFAC (CP) tensor decomposition, referred to as the EDCPTD centrality. This method is based on the perspective of multilayer networked structures, which integrate the information of edges among nodes and links between different layers to quantify the importance of nodes in multilayer networks. Three real-world multilayer biological networks are used to evaluate the performance of the EDCPTD centrality. The bar chart and ROC curves of these multilayer networks indicate that the proposed approach is a good alternative index to identify real important nodes. Meanwhile, by comparing the behavior of both the proposed method and the aggregated single-layer methods, we demonstrate that neglecting the multiple relationships between nodes may lead to incorrect identification of the most versatile nodes. Furthermore, the Gene Ontology functional annotation demonstrates that the identified top nodes based on the proposed approach play a significant role in many vital biological processes. Finally, we have implemented many centrality methods of multilayer networks (including our method and the published methods) and created a visual software based on the MATLAB GUI, called ENMNFinder, which can be used by other researchers.

  1. Identifying key nodes in multilayer networks based on tensor decomposition

    NASA Astrophysics Data System (ADS)

    Wang, Dingjie; Wang, Haitao; Zou, Xiufen

    2017-06-01

    The identification of essential agents in multilayer networks characterized by different types of interactions is a crucial and challenging topic, one that is essential for understanding the topological structure and dynamic processes of multilayer networks. In this paper, we use the fourth-order tensor to represent multilayer networks and propose a novel method to identify essential nodes based on CANDECOMP/PARAFAC (CP) tensor decomposition, referred to as the EDCPTD centrality. This method is based on the perspective of multilayer networked structures, which integrate the information of edges among nodes and links between different layers to quantify the importance of nodes in multilayer networks. Three real-world multilayer biological networks are used to evaluate the performance of the EDCPTD centrality. The bar chart and ROC curves of these multilayer networks indicate that the proposed approach is a good alternative index to identify real important nodes. Meanwhile, by comparing the behavior of both the proposed method and the aggregated single-layer methods, we demonstrate that neglecting the multiple relationships between nodes may lead to incorrect identification of the most versatile nodes. Furthermore, the Gene Ontology functional annotation demonstrates that the identified top nodes based on the proposed approach play a significant role in many vital biological processes. Finally, we have implemented many centrality methods of multilayer networks (including our method and the published methods) and created a visual software based on the MATLAB GUI, called ENMNFinder, which can be used by other researchers.

  2. Gene co-expression network analysis in Rhodobacter capsulatus and application to comparative expression analysis of Rhodobacter sphaeroides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pena-Castillo, Lourdes; Mercer, Ryan; Gurinovich, Anastasia

    2014-08-28

    The genus Rhodobacter contains purple nonsulfur bacteria found mostly in freshwater environments. Representative strains of two Rhodobacter species, R. capsulatus and R. sphaeroides, have had their genomes fully sequenced and both have been the subject of transcriptional profiling studies. Gene co-expression networks can be used to identify modules of genes with similar expression profiles. Functional analysis of gene modules can then associate co-expressed genes with biological pathways, and network statistics can determine the degree of module preservation in related networks. In this paper, we constructed an R. capsulatus gene co-expression network, performed functional analysis of identified gene modules, and investigated preservation of these modules in R. capsulatus proteomics data and in R. sphaeroides transcriptomics data. Results: The analysis identified 40 gene co-expression modules in R. capsulatus. Investigation of the module gene contents and expression profiles revealed patterns that were validated based on previous studies supporting the biological relevance of these modules. We identified two R. capsulatus gene modules preserved in the protein abundance data. We also identified several gene modules preserved between both Rhodobacter species, which indicate that these cellular processes are conserved between the species and are candidates for functional information transfer between species. Many gene modules were non-preserved, providing insight into processes that differentiate the two species. In addition, using Local Network Similarity (LNS), a recently proposed metric for expression divergence, we assessed the expression conservation of between-species pairs of orthologs, and within-species gene-protein expression profiles. Conclusions: Our analyses provide new sources of information for functional annotation in R. capsulatus because uncharacterized genes in modules are now connected with groups of genes that constitute a joint functional

  3. Text mining and network analysis to find functional associations of genes in high altitude diseases.

    PubMed

    Bhasuran, Balu; Subramanian, Devika; Natarajan, Jeyakumar

    2018-05-02

    Travel to elevations above 2500 m is associated with the risk of developing one or more forms of acute altitude illness such as acute mountain sickness (AMS), high altitude cerebral edema (HACE) or high altitude pulmonary edema (HAPE). Our work aims to identify the functional association of genes involved in high altitude diseases. In this work we identified the gene networks responsible for high altitude diseases by using the principle of gene co-occurrence statistics from literature and network analysis. First, we mined the literature data from PubMed on high-altitude diseases, and extracted the co-occurring gene pairs. Next, based on their co-occurrence frequency, gene pairs were ranked. Finally, a gene association network was created using statistical measures to explore potential relationships. Network analysis results revealed that EPO, ACE, IL6 and TNF are the top five genes that were found to co-occur with 20 or more genes, while the association between EPAS1 and EGLN1 genes is strongly substantiated. The network constructed from this study proposes a large number of genes that work in-toto in high altitude conditions. Overall, the result provides a good reference for further study of the genetic relationships in high altitude diseases. Copyright © 2018 Elsevier Ltd. All rights reserved.
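
The mining steps summarized in this record (extract co-occurring gene pairs, rank them by frequency, then count how many partners each gene has) can be sketched with the standard library. The per-document gene lists below are made up for illustration and are not results from the study; the function names are likewise assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered gene pair, the number of documents
    in which both genes are mentioned."""
    pair_counts = Counter()
    for genes in docs:
        # set() ignores repeated mentions; sorted() gives a canonical pair order.
        for pair in combinations(sorted(set(genes)), 2):
            pair_counts[pair] += 1
    return pair_counts


def partner_counts(pair_counts):
    """Number of distinct genes each gene co-occurs with."""
    partners = {}
    for g, h in pair_counts:
        partners.setdefault(g, set()).add(h)
        partners.setdefault(h, set()).add(g)
    return {gene: len(others) for gene, others in partners.items()}


# Hypothetical gene mentions per abstract (toy input, not mined data).
docs = [
    ["EPAS1", "EGLN1", "EPO"],
    ["EPAS1", "EGLN1"],
    ["EPO", "ACE"],
    ["EPAS1", "EGLN1", "ACE"],
]

pair_counts = cooccurrence_counts(docs)
top_pair, top_count = pair_counts.most_common(1)[0]
degrees = partner_counts(pair_counts)
print(top_pair, top_count)
```

A real pipeline would add a gene-name recognition step in front of this and a statistical association measure after it; raw counts are used here only to keep the sketch short.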

  4. Therapeutic synthetic gene networks.

    PubMed

    Karlsson, Maria; Weber, Wilfried

    2012-10-01

    The field of synthetic biology is rapidly expanding and has over the past years evolved from the development of simple gene networks to complex treatment-oriented circuits. The reprogramming of cell fate with open-loop or closed-loop synthetic control circuits along with biologically implemented logical functions have fostered applications spanning over a wide range of disciplines, including artificial insemination, personalized medicine and the treatment of cancer and metabolic disorders. In this review we describe several applications of interactive gene networks, a synthetic biology-based approach for future gene therapy, as well as the utilization of synthetic gene circuits as blueprints for the design of stimuli-responsive biohybrid materials. The recent progress in synthetic biology, including the rewiring of biosensing devices with the body's endogenous network as well as novel therapeutic approaches originating from interdisciplinary work, generates numerous opportunities for future biomedical applications. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. Identifying causal networks linking cancer processes and anti-tumor immunity using Bayesian network inference and metagene constructs.

    PubMed

    Kaiser, Jacob L; Bland, Cassidy L; Klinke, David J

    2016-03-01

    Cancer arises from a deregulation of both intracellular and intercellular networks that maintain system homeostasis. Identifying the architecture of these networks and how they are changed in cancer is a pre-requisite for designing drugs to restore homeostasis. Since intercellular networks only appear in intact systems, it is difficult to identify how these networks become altered in human cancer using many of the common experimental models. To overcome this, we used the diversity in normal and malignant human tissue samples from the Cancer Genome Atlas (TCGA) database of human breast cancer to identify the topology associated with intercellular networks in vivo. To improve the underlying biological signals, we constructed Bayesian networks using metagene constructs, which represented groups of genes that are concomitantly associated with different immune and cancer states. We also used bootstrap resampling to establish the significance associated with the inferred networks. In short, we found opposing relationships between cell proliferation and epithelial-to-mesenchymal transformation (EMT) with regards to macrophage polarization. These results were consistent across multiple carcinomas in that proliferation was associated with a type 1 cell-mediated anti-tumor immune response and EMT was associated with a pro-tumor anti-inflammatory response. To address the identifiability of these networks from other datasets, we could identify the relationship between EMT and macrophage polarization with fewer samples when the Bayesian network was generated from malignant samples alone. However, the relationship between proliferation and macrophage polarization was identified with fewer samples when the samples were taken from a combination of the normal and malignant samples. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:470-479, 2016. © 2016 American Institute of Chemical Engineers.

  6. Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells

    PubMed Central

    de Luis Balaguer, Maria Angels; Fisher, Adam P.; Clark, Natalie M.; Fernandez-Espinosa, Maria Guadalupe; Möller, Barbara K.; Weijers, Dolf; Williams, Cranos; Lorenzo, Oscar; Sozzani, Rosangela

    2017-01-01

    Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is essential for understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we used spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation. To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators. Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identify candidate factors that function in the stem cells. PMID:28827319

  7. Summing up the noise in gene networks

    NASA Astrophysics Data System (ADS)

    Paulsson, Johan

    2004-01-01

    Random fluctuations in genetic networks are inevitable as chemical reactions are probabilistic and many genes, RNAs and proteins are present in low numbers per cell. Such `noise' affects all life processes and has recently been measured using green fluorescent protein (GFP). Two studies show that negative feedback suppresses noise, and three others identify the sources of noise in gene expression. Here I critically analyse these studies and present a simple equation that unifies and extends both the mathematical and biological perspectives.

  8. Visual gene-network analysis reveals the cancer gene co-expression in human endometrial cancer

    PubMed Central

    2014-01-01

    Background: Endometrial cancers (ECs) are the most common form of gynecologic malignancy. Recent studies have reported that ECs reveal distinct markers for molecular pathogenesis, which in turn is linked to the various histological types of ECs. To understand further the molecular events contributing to ECs and endometrial tumorigenesis in general, a more precise identification of cancer-associated molecules and signaling networks would be useful for the detection and monitoring of malignancy, improving clinical cancer therapy, and personalization of treatments. Results: EC-specific gene co-expression networks were constructed by differential expression analysis and weighted gene co-expression network analysis (WGCNA). Important pathways and putative cancer hub genes contributing to tumorigenesis of ECs were identified. An elastic-net regularized classification model was built using the cancer hub gene signatures to predict the phenotypic characteristics of ECs. The 19 cancer hub gene signatures had high predictive power to distinguish among three key principal features of ECs: grade, type, and stage. Intriguingly, these hub gene networks seem to contribute to EC progression and malignancy via cell-cycle regulation, antigen processing and the citric acid (TCA) cycle. Conclusions: The results of this study provide a powerful biomarker discovery platform to better understand the progression of ECs and to uncover potential therapeutic targets in the treatment of ECs. This information might lead to improved monitoring of ECs and a resulting improvement in the treatment of ECs, the 4th most common cancer in women. PMID:24758163

  9. Modelling the influence of parental effects on gene-network evolution.

    PubMed

    Odorico, Andreas; Rünneburger, Estelle; Le Rouzic, Arnaud

    2018-05-01

    Understanding the importance of nongenetic heredity in the evolutionary process is a major topic in modern evolutionary biology. We modified a classical gene-network model by allowing parental transmission of gene expression and studied its evolutionary properties through individual-based simulations. We identified ontogenetic time (i.e. the time gene networks have to stabilize before being submitted to natural selection) as a crucial factor in determining the evolutionary impact of this phenotypic inheritance. Indeed, fast-developing organisms display enhanced adaptation and greater robustness to mutations when evolving in presence of nongenetic inheritance (NGI). In contrast, in our model, long development reduces the influence of the inherited state of the gene network. NGI thus had a negligible effect on the evolution of gene networks when the speed at which transcription levels reach equilibrium is not constrained. Nevertheless, simulations show that intergenerational transmission of the gene-network state negatively affects the evolution of robustness to environmental disturbances for either fast- or slow-developing organisms. Therefore, these results suggest that the evolutionary consequences of NGI might not be sought only in the way species respond to selection, but also on the evolution of emergent properties (such as environmental and genetic canalization) in complex genetic architectures. © 2018 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2018 European Society For Evolutionary Biology.

  10. Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.

    PubMed

    Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H

    2017-12-20

    Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.

  11. Novel candidate genes important for asthma and hypertension comorbidity revealed from associative gene networks.

    PubMed

    Saik, Olga V; Demenkov, Pavel S; Ivanisenko, Timofey V; Bragina, Elena Yu; Freidin, Maxim B; Goncharova, Irina A; Dosenko, Victor E; Zolotareva, Olga I; Hofestaedt, Ralf; Lavrik, Inna N; Rogaev, Evgeny I; Ivanisenko, Vladimir A

    2018-02-13

    Hypertension and bronchial asthma are a major issue for people's health. As of 2014, approximately one billion adults, or ~ 22% of the world population, have had hypertension. As of 2011, 235-330 million people globally have been affected by asthma and approximately 250,000-345,000 people have died each year from the disease. The development of effective therapies against these diseases is complicated by their comorbidity, which is often a major problem in their diagnosis and treatment. Hence, in this study a bioinformatics methodology for the analysis of the comorbidity of these two diseases has been developed. The search for candidate genes related to the comorbid conditions of asthma and hypertension can help in elucidating the molecular mechanisms underlying the comorbid condition of these two diseases, and can also be useful for genotyping and identifying new drug targets. Using ANDSystem, the reconstruction and analysis of gene networks associated with asthma and hypertension was carried out. The gene network of asthma included 755 genes/proteins and 62,603 interactions, while the gene network of hypertension included 713 genes/proteins and 45,479 interactions. Two hundred and five genes/proteins and 9638 interactions were shared between asthma and hypertension. An approach for ranking genes implicated in the comorbid condition of two diseases was proposed. The approach is based on nine criteria for ranking genes by their importance, including standard methods of gene prioritization (Endeavor, ToppGene) as well as original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analysed genes. According to the proposed approach, the genes IL10, TLR4, and CAT had the highest priority in the development of comorbidity of these two diseases. Additionally, it was revealed that the list of top genes is enriched with apoptotic genes and genes involved in

  12. Transcriptional Network Analysis Identifies BACH1 as a Master Regulator of Breast Cancer Bone Metastasis

    PubMed Central

    Liang, Yajun; Wu, Heng; Lei, Rong; Chong, Robert A.; Wei, Yong; Lu, Xin; Tagkopoulos, Ilias; Kung, Sun-Yuan; Yang, Qifeng; Hu, Guohong; Kang, Yibin

    2012-01-01

    The application of functional genomic analysis of breast cancer metastasis has led to the identification of a growing number of organ-specific metastasis genes, which often function in concert to facilitate different steps of the metastatic cascade. However, the gene regulatory network that controls the expression of these metastasis genes remains largely unknown. Here, we demonstrate a computational approach for the deconvolution of transcriptional networks to discover master regulators of breast cancer bone metastasis. Several known regulators of breast cancer bone metastasis such as Smad4 and HIF1 were identified in our analysis. Experimental validation of the networks revealed BACH1, a basic leucine zipper transcription factor, as the common regulator of several functional metastasis genes, including MMP1 and CXCR4. Ectopic expression of BACH1 enhanced the malignance of breast cancer cells, and conversely, BACH1 knockdown significantly reduced bone metastasis. The expression of BACH1 and its target genes was linked to the higher risk of breast cancer recurrence in patients. This study established BACH1 as the master regulator of breast cancer bone metastasis and provided a paradigm to identify molecular determinants in complex pathological processes. PMID:22875853

  13. Introduction: Cancer Gene Networks.

    PubMed

    Clarke, Robert

    2017-01-01

    differential equations and related tools to create dynamic, semi-mechanistic models of low dimensional data including gene/protein signaling as a function of time/dose. More recently, the integration of imaging technologies into predictive multiscale modeling has begun to extend further the scales across which data can be obtained and used to gain insight into system function. There are several goals for predictive multiscale modeling including the more academic pursuit of understanding how the system or local feature thereof is regulated or functions, to the more practical or translational goals of identifying predictive (selecting which patient should receive which drug/therapy) or prognostic (disease progress and outcome in an individual patient) biomarkers and/or identifying network vulnerabilities that represent potential targets for therapeutic benefit with existing drugs (including drug repurposing) or for the development of new drugs. These various goals are not necessarily mutually exclusive or inclusive. Within this volume, readers will find examples of many of the activities noted above. Each chapter contains practical and/or methodological insights to guide readers in the design and interpretation of their own and published work.

  14. Identification of potential crucial genes and construction of microRNA-mRNA negative regulatory networks in osteosarcoma.

    PubMed

    Pan, Yue; Lu, Lingyun; Chen, Junquan; Zhong, Yong; Dai, Zhehao

    2018-01-01

    This study aimed to identify potential crucial genes and construct microRNA-mRNA negative regulatory networks in osteosarcoma by comprehensive bioinformatics analysis. Gene expression profiles (GSE28424) and miRNA expression profiles (GSE28423) were downloaded from the GEO database. The differentially expressed genes (DEGs) and miRNAs (DEMIs) were obtained using R Bioconductor packages. Functional and enrichment analyses of selected genes were performed using the DAVID database. A protein-protein interaction (PPI) network was constructed with STRING and visualized in Cytoscape. The relationships among the DEGs and the modules in the PPI network were analyzed with the plug-ins NetworkAnalyzer and MCODE, respectively. The miRNA-mRNA regulation network was established by comparing TargetScan-predicted target genes with the DEGs. In total, 346 DEGs and 90 DEMIs were found. These DEGs were enriched in biological processes and KEGG pathways of the inflammatory immune response. 25 genes in the PPI network were selected as hub genes. The top 10 hub genes were TYROBP, HLA-DRA, VWF, PPBP, SERPING1, HLA-DPA1, SERPINA1, KIF20A, FERMT3, HLA-E. The PPI network of DEGs followed the pattern of a power-law network and met the characteristics of a small-world network. MCODE analysis identified 4 clusters, and the most significant cluster consisted of 11 nodes and 55 edges. SEPP1, CKS2, TCAP, BPI were identified as the seed genes in their own clusters, respectively. The miRNA-mRNA regulation network, which was composed of 89 pairs, was established. MiR-210 had the highest connectivity with 12 target genes. Among the predicted targets of MiR-96, HLA-DPA1 and TYROBP were hub genes. Our study indicated possible differentially expressed genes and miRNAs, and microRNA-mRNA negative regulatory networks in osteosarcoma by bioinformatics analysis, which may provide novel insights for unraveling the pathogenesis of osteosarcoma.
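
    One plausible reading of how a "negative" miRNA-mRNA network is assembled from target predictions and differential expression results is sketched below. The filter used here (keep a predicted pair only if both members are differentially expressed and change in opposite directions) is an assumption and is not stated in the record. All identifiers and fold-change values are hypothetical placeholders.

```python
def negative_regulatory_pairs(mirna_logfc, mrna_logfc, predicted_targets):
    """Keep predicted miRNA-target pairs whose log fold changes have opposite signs.

    mirna_logfc / mrna_logfc: dicts of differentially expressed miRNAs / genes
    mapped to log fold change. predicted_targets: miRNA -> list of predicted targets.
    """
    pairs = []
    for mirna, targets in predicted_targets.items():
        if mirna not in mirna_logfc:
            continue  # miRNA is not differentially expressed
        for gene in targets:
            if gene not in mrna_logfc:
                continue  # predicted target is not a DEG
            if mirna_logfc[mirna] * mrna_logfc[gene] < 0:
                pairs.append((mirna, gene))
    return sorted(pairs)


# Hypothetical inputs: placeholder names, invented fold changes.
mirna_logfc = {"miR-a": 2.0, "miR-b": -1.5}
mrna_logfc = {"GENE1": -1.2, "GENE2": 1.8, "GENE3": -0.9}
predicted_targets = {
    "miR-a": ["GENE1", "GENE2", "GENE4"],
    "miR-b": ["GENE2", "GENE3"],
    "miR-c": ["GENE1"],
}

pairs = negative_regulatory_pairs(mirna_logfc, mrna_logfc, predicted_targets)
print(pairs)
```

    Counting how often each miRNA appears in `pairs` would give its connectivity in the resulting network.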

  15. The transfer and transformation of collective network information in gene-matched networks.

    PubMed

    Kitsukawa, Takashi; Yagi, Takeshi

    2015-10-09

    Networks, such as the human society network, social and professional networks, and biological system networks, contain vast amounts of information. Information signals in networks are distributed over nodes and transmitted through intricately wired links, making the transfer and transformation of such information difficult to follow. Here we introduce a novel method for describing network information and its transfer using a model network, the Gene-matched network (GMN), in which nodes (neurons) possess attributes (genes). In the GMN, nodes are connected according to their expression of common genes. Because neurons have multiple genes, the GMN is cluster-rich. We show that, in the GMN, information transfer and transformation were controlled systematically, according to the activity level of the network. Furthermore, information transfer and transformation could be traced numerically with a vector using genes expressed in the activated neurons, the active-gene array, which was used to assess the relative activity among overlapping neuronal groups. Interestingly, this coding style closely resembles the cell-assembly neural coding theory. The method introduced here could be applied to many real-world networks, since many systems, including human society and various biological systems, can be represented as a network of this type.

  16. A network approach to analyzing highly recombinant malaria parasite genes.

    PubMed

    Larremore, Daniel B; Clauset, Aaron; Buckee, Caroline O

    2013-01-01

    The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
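
    The network construction step described above (each sequence is a node, and two nodes are linked if they share an exact match of sufficient length) can be sketched in Python. Two sequences share an exact match of at least k characters exactly when they have a k-mer in common, so the sketch compares k-mer sets. The sequences and the minimum length of 5 are invented. How the study chooses a "significant" length is not given here, and the stochastic block model step is not shown.

```python
from itertools import combinations


def kmers(sequence, k):
    """All substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_match_network(sequences, min_len):
    """Edges between sequences sharing an exact substring of length >= min_len."""
    kmer_sets = {name: kmers(seq, min_len) for name, seq in sequences.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(sequences), 2)
        if kmer_sets[a] & kmer_sets[b]
    ]


# Invented toy sequences; s1 and s2 share the block "DEFGH".
sequences = {
    "s1": "ACDEFGHIK",
    "s2": "LMDEFGHPQ",
    "s3": "RSTVWYACD",
    "s4": "NNNNNNNNN",
}
edges = shared_match_network(sequences, min_len=5)
print(edges)
```

    The resulting edge list could then be passed to a community detection method of choice.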

  17. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes

    PubMed Central

    Larremore, Daniel B.; Clauset, Aaron; Buckee, Caroline O.

    2013-01-01

    The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences. PMID:24130474

  18. Single-nucleotide polymorphism-gene intermixed networking reveals co-linkers connected to multiple gene expression phenotypes

    PubMed Central

    Gong, Bin-Sheng; Zhang, Qing-Pu; Zhang, Guang-Mei; Zhang, Shao-Jun; Zhang, Wei; Lv, Hong-Chao; Zhang, Fan; Lv, Sa-Li; Li, Chuan-Xing; Rao, Shao-Qi; Li, Xia

    2007-01-01

    Gene expression profiles and single-nucleotide polymorphism (SNP) profiles are modern data for genetic analysis. The two types of information can be used together to analyze the relationships among genes through genetical genomics approaches. In this study, gene expression profiles were used as expression traits, and relationships among genes that were co-linked to one or more common SNPs were identified by integrating the two types of information. Further research on the co-expressions among the co-linked genes was carried out after the gene-SNP relationships were established using the Haseman-Elston sib-pair regression. The results showed that the co-expressions among the co-linked genes were significantly higher if the number of connections between the genes and a SNP(s) was more than six. Then, the genes were interconnected via one or more SNP co-linkers to construct a gene-SNP intermixed network. The genes sharing more SNPs tended to have a stronger correlation. Finally, a gene-gene network was constructed with their intensities of relationships (the number of SNP co-linkers shared) as the weights for the edges. PMID:18466544
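
    The final step described above, a gene-gene network whose edge weights are the number of shared SNP co-linkers, can be sketched as follows. The sketch takes the gene-SNP linkage results as given (the regression step is not reproduced), and the gene and SNP identifiers are hypothetical placeholders.

```python
from itertools import combinations


def gene_gene_weights(gene_to_snps):
    """Weight each gene pair by the number of SNPs to which both genes are linked."""
    weights = {}
    for g1, g2 in combinations(sorted(gene_to_snps), 2):
        shared = gene_to_snps[g1] & gene_to_snps[g2]
        if shared:  # only pairs with at least one co-linker become edges
            weights[(g1, g2)] = len(shared)
    return weights


# Hypothetical gene -> linked-SNP sets.
gene_to_snps = {
    "g1": {"snpA", "snpB", "snpC"},
    "g2": {"snpB", "snpC"},
    "g3": {"snpC"},
    "g4": {"snpD"},
}
weights = gene_gene_weights(gene_to_snps)
print(weights)
```

    Here `g4` stays isolated because it shares no SNP with any other gene.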

  19. Systems Biology-Based Investigation of Cellular Antiviral Drug Targets Identified by Gene-Trap Insertional Mutagenesis.

    PubMed

    Cheng, Feixiong; Murray, James L; Zhao, Junfei; Sheng, Jinsong; Zhao, Zhongming; Rubin, Donald H

    2016-09-01

    Viruses require host cellular factors for successful replication. A comprehensive systems-level investigation of the virus-host interactome is critical for understanding the roles of host factors with the end goal of discovering new druggable antiviral targets. Gene-trap insertional mutagenesis is a high-throughput forward genetics approach to randomly disrupt (trap) host genes and discover host genes that are essential for viral replication, but not for host cell survival. In this study, we used libraries of randomly mutagenized cells to discover cellular genes that are essential for the replication of 10 distinct cytotoxic mammalian viruses, 1 gram-negative bacterium, and 5 toxins. We report 712 candidate cellular genes, which show distinct topological network and evolutionary signatures and occupy central hubs in the human interactome. Cell cycle phase-specific network analysis showed that host cell cycle programs played critical roles during viral replication (e.g. MYC and TAF4 regulating G0/1 phase). Moreover, the viral perturbation of host cellular networks reflected disease etiology in that the host genes identified (e.g. CTCF, RHOA, and CDKN1B) were frequently essential and significantly associated with Mendelian and orphan diseases, or somatic mutations in cancer. A computational drug repositioning framework that incorporates drug-gene signatures from the Connectivity Map into the virus-host interactome identified 110 putative druggable antiviral targets and prioritized several existing drugs (e.g. ajmaline) that may have potential for antiviral indications (e.g. anti-Ebola). In summary, this work provides a powerful methodology with a tight integration of gene-trap insertional mutagenesis testing and systems biology to identify new antiviral targets and drugs for the development of broadly acting and targeted clinical antiviral therapeutics.

  20. Combining Genome-Scale Experimental and Computational Methods To Identify Essential Genes in Rhodobacter sphaeroides

    DOE PAGES

    Burger, Brian T.; Imam, Saheed; Scarborough, Matthew J.; ...

    2017-06-06

    Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species.

  1. Identification of conserved drought stress responsive gene-network across tissues and developmental stages in rice.

    PubMed

    Smita, Shuchi; Katiyar, Amit; Pandey, Dev Mani; Chinnusamy, Viswanathan; Archak, Sunil; Bansal, Kailash Chander

    2013-01-01

    Identification of genes that are coexpressed across various tissues and environmental stresses is biologically interesting, since they may play a coordinated role in similar biological processes. Genes with correlated expression patterns can be best identified by using coexpression network analysis of transcriptome data. In the present study, we analyzed the temporal-spatial coordination of gene expression in root, leaf and panicle of rice under drought stress and constructed a network using WGCNA and Cytoscape. A total of 2199 differentially expressed genes (DEGs) were identified in three or more tissues, of which 88 genes have a coordinated expression profile among all six tissues under drought stress. These 88 highly coordinated genes were further subjected to module identification in the coexpression network. Based on chief topological properties we identified 18 hub genes such as ABC transporter, ATP-binding protein, dehydrin, protein phosphatase 2C, LTPL153 - Protease inhibitor, phosphatidylethanolamine-binding protein, lactose permease-related, NADP-dependent malic enzyme, etc. Motif enrichment analysis showed the presence of ABRE cis-elements in the promoters of > 62% of the coordinately expressed genes. Our results suggest that drought stress mediated upregulated gene expression was coordinated through an ABA-dependent signaling pathway across tissues, at least for the subset of genes identified in this study, while downregulation appears to be regulated by tissue-specific pathways in rice.

  2. Network-based approach to identify prognostic biomarkers for estrogen receptor-positive breast cancer treatment with tamoxifen.

    PubMed

    Liu, Rong; Guo, Cheng-Xian; Zhou, Hong-Hao

    2015-01-01

    This study aims to identify effective gene networks and prognostic biomarkers associated with estrogen receptor positive (ER+) breast cancer using human mRNA studies. Weighted gene coexpression network analysis was performed with a complex ER+ breast cancer transcriptome to investigate the function of networks and key genes in the prognosis of breast cancer. We found a significant correlation of an expression module with distant metastasis-free survival (HR = 2.25; 95% CI = 1.03-4.88 in discovery set; HR = 1.78; 95% CI = 1.07-2.93 in validation set). This module contained genes enriched in the biological process of the M phase. From this module, we further identified and validated 5 hub genes (CDK1, DLGAP5, MELK, NUSAP1, and RRM2), the expression levels of which were strongly associated with poor survival. Highly expressed MELK indicated poor survival in luminal A and luminal B breast cancer molecular subtypes. This gene was also found to be associated with tamoxifen resistance. Results indicated that a network-based approach may facilitate the discovery of biomarkers for the prognosis of ER+ breast cancer and may also be used as a basis for establishing personalized therapies. Nevertheless, before the application of this approach in clinical settings, in vivo and in vitro experiments and multi-center randomized controlled clinical trials are still needed.

  3. Probabilistic representation of gene regulatory networks.

    PubMed

    Mao, Linyong; Resat, Haluk

    2004-09-22

    Recent experiments have established unambiguously that biological systems can have significant cell-to-cell variations in gene expression levels even in isogenic populations. Computational approaches to studying gene expression in cellular systems should capture such biological variations for a more realistic representation. In this paper, we present a new fully probabilistic approach to the modeling of gene regulatory networks that allows for fluctuations in the gene expression levels. The new algorithm uses a very simple representation for the genes, and accounts for the repression or induction of the genes and for the biological variations among isogenic populations simultaneously. Because of its simplicity, the introduced algorithm is a very promising approach to modeling large-scale gene regulatory networks. We have tested the new algorithm on the recently bioengineered synthetic gene network library. The good agreement between the computed and the experimental results for this library of networks, and additional tests, demonstrate that the new algorithm is robust and very successful in explaining the experimental data. The simulation software is available upon request. Supplementary material will be made available on the OUP server.

  4. Omics analysis of human bone to identify genes and molecular networks regulating skeletal remodeling in health and disease.

    PubMed

    Reppe, Sjur; Datta, Harish K; Gautvik, Kaare M

    2017-08-01

    The skeleton is a metabolically active organ throughout life where specific bone cell activity and paracrine/endocrine factors regulate its morphogenesis and remodeling. In recent years, an increasing number of reports have used multi-omics technologies to characterize subsets of bone biological molecular networks. The skeleton is affected by primary and secondary disease, lifestyle and many drugs. Therefore, obtaining relevant and reliable data from well-characterized patient and control cohorts is vital. Here we provide a brief overview of omics studies performed on human bone, of which our own studies performed on trans-iliacal bone biopsies from postmenopausal women with osteoporosis (OP) and healthy controls are among the first and largest. Most other studies have been performed on smaller groups of patients, undergoing hip replacement for osteoarthritis (OA) or fracture, and without healthy controls. The major findings emerging from the combined studies are: (1) unstressed and stressed bone show profoundly different gene expression, reflecting differences in bone turnover and remodeling, and (2) omics analyses comparing healthy/OP and control/OA cohorts reveal characteristic changes in transcriptomics, epigenomics (DNA methylation), proteomics and metabolomics. These studies, together with genome-wide association studies, in vitro observations and transgenic animal models, have identified a number of genes and gene products that act via Wnt and other signaling systems and are highly associated with bone density and fracture. A future challenge is to understand the functional interactions between bone-related molecular networks and their significance in OP and OA pathogenesis, and also how the genomic architecture is affected in health and disease. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells

    PubMed Central

    Mason, Mike J; Fan, Guoping; Plath, Kathrin; Zhou, Qing; Horvath, Steve

    2009-01-01

    Background Recent work has revealed that a core group of transcription factors (TFs) regulates the key characteristics of embryonic stem (ES) cells: pluripotency and self-renewal. Current efforts focus on identifying genes that play important roles in maintaining pluripotency and self-renewal in ES cells and aim to understand the interactions among these genes. To that end, we investigated the use of unsigned and signed network analysis to identify pluripotency and differentiation related genes. Results We show that signed networks provide a better systems level understanding of the regulatory mechanisms of ES cells than unsigned networks, using two independent murine ES cell expression data sets. Specifically, using signed weighted gene co-expression network analysis (WGCNA), we found a pluripotency module and a differentiation module, which are not identified in unsigned networks. We confirmed the importance of these modules by incorporating genome-wide TF binding data for key ES cell regulators. Interestingly, we find that the pluripotency module is enriched with genes related to DNA damage repair and mitochondrial function in addition to transcriptional regulation. Using a connectivity measure of module membership, we not only identify known regulators of ES cells but also show that Mrpl15, Msh6, Nrf1, Nup133, Ppif, Rbpj, Sh3gl2, and Zfp39, among other genes, have important roles in maintaining ES cell pluripotency and self-renewal. We also report highly significant relationships between module membership and epigenetic modifications (histone modifications and promoter CpG methylation status), which are known to play a role in controlling gene expression during ES cell self-renewal and differentiation. Conclusion Our systems biologic re-analysis of gene expression, transcription factor binding, epigenetic and gene ontology data provides a novel integrative view of ES cell biology. PMID:19619308
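    The difference between unsigned and signed co-expression networks can be illustrated with a minimal sketch. It assumes the standard WGCNA soft-thresholding definitions (unsigned adjacency |r|^beta, signed adjacency ((1 + r)/2)^beta); the exponents and the toy expression profiles below are illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def unsigned_adjacency(r, beta=6):
    """Unsigned network: strong negative correlation counts as a strong link."""
    return abs(r) ** beta


def signed_adjacency(r, beta=12):
    """Signed network: negative correlation is mapped towards zero."""
    return ((1 + r) / 2) ** beta


# Toy profiles (hypothetical): gene_b is perfectly anti-correlated with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [4.0, 3.0, 2.0, 1.0]

r = pearson(gene_a, gene_b)
print(r, unsigned_adjacency(r), signed_adjacency(r))
```

    In this toy case the two anti-correlated genes are strongly connected in the unsigned network but unconnected in the signed one, which is one way a signed analysis can separate modules that an unsigned analysis merges.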

  6. The Schizophrenia Risk Gene MIR137 Acts as a Hippocampal Gene Network Node Orchestrating the Expression of Genes Relevant to Nervous System Development and Function

    PubMed Central

    Loohuis, Nikkie FM Olde; Kasri, Nael Nadif; Glennon, Jeffrey C; van Bokhoven, Hans; Hébert, Sébastien S; Kaplan, Barry B.; Martens, Gerard JM; Aschrafi, Armaz

    2016-01-01

    MicroRNAs (miRs) are small regulatory molecules, which orchestrate neuronal development and plasticity through modulation of complex gene networks. microRNA-137 (miR-137) is a brain-enriched RNA with a critical role in regulating brain development and in mediating synaptic plasticity. Importantly, mutations in this miR are associated with the pathoetiology of schizophrenia (SZ), and there is a widespread assumption that disruptions in miR-137 expression lead to aberrant expression of gene regulatory networks associated with SZ. To systematically identify the mRNA targets for this miR, we performed miR-137 gain- and loss-of-function experiments in primary rat hippocampal neurons and profiled differentially expressed mRNAs through next-generation sequencing. We identified 500 genes that were bidirectionally activated or repressed in their expression by the modulation of miR-137 levels. Gene ontology analysis using two independent software resources suggested functions for these miR-137-regulated genes in neurodevelopmental processes, neuronal maturation processes and cell maintenance, all of which are known to be critical for proper brain circuitry formation. Since many of the putative miR-137 targets identified here also have been previously shown to be associated with SZ, we propose that this miR acts as a critical gene network hub contributing to the pathophysiology of this neurodevelopmental disorder. PMID:26925706

  7. A network-based method for the identification of putative genes related to infertility.

    PubMed

    Wang, ShaoPeng; Huang, GuoHua; Hu, Qinghua; Zou, Quan

    2016-11-01

    Infertility has become one of the major health problems worldwide, with its incidence having risen markedly in recent decades. There is an urgent need to investigate the pathological mechanisms behind infertility and to design effective treatments. However, this is made difficult by the fact that various biological factors have been identified to be related to infertility, including genetic factors. A network-based method was established to identify new genes potentially related to infertility. A network constructed using human protein-protein interactions based on previously validated infertility-related genes enabled the identification of some novel candidate genes. These genes were then filtered by a permutation test and their functional and structural associations with infertility-related genes. Our method identified 23 novel genes, which have strong functional and structural associations with previously validated infertility-related genes. Substantial evidence indicates that the identified genes are strongly related to dysfunction of the four main biological processes of fertility: reproductive development and physiology, gametogenesis, meiosis and recombination, and hormone regulation. The newly discovered genes may provide new directions for investigating infertility. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Linking disease-associated genes to regulatory networks via promoter organization

    PubMed Central

    Döhr, S.; Klingenhoff, A.; Maier, H.; de Angelis, M. Hrabé; Werner, T.; Schneider, R.

    2005-01-01

    Pathway- or disease-associated genes may participate in more than one transcriptional co-regulation network. Such gene groups can be readily obtained by literature analysis or by high-throughput techniques such as microarrays or protein-interaction mapping. We developed a strategy that defines regulatory networks by in silico promoter analysis, finding potentially co-regulated subgroups without a priori knowledge. Pairs of transcription factor binding sites conserved in orthologous genes (vertically) as well as in promoter sequences of co-regulated genes (horizontally) were used as seeds for the development of promoter models representing potential co-regulation. This approach was applied to a Maturity Onset Diabetes of the Young (MODY)-associated gene list, which yielded two models connecting functionally interacting genes within MODY-related insulin/glucose signaling pathways. Additional genes functionally connected to our initial gene list were identified by database searches with these promoter models. Thus, data-driven in silico promoter analysis allowed integrating molecular mechanisms with biological functions of the cell. PMID:15701758

  9. A fast and high performance multiple data integration algorithm for identifying human disease genes

    PubMed Central

    2015-01-01

    Background Integrating multiple data sources is indispensable in improving disease gene identification. This is due not only to the fact that disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also to the fact that gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time can still be improved. Results In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. Conclusions The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 by using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. It performs better than many existing algorithms. PMID:26399620

  10. Expression profiling of Crambe abyssinica under arsenate stress identifies genes and gene networks involved in arsenic metabolism and detoxification

    PubMed Central

    2010-01-01

    Background Arsenic contamination is widespread throughout the world and this toxic metalloid is known to cause cancers of organs such as liver, kidney, skin, and lung in humans. In spite of a recent surge in arsenic-related studies, we are still far from a comprehensive understanding of arsenic uptake, detoxification, and sequestration in plants. Crambe abyssinica, commonly known as 'abyssinian mustard', is a non-food, high biomass oil seed crop that is naturally tolerant to heavy metals. Moreover, it accumulates significantly higher levels of arsenic as compared to other species of the Brassicaceae family. Thus, C. abyssinica has great potential to be utilized as an ideal inedible crop for phytoremediation of heavy metals and metalloids. However, the mechanism of arsenic metabolism in higher plants, including C. abyssinica, remains elusive. Results To identify the differentially expressed transcripts and the pathways involved in arsenic metabolism and detoxification, C. abyssinica plants were subjected to arsenate stress and a PCR-Select Suppression Subtraction Hybridization (SSH) approach was employed. A total of 105 differentially expressed subtracted cDNAs were sequenced which were found to represent 38 genes. Those genes encode proteins functioning as antioxidants, metal transporters, reductases, enzymes involved in the protein degradation pathway, and several novel uncharacterized proteins. The transcripts corresponding to the subtracted cDNAs showed strong upregulation by arsenate stress as confirmed by the semi-quantitative RT-PCR. Conclusions Our study revealed novel insights into the plant defense mechanisms and the regulation of genes and gene networks in response to arsenate toxicity. The differential expression of transcripts encoding glutathione-S-transferases, antioxidants, sulfur metabolism, heat-shock proteins, metal transporters, and enzymes in the ubiquitination pathway of protein degradation as well as several unknown novel proteins serve as

  11. Differential Connectivity in Colorectal Cancer Gene Expression Network

    PubMed

    Izadi, Fereshteh

    2018-05-30

    Colorectal cancer (CRC) is one of the challenging types of cancer; thus, exploring effective biomarkers related to CRC could lead to significant progress toward the treatment of this disease. In the present study, CRC gene expression datasets have been reanalyzed. Mutual differentially expressed genes across 294 normal mucosa and adjacent tumoral samples were then utilized in order to build two independent transcriptional regulatory networks. By analyzing the networks topologically, genes with differential global connectivity related to cancer state were determined, for which the potential transcriptional regulators, including transcription factors, were identified. The majority of differentially connected genes (DCGs) were up-regulated in colorectal transcriptome experiments. Moreover, a number of these genes have been experimentally validated as cancer or CRC-associated genes. The DCGs, including GART, TGFB1, ITGA2, SLC16A5, SOX9, and MMP7, were investigated across 12 cancer types. Functional enrichment analysis followed by detailed data mining showed that these candidate genes could be related to CRC by mediating the metastatic cascade, in addition to sharing pathways with 12 cancer types by triggering inflammatory events. Our study uncovered correlated alterations in gene expression related to CRC susceptibility and progression, and the candidate biomarkers identified could provide a link to the disease.

  12. A network approach to predict pathogenic genes for Fusarium graminearum.

    PubMed

    Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

    2010-10-04

    Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), a destructive disease of wheat and barley that causes huge economic losses and health problems in humans by contaminating foods. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as the seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, the G protein coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which

  13. Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren’s Disease

    PubMed Central

    Becker, Kerstin; Siegert, Sabine; Toliat, Mohammad Reza; Du, Juanjiangmeng; Casper, Ramona; Dolmans, Guido H.; Werker, Paul M.; Tinschert, Sigrid; Franke, Andre; Gieger, Christian; Strauch, Konstantin; Nothnagel, Michael; Nürnberg, Peter; Hennies, Hans Christian

    2016-01-01

    Dupuytren’s disease, a fibromatosis of the connective tissue in the palm, is a common complex disease with a strong genetic component. To date, nine genetic loci have been found to be associated with the disease. Six of these loci contain genes that code for Wnt signalling proteins. In spite of this striking first insight into the genetic factors in Dupuytren’s disease, much of the inherited risk in Dupuytren’s disease still needs to be discovered. The already identified loci jointly explain ~1% of the heritability in this disease. To further elucidate the genetic basis of Dupuytren’s disease, we performed a genome-wide meta-analysis combining three genome-wide association study (GWAS) data sets, comprising 1,580 cases and 4,480 controls. We corroborated all nine previously identified loci, six of these with genome-wide significance (p-value < 5 × 10^-8). In addition, we identified 14 new suggestive loci (p-value < 10^-5). Intriguingly, several of these new loci contain genes associated with Wnt signalling and therefore represent excellent candidates for replication. Next, we compared whole-transcriptome data between patient- and control-derived tissue samples and found the Wnt/β-catenin pathway to be the top deregulated pathway in patient samples. We then conducted network and pathway analyses in order to identify protein networks that are enriched for genes highlighted in the GWAS meta-analysis and expression data sets. We found further evidence that the Wnt signalling pathways in conjunction with other pathways may play a critical role in Dupuytren’s disease. PMID:27467239

  14. Diverse types of genetic variation converge on functional gene networks involved in schizophrenia.

    PubMed

    Gilman, Sarah R; Chang, Jonathan; Xu, Bin; Bawa, Tejdeep S; Gogos, Joseph A; Karayiorgou, Maria; Vitkup, Dennis

    2012-12-01

    Despite the successful identification of several relevant genomic loci, the underlying molecular mechanisms of schizophrenia remain largely unclear. We developed a computational approach (NETBAG+) that allows an integrated analysis of diverse disease-related genetic data using a unified statistical framework. The application of this approach to schizophrenia-associated genetic variations, obtained using unbiased whole-genome methods, allowed us to identify several cohesive gene networks related to axon guidance, neuronal cell mobility, synaptic function and chromosomal remodeling. The genes forming the networks are highly expressed in the brain, with higher brain expression during prenatal development. The identified networks are functionally related to genes previously implicated in schizophrenia, autism and intellectual disability. A comparative analysis of copy number variants associated with autism and schizophrenia suggests that although the molecular networks implicated in these distinct disorders may be related, the mutations associated with each disease are likely to lead, at least on average, to different functional consequences.

  15. Prediction of C. elegans Longevity Genes by Human and Worm Longevity Networks

    PubMed Central

    de Magalhães, João Pedro; Ruvkun, Gary; Fraifeld, Vadim E.; Curran, Sean P.

    2012-01-01

    Intricate and interconnected pathways modulate longevity, but screens to identify the components of these pathways have not been saturating. Because biological processes are often executed by protein complexes and fine-tuned by regulatory factors, the first-order protein-protein interactors of known longevity genes are likely to participate in the regulation of longevity. Data-rich maps of protein interactions have been established for many cardinal organisms such as yeast, worms, and humans. We propose that these interaction maps could be mined for the identification of new putative regulators of longevity. For this purpose, we have constructed longevity networks in both humans and worms. We reasoned that the essential first-order interactors of known longevity-associated genes in these networks are more likely to have longevity phenotypes than randomly chosen genes. We have used C. elegans to determine whether post-developmental inactivation of these essential genes modulates lifespan. Our results suggest that the worm and human longevity networks are functionally relevant and possess a high predictive power for identifying new longevity regulators. PMID:23144747
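    The candidate-selection idea described above (take the first-order interactors of known longevity genes, then keep the essential ones) can be sketched as follows. This is a minimal illustration under stated assumptions: the edge list, seed set and essential-gene set use placeholder names and are hypothetical, and the function is not the authors' pipeline.

```python
def first_order_interactors(edges, seeds):
    """Return genes that directly interact with at least one seed gene.

    edges: iterable of (gene, gene) pairs from an undirected interaction map.
    seeds: known trait-associated genes; seeds themselves are not returned.
    """
    seeds = set(seeds)
    neighbours = set()
    for a, b in edges:
        if a in seeds and b not in seeds:
            neighbours.add(b)
        elif b in seeds and a not in seeds:
            neighbours.add(a)
    return neighbours


# Hypothetical toy interaction map; S1 and S2 stand in for known longevity genes.
edges = [("S1", "A"), ("S2", "B"), ("A", "C"), ("S1", "S2"), ("B", "D")]
seeds = {"S1", "S2"}
essential = {"A", "D"}  # placeholder set of essential genes

interactors = first_order_interactors(edges, seeds)
candidates = interactors & essential  # essential first-order interactors only
print(sorted(interactors), sorted(candidates))
```

    Here C and D are excluded from the interactor set because they are two steps away from any seed, and only A survives the essential-gene filter.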

  16. An Integrative Genetics Approach to Identify Candidate Genes Regulating BMD: Combining Linkage, Gene Expression, and Association

    PubMed Central

    Farber, Charles R; van Nas, Atila; Ghazalpour, Anatole; Aten, Jason E; Doss, Sudheer; Sos, Brandon; Schadt, Eric E; Ingram-Drake, Leslie; Davis, Richard C; Horvath, Steve; Smith, Desmond J; Drake, Thomas A; Lusis, Aldons J

    2009-01-01

    Numerous quantitative trait loci (QTLs) affecting bone traits have been identified in the mouse; however, few of the underlying genes have been discovered. To improve the process of transitioning from QTL to gene, we describe an integrative genetics approach, which combines linkage analysis, expression QTL (eQTL) mapping, causality modeling, and genetic association in outbred mice. In C57BL/6J × C3H/HeJ (BXH) F2 mice, nine QTLs regulating femoral BMD were identified. To select candidate genes from within each QTL region, microarray gene expression profiles from individual F2 mice were used to identify 148 genes whose expression was correlated with BMD and regulated by local eQTLs. Many of the genes that were the most highly correlated with BMD have been previously shown to modulate bone mass or skeletal development. Candidates were further prioritized by determining whether their expression was predicted to underlie variation in BMD. Using network edge orienting (NEO), a causality modeling algorithm, 18 of the 148 candidates were predicted to be causally related to differences in BMD. To fine-map QTLs, markers in outbred MF1 mice were tested for association with BMD. Three chromosome 11 SNPs were identified that were associated with BMD within the Bmd11 QTL. Finally, our approach provides strong support for Wnt9a, Rasd1, or both underlying Bmd11. Integration of multiple genetic and genomic data sets can substantially improve the efficiency of QTL fine-mapping and candidate gene identification. PMID:18767929

  17. Identifying Stress Transcription Factors Using Gene Expression and TF-Gene Association Data

    PubMed Central

    Wu, Wei-Sheng; Chen, Bor-Sen

    2007-01-01

    Unicellular organisms such as yeasts have evolved to survive environmental stresses by rapidly reorganizing the genomic expression program to meet the challenges of harsh environments. The complex adaptation mechanisms to stress remain to be elucidated. In this study, we developed Stress Transcription Factor Identification Algorithm (STFIA), which integrates gene expression and TF-gene association data to identify the stress transcription factors (TFs) of six kinds of stresses. We identified some general stress TFs that are in response to various stresses, and some specific stress TFs that are in response to one specific stress. The biological significance of our findings is validated by the literature. We found that a small number of TFs may be sufficient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the adaptation mechanisms to different stresses may have a bow-tie structure. Second, there may exist extensive regulatory cross-talk among different stress responses. In conclusion, this study proposes a network of the regulators of stress responses and their mechanism of action. PMID:20066130

  18. Modeling stochasticity and robustness in gene regulatory networks.

    PubMed

    Garg, Abhishek; Mohanram, Kartik; Di Cara, Alessandro; De Micheli, Giovanni; Xenarios, Ioannis

    2009-06-15

    Understanding gene regulation in biological processes and modeling the robustness of underlying regulatory networks is an important problem that is currently being addressed by computational systems biologists. Lately, there has been a renewed interest in Boolean modeling techniques for gene regulatory networks (GRNs). However, due to their deterministic nature, it is often difficult to identify whether these modeling approaches are robust to the addition of stochastic noise that is widespread in gene regulatory processes. Stochasticity in Boolean models of GRNs has been addressed relatively sparingly in the past, mainly by flipping the expression of genes between different expression levels with a predefined probability. This stochasticity in nodes (SIN) model leads to overrepresentation of noise in GRNs and hence non-correspondence with biological observations. In this article, we introduce the stochasticity in functions (SIF) model for simulating stochasticity in Boolean models of GRNs. By providing biological motivation behind the use of the SIF model and applying it to the T-helper and T-cell activation networks, we show that the SIF model provides more biologically robust results than the existing SIN model of stochasticity in GRNs. Algorithms are made available under our Boolean modeling toolbox, GenYsis. The software binaries can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
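    The SIN scheme mentioned above (flip each node's value with a predefined probability after a deterministic Boolean update) can be sketched as follows. This is a minimal illustration only: the three-gene network, its update rules and the function names are hypothetical, it does not implement the SIF model, and it is unrelated to the GenYsis code.

```python
import random


def step(state, rules):
    """Synchronous deterministic Boolean update of every gene."""
    return {gene: rule(state) for gene, rule in rules.items()}


def sin_step(state, rules, flip_prob, rng):
    """Stochasticity in nodes: update, then flip each node with flip_prob."""
    updated = step(state, rules)
    return {
        gene: (not value) if rng.random() < flip_prob else value
        for gene, value in updated.items()
    }


# Hypothetical toy network: C represses A, A activates B, A and B activate C.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

state = {"A": True, "B": False, "C": False}
rng = random.Random(0)  # seeded so that a noisy trajectory is reproducible

trajectory = [state]
for _ in range(5):
    trajectory.append(sin_step(trajectory[-1], rules, flip_prob=0.1, rng=rng))
print(trajectory[-1])
```

    Because every node is flipped independently of the rule that produced it, noise is injected uniformly across the network, which is the behaviour the abstract criticizes as an overrepresentation of noise.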

  19. Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations

    PubMed Central

    Logsdon, Benjamin A.; Mezey, Jason

    2010-01-01

    Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data. PMID:21152011

  20. Identification of Causal Genes, Networks, and Transcriptional Regulators of REM Sleep and Wake

    PubMed Central

    Millstein, Joshua; Winrow, Christopher J.; Kasarskis, Andrew; Owens, Joseph R.; Zhou, Lili; Summa, Keith C.; Fitzpatrick, Karrie; Zhang, Bin; Vitaterna, Martha H.; Schadt, Eric E.; Renger, John J.; Turek, Fred W.

    2011-01-01

    Study Objective: Sleep-wake traits are well-known to be under substantial genetic control, but the specific genes and gene networks underlying primary sleep-wake traits have largely eluded identification using conventional approaches, especially in mammals. Thus, the aim of this study was to use systems genetics and statistical approaches to uncover the genetic networks underlying 2 primary sleep traits in the mouse: 24-h duration of REM sleep and wake. Design: Genome-wide RNA expression data from 3 tissues (anterior cortex, hypothalamus, thalamus/midbrain) were used in conjunction with high-density genotyping to identify candidate causal genes and networks mediating the effects of 2 QTL regulating the 24-h duration of REM sleep and one regulating the 24-h duration of wake. Setting: Basic sleep research laboratory. Patients or Participants: Male [C57BL/6J × (BALB/cByJ × C57BL/6J*) F1] N2 mice (n = 283). Interventions: None. Measurements and Results: The genetic variation of a mouse N2 mapping cross was leveraged against sleep-state phenotypic variation as well as quantitative gene expression measurement in key brain regions using integrative genomics approaches to uncover multiple causal sleep-state regulatory genes, including several surprising novel candidates, which interact as components of networks that modulate REM sleep and wake. In particular, it was discovered that a core network module, consisting of 20 genes, involved in the regulation of REM sleep duration is conserved across the cortex, hypothalamus, and thalamus. A novel application of a formal causal inference test was also used to identify those genes directly regulating sleep via control of expression. Conclusion: Systems genetics approaches reveal novel candidate genes, complex networks and specific transcriptional regulators of REM sleep and wake duration in mammals. Citation: Millstein J; Winrow CJ; Kasarskis A; Owens JR; Zhou L; Summa KC; Fitzpatrick K; Zhang B; Vitaterna MH; Schadt EE

  1. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Siqi; Joseph, Antony; Hammonds, Ann S.

    Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. In conclusion, the performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.

  2. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks

    DOE PAGES

    Wu, Siqi; Joseph, Antony; Hammonds, Ann S.; ...

    2016-04-06

    Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. In conclusion, the performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.

  3. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns

    PubMed Central

    Lezon, Timothy R.; Banavar, Jayanth R.; Cieplak, Marek; Maritan, Amos; Fedoroff, Nina V.

    2006-01-01

    We describe a method based on the principle of entropy maximization to identify the gene interaction network with the highest probability of giving rise to experimentally observed transcript profiles. In its simplest form, the method yields the pairwise gene interaction network, but it can also be extended to deduce higher-order interactions. Analysis of microarray data from genes in Saccharomyces cerevisiae chemostat cultures exhibiting energy metabolic oscillations identifies a gene interaction network that reflects the intracellular communication pathways that adjust cellular metabolic activity and cell division to the limiting nutrient conditions that trigger metabolic oscillations. The success of the present approach in extracting meaningful genetic connections suggests that the maximum entropy principle is a useful concept for understanding living systems, as it is for other complex, nonequilibrium systems. PMID:17138668
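
    As a rough illustration of the pairwise case described above: for continuous expression values, the maximum-entropy distribution that matches the observed means and covariances is a multivariate Gaussian, so the pairwise couplings can be read off the inverse covariance matrix. The sketch below assumes an invertible sample covariance (more samples than genes) and takes the coupling matrix as the negative inverse covariance; the function names, the sign convention, and the toy data are assumptions for illustration, not the authors' code.

```python
def covariance(samples):
    """Sample covariance matrix; `samples` is a list of per-sample expression vectors."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    return [
        [sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises on a singular matrix."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I]
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pairwise_interactions(samples):
    """Coupling matrix taken as the negative inverse covariance (assumed convention)."""
    precision = invert(covariance(samples))
    return [[-v for v in row] for row in precision]


# Hypothetical toy data: four samples of two positively co-varying genes.
toy_samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
couplings = pairwise_interactions(toy_samples)
```

    With real microarray data the number of genes usually exceeds the number of samples, so the covariance matrix is singular and some form of regularization or pseudo-inversion would be needed in place of the plain inverse used here.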

  4. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns.

    PubMed

    Lezon, Timothy R; Banavar, Jayanth R; Cieplak, Marek; Maritan, Amos; Fedoroff, Nina V

    2006-12-12

    We describe a method based on the principle of entropy maximization to identify the gene interaction network with the highest probability of giving rise to experimentally observed transcript profiles. In its simplest form, the method yields the pairwise gene interaction network, but it can also be extended to deduce higher-order interactions. Analysis of microarray data from genes in Saccharomyces cerevisiae chemostat cultures exhibiting energy metabolic oscillations identifies a gene interaction network that reflects the intracellular communication pathways that adjust cellular metabolic activity and cell division to the limiting nutrient conditions that trigger metabolic oscillations. The success of the present approach in extracting meaningful genetic connections suggests that the maximum entropy principle is a useful concept for understanding living systems, as it is for other complex, nonequilibrium systems.

  5. Gene co-expression analysis identifies gene clusters associated with isotropic and polarized growth in Aspergillus fumigatus conidia.

    PubMed

    Baltussen, Tim J H; Coolen, Jordy P M; Zoll, Jan; Verweij, Paul E; Melchers, Willem J G

    2018-04-26

    Aspergillus fumigatus is a saprophytic fungus that extensively produces conidia. These microscopic asexually reproductive structures are small enough to reach the lungs. Germination of conidia followed by hyphal growth inside human lungs is a key step in the establishment of infection in immunocompromised patients. RNA-Seq was used to analyze the transcriptome of dormant and germinating A. fumigatus conidia. Construction of a gene co-expression network revealed four gene clusters (modules) correlated with a growth phase (dormant, isotropic growth, polarized growth). Transcripts levels of genes encoding for secondary metabolites were high in dormant conidia. During isotropic growth, transcript levels of genes involved in cell wall modifications increased. Two modules encoding for growth and cell cycle/DNA processing were associated with polarized growth. In addition, the co-expression network was used to identify highly connected intermodular hub genes. These genes may have a pivotal role in the respective module and could therefore be compelling therapeutic targets. Generally, cell wall remodeling is an important process during isotropic and polarized growth, characterized by an increase of transcripts coding for hyphal growth and cell cycle/DNA processing when polarized growth is initiated. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Unveiling network-based functional features through integration of gene expression into protein networks.

    PubMed

    Jalili, Mahdi; Gebhardt, Tom; Wolkenhauer, Olaf; Salehzadeh-Yazdi, Ali

    2018-06-01

    Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches, integrating gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease related genes, gene prioritization, clustering protein interactions, developing the modules, extract active subnetworks and static protein complexes or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Integrative Genomics Reveals Novel Molecular Pathways and Gene Networks for Coronary Artery Disease

    PubMed Central

    Mäkinen, Ville-Petteri; Civelek, Mete; Meng, Qingying; Zhang, Bin; Zhu, Jun; Levian, Candace; Huan, Tianxiao; Segrè, Ayellet V.; Ghosh, Sujoy; Vivar, Juan; Nikpay, Majid; Stewart, Alexandre F. R.; Nelson, Christopher P.; Willenborg, Christina; Erdmann, Jeanette; Blakenberg, Stefan; O'Donnell, Christopher J.; März, Winfried; Laaksonen, Reijo; Epstein, Stephen E.; Kathiresan, Sekar; Shah, Svati H.; Hazen, Stanley L.; Reilly, Muredach P.; Lusis, Aldons J.; Samani, Nilesh J.; Schunkert, Heribert; Quertermous, Thomas; McPherson, Ruth; Yang, Xia; Assimes, Themistocles L.

    2014-01-01

    The majority of the heritability of coronary artery disease (CAD) remains unexplained, despite recent successes of genome-wide association studies (GWAS) in identifying novel susceptibility loci. Integrating functional genomic data from a variety of sources with a large-scale meta-analysis of CAD GWAS may facilitate the identification of novel biological processes and genes involved in CAD, as well as clarify the causal relationships of established processes. Towards this end, we integrated 14 GWAS from the CARDIoGRAM Consortium and two additional GWAS from the Ottawa Heart Institute (25,491 cases and 66,819 controls) with 1) genetics of gene expression studies of CAD-relevant tissues in humans, 2) metabolic and signaling pathways from public databases, and 3) data-driven, tissue-specific gene networks from a multitude of human and mouse experiments. We not only detected CAD-associated gene networks of lipid metabolism, coagulation, immunity, and additional networks with no clear functional annotation, but also revealed key driver genes for each CAD network based on the topology of the gene regulatory networks. In particular, we found a gene network involved in antigen processing to be strongly associated with CAD. The key driver genes of this network included glyoxalase I (GLO1) and peptidylprolyl isomerase I (PPIL1), which we verified as regulatory by siRNA experiments in human aortic endothelial cells. Our results suggest genetic influences on a diverse set of both known and novel biological processes that contribute to CAD risk. The key driver genes for these networks highlight potential novel targets for further mechanistic studies and therapeutic interventions. PMID:25033284

  8. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    PubMed Central

    Song, Min

    2016-01-01

    In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an ever important task as the volume of scientific literature is growing unprecedentedly. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695

  9. Pan- and core- network analysis of co-expression genes in a model plant

    DOE PAGES

    He, Fei; Maslov, Sergei

    2016-12-16

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fractions of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.
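
    One way to picture the pan-network and core-network idea is to count how many individual networks contain each edge and then keep edges above a low threshold (union-like) or a high threshold (intersection-like). The sketch below is a minimal illustration under that assumption; the function names, the two fraction thresholds, and the toy networks are hypothetical, since the abstract does not state the exact fractions used.

```python
from collections import Counter


def edge(a, b):
    """Undirected edge between two genes, independent of argument order."""
    return frozenset((a, b))


def pan_and_core(networks, pan_fraction, core_fraction):
    """Split edges by how many of the individual networks contain them.

    networks: list of sets of edges (one set per coexpression network).
    pan_fraction / core_fraction: assumed thresholds on the share of networks.
    """
    counts = Counter(e for net in networks for e in net)
    n = len(networks)
    pan = {e for e, c in counts.items() if c >= pan_fraction * n}
    core = {e for e, c in counts.items() if c >= core_fraction * n}
    return pan, core


# Hypothetical toy collection of four small networks.
toy_networks = [
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B")},
]
pan, core = pan_and_core(toy_networks, pan_fraction=0.25, core_fraction=0.75)
```

    In this toy case every edge enters the pan-network, while only the A-B edge, present in all four networks, reaches the core-network.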

  10. Pan- and core- network analysis of co-expression genes in a model plant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Fei; Maslov, Sergei

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fractions of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.

  11. Dynamics of Bacterial Gene Regulatory Networks.

    PubMed

    Shis, David L; Bennett, Matthew R; Igoshin, Oleg A

    2018-05-20

    The ability of bacterial cells to adjust their gene expression program in response to environmental perturbation is often critical for their survival. Recent experimental advances allowing us to quantitatively record gene expression dynamics in single cells and in populations coupled with mathematical modeling enable mechanistic understanding on how these responses are shaped by the underlying regulatory networks. Here, we review how the combination of local and global factors affect dynamical responses of gene regulatory networks. Our goal is to discuss the general principles that allow extrapolation from a few model bacteria to less understood microbes. We emphasize that, in addition to well-studied effects of network architecture, network dynamics are shaped by global pleiotropic effects and cell physiology.

  12. Inference of gene regulatory networks from genome-wide knockout fitness data

    PubMed Central

    Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2013-01-01

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information

  13. SurvNet: a web server for identifying network-based biomarkers that most correlate with patient survival data.

    PubMed

    Li, Jun; Roebuck, Paul; Grünewald, Stefan; Liang, Han

    2012-07-01

    An important task in biomedical research is identifying biomarkers that correlate with patient clinical data, and these biomarkers then provide a critical foundation for the diagnosis and treatment of disease. Conventionally, such an analysis is based on individual genes, but the results are often noisy and difficult to interpret. Using a biological network as the searching platform, network-based biomarkers are expected to be more robust and provide deep insights into the molecular mechanisms of disease. We have developed a novel bioinformatics web server for identifying network-based biomarkers that most correlate with patient survival data, SurvNet. The web server takes three input files: one biological network file, representing a gene regulatory or protein interaction network; one molecular profiling file, containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray expression data or DNA methylation data); and one patient survival data file (e.g. patients' progression-free survival data). Given user-defined parameters, SurvNet will automatically search for subnetworks that most correlate with the observed patient survival data. As the output, SurvNet will generate a list of network biomarkers and display them through a user-friendly interface. SurvNet can be accessed at http://bioinformatics.mdanderson.org/main/SurvNet.

  14. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
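
    For context, the conventional ORA baseline mentioned above treats a pathway as a plain gene list and asks how surprising the overlap with a selected gene list is. The one-sided Fisher's exact test for this is equivalent to a hypergeometric tail probability, which the sketch below computes. This illustrates only the baseline test, not LEGO's network-weighted statistic; the function name and the example counts are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Probability of seeing at least `n_overlap` pathway genes among the
    `n_selected` genes drawn from a universe of `n_universe` genes, of which
    `n_pathway` belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in total, 5 in the pathway,
# 4 selected genes, 2 of which fall in the pathway.
p_value = ora_p_value(n_universe=20, n_pathway=5, n_selected=4, n_overlap=2)
```

    Every gene in the pathway contributes equally to this count, which is the simplification the abstract points to when it notes that such tests ignore the non-equivalent roles of genes and gene-gene interactions.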

  15. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  16. Portrait of Candida Species Biofilm Regulatory Network Genes.

    PubMed

    Araújo, Daniela; Henriques, Mariana; Silva, Sónia

    2017-01-01

    Most cases of candidiasis have been attributed to Candida albicans, but Candida glabrata, Candida parapsilosis and Candida tropicalis, designated as non-C. albicans Candida (NCAC), have been identified as frequent human pathogens. Moreover, Candida biofilms are an escalating clinical problem associated with significant rates of mortality. Biofilms have distinct developmental phases, including adhesion/colonisation, maturation and dispersal, controlled by complex regulatory networks. This review discusses recent advances regarding Candida species biofilm regulatory network genes, which are key components for candidiasis. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Semantic integration to identify overlapping functional modules in protein interaction networks

    PubMed Central

    Cho, Young-Rae; Hwang, Woochang; Ramanathan, Murali; Zhang, Aidong

    2007-01-01

    Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification. PMID:17650343

  18. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    Risk stratification model for lung cancer with gene expression profile is of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression network. Using multiple microarray, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in two test sets and training set (p < 0.00001, p < 0.0001 and p = 0.02 for training and test sets 1 and 2, resp.). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  19. Modeling gene regulatory network motifs using statecharts

    PubMed Central

    2012-01-01

    Background Gene regulatory networks are widely used by biologists to describe the interactions among genes, proteins and other components at the intra-cellular level. Recently, a great effort has been devoted to give gene regulatory networks a formal semantics based on existing computational frameworks. For this purpose, we consider Statecharts, which are a modular, hierarchical and executable formal model widely used to represent software systems. We use Statecharts for modeling small and recurring patterns of interactions in gene regulatory networks, called motifs. Results We present an improved method for modeling gene regulatory network motifs using Statecharts and we describe the successful modeling of several motifs, including those which could not be modeled or whose models could not be distinguished using the method of a previous proposal. We model motifs in an easy and intuitive way by taking advantage of the visual features of Statecharts. Our modeling approach is able to simulate some interesting temporal properties of gene regulatory network motifs: the delay in the activation and the deactivation of the "output" gene in the coherent type-1 feedforward loop, the pulse in the incoherent type-1 feedforward loop, the bistability nature of double positive and double negative feedback loops, the oscillatory behavior of the negative feedback loop, and the "lock-in" effect of positive autoregulation. Conclusions We present a Statecharts-based approach for the modeling of gene regulatory network motifs in biological systems. The basic motifs used to build more complex networks (that is, simple regulation, reciprocal regulation, feedback loop, feedforward loop, and autoregulation) can be faithfully described and their temporal dynamics can be analyzed. PMID:22536967

  20. Pathway Interaction Network Analysis Identifies Dysregulated Pathways in Human Monocytes Infected by Listeria monocytogenes.

    PubMed

    Fan, Wufeng; Zhou, Yuhan; Li, Hao

    2017-01-01

    In our study, we aimed to extract dysregulated pathways in human monocytes infected by Listeria monocytogenes (LM) based on pathway interaction network (PIN) which presented the functional dependency between pathways. After genes were aligned to the pathways, principal component analysis (PCA) was used to calculate the pathway activity for each pathway, followed by detecting seed pathway. A PIN was constructed based on gene expression profile, protein-protein interactions (PPIs), and cellular pathways. Identifying dysregulated pathways from the PIN was performed relying on seed pathway and classification accuracy. To evaluate whether the PIN method was feasible or not, we compared the introduced method with standard network centrality measures. The pathway of RNA polymerase II pretranscription events was selected as the seed pathway. Taking this seed pathway as start, one pathway set (9 dysregulated pathways) with AUC score of 1.00 was identified. Among the 5 hub pathways obtained using standard network centrality measures, 4 pathways were the common ones between the two methods. RNA polymerase II transcription and DNA replication owned a higher number of pathway genes and DEGs. These dysregulated pathways work together to influence the progression of LM infection, and they will be available as biomarkers to diagnose LM infection.

  1. A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.

    PubMed

    Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo

    2015-07-01

    In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.
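
    The ranking step described above can be sketched as follows, assuming that a gene's weighted number of links (WNL) is the sum of the confidence scores of its interactions and that orphans are genes with no scored interaction. The abstract does not give the exact definition or the clustering procedure, so the function name, gene labels, and scores here are hypothetical, and the sketch only ranks genes rather than clustering them into classes.

```python
def weighted_number_of_links(genes, scored_links):
    """Sum the interaction scores touching each gene.

    genes: iterable of gene identifiers.
    scored_links: iterable of (gene_a, gene_b, score) tuples.
    """
    wnl = {g: 0.0 for g in genes}
    for a, b, score in scored_links:
        wnl[a] += score
        wnl[b] += score
    return wnl


# Hypothetical gene labels and interaction scores.
toy_genes = ["G1", "G2", "G3", "G4"]
toy_links = [("G1", "G2", 0.9), ("G1", "G3", 0.5)]

wnl = weighted_number_of_links(toy_genes, toy_links)
ranked = sorted(toy_genes, key=lambda g: wnl[g], reverse=True)
orphans = [g for g in toy_genes if wnl[g] == 0.0]
```

    A clustering method applied to the WNL values would then separate the highest-scoring group (the 'leader' genes) from the lower classes.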

  2. Convergent evolution of gene networks by single-gene duplications in higher eukaryotes.

    PubMed

    Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich

    2004-03-01

    By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix-loop-helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks emerging through single-gene duplications, the dominant importance of molecular modularity in the bottom-up construction of complex biological entities, and the convergent evolution of networks.

  3. Gene networks and the evolution of plant morphology.

    PubMed

    Das Gupta, Mainak; Tsiantis, Miltos

    2018-06-06

    Elaboration of morphology depends on the precise orchestration of gene expression by key regulatory genes. The hierarchy and relationship among the participating genes is commonly known as a gene regulatory network (GRN). Therefore, the evolution of morphology ultimately occurs by the rewiring of gene network structures or by the co-option of gene networks to novel domains. The availability of high-resolution expression data combined with powerful statistical tools has opened up new avenues to formulate and test hypotheses on how diverse gene networks influence trait development and diversity. Here we summarize recent studies based on both big-data and genetics approaches to understand the evolution of plant form and physiology. We also discuss recent genome-wide investigations on how studying open-chromatin regions may help study the evolution of gene expression patterns. Copyright © 2018. Published by Elsevier Ltd.

  4. Gene expression in bovine rumen epithelium during weaning identifies molecular regulators of rumen development and growth.

    PubMed

    Connor, Erin E; Baldwin, Ransom L; Li, Cong-jun; Li, Robert W; Chung, Hoyoung

    2013-03-01

    During weaning, epithelial cell function in the rumen transitions in response to conversion from a pre-ruminant to a true ruminant environment to ensure efficient nutrient absorption and metabolism. To identify gene networks affected by weaning in bovine rumen, Holstein bull calves were fed commercial milk replacer only (MRO) until 42 days of age, then were provided diets of either milk + orchardgrass hay (MH) or milk + grain-based calf starter (MG). Rumen epithelial RNA was extracted from calves sacrificed at four time points: day 14 (n = 3) and day 42 (n = 3) of age while fed the MRO diet and day 56 (n = 3/diet) and day 70 (n = 3/diet) while fed the MH and MG diets for transcript profiling by microarray hybridization. Five two-group comparisons were made using Permutation Analysis of Differential Expression® to identify differentially expressed genes over time and developmental stage between days 14 and 42 within the MRO diet, between day 42 on the MRO diet and day 56 on the MG or MH diets, and between the MG and MH diets at days 56 and 70. Ingenuity Pathway Analysis (IPA) of differentially expressed genes during weaning indicated the top 5 gene networks involving molecules participating in lipid metabolism, cell morphology and death, cellular growth and proliferation, molecular transport, and the cell cycle. Putative genes functioning in the establishment of the rumen microbial population and associated rumen epithelial inflammation during weaning were identified. Activation of transcription factor PPAR-α was identified by IPA software as an important regulator of molecular changes in rumen epithelium that function in papillary development and fatty acid oxidation during the transition from pre-rumination to rumination. Thus, molecular markers of rumen development and gene networks regulating differentiation and growth of rumen epithelium were identified for selecting targets and methods for improving and assessing rumen development and

  5. Combinatorial explosion in model gene networks

    NASA Astrophysics Data System (ADS)

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such
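
    A minimal sketch of the kind of model this abstract describes, assuming the common piecewise-linear form dx_i/dt = -x_i + F_i(X), where X_i is the Boolean state of gene i (on when x_i > 0). The two-gene wiring below is a hypothetical example, not one of the paper's networks. Because the equations are linear inside each orthant, the trajectory can be advanced exactly from one threshold crossing to the next.

```python
import math


def focal_point(state):
    """Hypothetical two-gene loop: gene 2 represses gene 1, gene 1 activates gene 2."""
    x1_on, x2_on = state
    f1 = -1.0 if x2_on else 1.0
    f2 = 1.0 if x1_on else -1.0
    return (f1, f2)


def step_to_next_switch(x, state):
    """Advance exactly to the next threshold crossing within the current orthant."""
    f = focal_point(state)
    best_t, best_i = math.inf, None
    for i, (xi, fi) in enumerate(zip(x, f)):
        # A variable crosses zero only if its focal value lies on the other side.
        if (fi > 0) != state[i]:
            t = math.log(1.0 - xi / fi)
            if t < best_t:
                best_t, best_i = t, i
    if best_i is None:
        return x, state, math.inf  # no crossing: the trajectory settles at f
    decay = math.exp(-best_t)
    new_x = [fi + (xi - fi) * decay for xi, fi in zip(x, f)]
    new_x[best_i] = 0.0  # the switching variable sits exactly on its threshold
    new_state = list(state)
    new_state[best_i] = not state[best_i]
    return new_x, tuple(new_state), best_t


def simulate(x0, n_switches):
    """Return the sequence of Boolean states (vertices of the n-cube) visited."""
    x = list(x0)
    state = tuple(v > 0 for v in x)
    visited = [state]
    for _ in range(n_switches):
        x, state, dt = step_to_next_switch(x, state)
        if math.isinf(dt):
            break
        visited.append(state)
    return visited


states = simulate([1.0, -0.5], 4)
```

    Starting from (on, off), the toy loop visits the four vertices of the square in order and returns to its starting vertex, which is the kind of cycle on the n-cube the abstract refers to. This two-gene example only shows the mechanics; it makes no claim about the stability of the cycle.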

  6. Combinatorial explosion in model gene networks.

    PubMed

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such

  7. Circuit-wide Transcriptional Profiling Reveals Brain Region-Specific Gene Networks Regulating Depression Susceptibility.

    PubMed

    Bagot, Rosemary C; Cates, Hannah M; Purushothaman, Immanuel; Lorsch, Zachary S; Walker, Deena M; Wang, Junshi; Huang, Xiaojie; Schlüter, Oliver M; Maze, Ian; Peña, Catherine J; Heller, Elizabeth A; Issler, Orna; Wang, Minghui; Song, Won-Min; Stein, Jason L; Liu, Xiaochuan; Doyle, Marie A; Scobie, Kimberly N; Sun, Hao Sheng; Neve, Rachael L; Geschwind, Daniel; Dong, Yan; Shen, Li; Zhang, Bin; Nestler, Eric J

    2016-06-01

    Depression is a complex, heterogeneous disorder and a leading contributor to the global burden of disease. Most previous research has focused on individual brain regions and genes contributing to depression. However, emerging evidence in humans and animal models suggests that dysregulated circuit function and gene expression across multiple brain regions drive depressive phenotypes. Here, we performed RNA sequencing on four brain regions from control animals and those susceptible or resilient to chronic social defeat stress at multiple time points. We employed an integrative network biology approach to identify transcriptional networks and key driver genes that regulate susceptibility to depressive-like symptoms. Further, we validated in vivo several key drivers and their associated transcriptional networks that regulate depression susceptibility and confirmed their functional significance at the levels of gene transcription, synaptic regulation, and behavior. Our study reveals novel transcriptional networks that control stress susceptibility and offers fundamentally new leads for antidepressant drug discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Vitamin D and gene networks in human osteoblasts

    PubMed Central

    van de Peppel, Jeroen; van Leeuwen, Johannes P. T. M.

    2014-01-01

    Bone formation is indirectly influenced by 1,25-dihydroxyvitamin D3 (1,25D3) through the stimulation of calcium uptake in the intestine and re-absorption in the kidneys. Direct effects on osteoblasts and bone formation have also been established. The vitamin D receptor (VDR) is expressed in osteoblasts and 1,25D3 modifies gene expression of various osteoblast differentiation and mineralization-related genes, such as alkaline phosphatase (ALPL), osteocalcin (BGLAP), and osteopontin (SPP1). 1,25D3 is known to stimulate mineralization of human osteoblasts in vitro, and recently it was shown that 1,25D3 induces mineralization via effects in the period preceding mineralization (the pre-mineralization period). For a full understanding of the action of 1,25D3 in osteoblasts it is important to get an integrated network view of the 1,25D3-regulated genes during osteoblast differentiation and mineralization. The current data will be presented and discussed alluding to future studies to fully delineate the 1,25D3 action in osteoblasts. Describing and understanding the vitamin D regulatory networks and identifying the dominant players in these networks may help develop novel (personalized) vitamin D-based treatments. The following topics will be discussed in this overview: (1) Bone metabolism and osteoblasts, (2) Vitamin D, bone metabolism and osteoblast function, (3) Vitamin D induced transcriptional networks in the context of osteoblast differentiation and bone formation. PMID:24782782

  9. Exome Sequencing Identifies Three Novel Candidate Genes Implicated in Intellectual Disability

    PubMed Central

    Azam, Maleeha; Ayub, Humaira; Vissers, Lisenka E. L. M.; Gilissen, Christian; Ali, Syeda Hafiza Benish; Riaz, Moeen; Veltman, Joris A.; Pfundt, Rolph; van Bokhoven, Hans; Qamar, Raheel

    2014-01-01

    Intellectual disability (ID) is a major health problem mostly with an unknown etiology. Recently exome sequencing of individuals with ID identified novel genes implicated in the disease. Therefore the purpose of the present study was to identify the genetic cause of ID in one syndromic and two non-syndromic Pakistani families. Whole exome of three ID probands was sequenced. Missense variations in two plausible novel genes implicated in autosomal recessive ID were identified: lysine (K)-specific methyltransferase 2B (KMT2B), zinc finger protein 589 (ZNF589), as well as hedgehog acyltransferase (HHAT) with a de novo mutation with autosomal dominant mode of inheritance. The KMT2B recessive variant is the first report of recessive Kleefstra syndrome-like phenotype. Identification of plausible causative mutations for two recessive and a dominant type of ID, in genes not previously implicated in disease, underscores the large genetic heterogeneity of ID. These results also support the viewpoint that large number of ID genes converge on limited number of common networks i.e. ZNF589 belongs to KRAB-domain zinc-finger proteins previously implicated in ID, HHAT is predicted to affect sonic hedgehog, which is involved in several disorders with ID, KMT2B associated with syndromic ID fits the epigenetic module underlying the Kleefstra syndromic spectrum. The association of these novel genes in three different Pakistani ID families highlights the importance of screening these genes in more families with similar phenotypes from different populations to confirm the involvement of these genes in pathogenesis of ID. PMID:25405613

  10. Candidate gene prioritization by network analysis of differential expression using machine learning approaches

    PubMed Central

    2010-01-01

    Background Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. Conclusion In this study we could identify promising
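
    The heat kernel idea named in this abstract can be sketched as diffusing differential-expression values over the network, so that a candidate gene is scored by how strongly expressed its neighbourhood is. The sketch assumes the score vector is exp(-beta * L) applied to the expression vector, with L the graph Laplacian; the value of beta, the truncated Taylor series and the toy path graph are illustrative assumptions, not the authors' implementation.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def heat_kernel_scores(adjacency, expression, beta=0.5, terms=30):
    """Approximate exp(-beta * L) @ expression with a truncated Taylor series."""
    n = len(adjacency)
    # Graph Laplacian L = D - A, with D the diagonal matrix of node degrees.
    laplacian = [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]
    term = list(expression)   # k-th term of the series, starting at k = 0
    total = list(expression)
    for k in range(1, terms):
        term = [-beta * value / k for value in mat_vec(laplacian, term)]
        total = [t + s for t, s in zip(total, term)]
    return total


# Toy path graph A - B - C - D; only D is differentially expressed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
expression = [0.0, 0.0, 0.0, 4.0]
scores = heat_kernel_scores(adjacency, expression)
```

    In this toy graph the score falls off with distance from D, so a candidate such as C would rank above A even though neither is differentially expressed itself. The series is adequate for small graphs and moderate beta; a production version would use a dedicated matrix-exponential routine.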

  11. Identification of the key regulating genes of diminished ovarian reserve (DOR) by network and gene ontology analysis.

    PubMed

    Pashaiasl, Maryam; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2016-09-01

    Diminished ovarian reserve (DOR) is one of the reasons for infertility that affects both older and young women. Ovarian reserve assessment can be used as a new prognostic tool for infertility treatment decision making. Here, up- and down-regulated gene expression profiles of granulosa cells were analysed to generate a putative interaction map of the involved genes. In addition, gene ontology (GO) analysis was used to get insight into the biological processes and molecular functions of involved proteins in DOR. Eleven up-regulated genes and nine down-regulated genes were identified and assessed by constructing interaction networks based on their biological processes. PTGS2, CTGF, LHCGR, CITED, SOCS2, STAR and FSTL3 were the key nodes in the up-regulated networks, while the IGF2, AMH, GREM, and FOXC1 proteins were key in the down-regulated networks. MIRN101-1, MIRN153-1 and MIRN194-1 inhibited the expression of SOCS2, while CSH1 and BMP2 positively regulated IGF1 and IGF2. Ossification, ovarian follicle development, vasculogenesis, sequence-specific DNA binding transcription factor activity, and golgi apparatus are the major differential groups between up-regulated and down-regulated genes in DOR. Meta-analysis of publicly available transcriptomic data highlighted the high coexpression of CTGF, connective tissue growth factor, with the other key regulators of DOR. CTGF is involved in organ senescence and focal adhesion pathway according to GO analysis. These findings provide a comprehensive systems biology-based insight into the aetiology of DOR through network and gene ontology analyses.

  12. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis.

    PubMed

    van Dam, Jesse C J; Schaap, Peter J; Martins dos Santos, Vitor A P; Suárez-Diez, María

    2014-09-26

    Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a show case, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom made methodologies. As show cases we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers. Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different

  13. Learning a Markov Logic network for supervised gene regulatory network inference

    PubMed Central

    2013-01-01

    Background Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. Results We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate “regulates”, starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black

  14. Learning a Markov Logic network for supervised gene regulatory network inference.

    PubMed

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a

  15. Gene expression, signal transduction pathways and functional networks associated with growth of sporadic vestibular schwannomas.

    PubMed

    Sass, Hjalte C R; Borup, Rehannah; Alanin, Mikkel; Nielsen, Finn Cilius; Cayé-Thomasen, Per

    2017-01-01

    The objective of this study was to determine global gene expression in relation to vestibular schwannoma (VS) growth rate and to identify signal transduction pathways and functional molecular networks associated with growth. Repeated magnetic resonance imaging (MRI) prior to surgery determined tumor growth rate. Following tissue sampling during surgery, mRNA was extracted from 16 sporadic VS. Double stranded cDNA was synthesized from the mRNA and used as template for an in vitro transcription reaction to synthesize biotin-labeled antisense cRNA, which was hybridized to Affymetrix HG-U133A arrays and analyzed by dChip software. Differential gene expression was defined as a 1.5-fold difference between fast and slow growing tumors (> versus < 0.5 ccm/year), employing a p-value <0.01. Deregulated transcripts were matched against established gene ontology. Ingenuity Pathway Analysis was used for identification of signal transduction pathways and functional molecular networks associated with tumor growth. In total 109 genes were deregulated in relation to tumor growth rate. Genes associated with apoptosis, growth and cell proliferation were deregulated. Gene ontology included regulation of the cell cycle, cell differentiation and proliferation, among other functions. Fourteen pathways were associated with tumor growth. Five functional molecular networks were generated. This first study on global gene expression in relation to vestibular schwannoma growth rate identified several genes, signal transduction pathways and functional networks associated with tumor progression. Specific genes involved in apoptosis, cell growth and proliferation were deregulated in fast growing tumors. Fourteen pathways were associated with tumor growth. Generated functional networks underlined the importance of the PI3K family, among others.

  16. Detecting complexes from edge-weighted PPI networks via genes expression analysis.

    PubMed

    Zhang, Zehua; Song, Jian; Tang, Jijun; Xu, Xinying; Guo, Fei

    2018-04-24

    Identifying complexes from PPI networks has become a key problem to elucidate protein functions and identify signal and biological processes in a cell. Proteins that bind as complexes play important roles in life activities. Accurate determination of complexes in PPI networks is crucial for understanding principles of cellular organization. We propose a novel method to identify complexes on PPI networks, based on different co-expression information. First, we use Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes on PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify complexes predicted by Markov Cluster Algorithm. To evaluate our method, we test on two experimental yeast PPI networks. On DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared with existing methods, our method improves Precision value by at least 0.1752, F-Measure value by at least 0.0448, Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes on PPI networks, with the prediction quality improved in terms of evaluation criteria.
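
    The Markov Cluster Algorithm step mentioned in this abstract alternates expansion (squaring a column-stochastic matrix) with inflation (element-wise powering followed by column renormalisation). The sketch below is a plain textbook version under stated assumptions: the edge weights are hypothetical stand-ins for co-expression-based weights, and the inflation value, iteration count and cut-off are illustrative defaults rather than the authors' settings.

```python
def normalize_columns(matrix):
    """Rescale every column so that it sums to one."""
    n = len(matrix)
    sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / sums[c] for c in range(n)] for r in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [
        [sum(matrix[r][k] * matrix[k][c] for k in range(n)) for c in range(n)]
        for r in range(n)
    ]


def inflate(matrix, power):
    """Inflation step: element-wise power, then column renormalisation."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def markov_cluster(weights, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster a weighted undirected graph given as a symmetric matrix."""
    n = len(weights)
    # Self-loops are added, as is customary for MCL.
    matrix = [
        [weights[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
        for r in range(n)
    ]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    clusters = set()
    for r in range(n):
        # Rows that keep non-negligible mass belong to attractor nodes.
        members = frozenset(c for c in range(n) if matrix[r][c] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Two tightly linked triangles joined by one weak edge (hypothetical weights).
w = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
clusters = markov_cluster(w)
```

    The weak bridge between nodes 2 and 3 is suppressed by inflation, so each triangle is returned as a separate candidate complex. The filtering by graph and expression features that the abstract describes would follow as a later step and is not shown.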

  17. Differential reconstructed gene interaction networks for deriving toxicity threshold in chemical risk assessment.

    PubMed

    Yang, Yi; Maxwell, Andrew; Zhang, Xiaowei; Wang, Nan; Perkins, Edward J; Zhang, Chaoyang; Gong, Ping

    2013-01-01

    Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting. Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential edge inference, mapping of genes with differential edges to pathways, and establishment of causal relationships between chemical concentration and perturbed pathways. A one-sample Gaussian process model and a linear regression model were used to identify genes that exhibited significant profile changes across an entire time course and between treatments, respectively. Interaction networks of differentially expressed (DE) genes were reconstructed for different treatments using a state space model and then compared to infer differential edges/interactions. DE genes possessing differential edges were mapped to biological pathways in databases such as KEGG pathways. Using the DNs approach, we analyzed a time-series Escherichia coli live cell gene expression dataset consisting of 4 treatments (control, 10, 100, 1000 mg/L naphthenic acids, NAs) and 18 time points. Through comparison of reconstructed networks and construction of differential networks, 80 genes were identified as DE genes with a significant number of differential edges, and 22 KEGG pathways were altered in a concentration-dependent manner. Some of these pathways were perturbed to a degree as high as 70% even at the lowest exposure concentration, implying a high sensitivity of our DNs approach

  18. Long-Term Oil Contamination Alters the Molecular Ecological Networks of Soil Microbial Functional Genes

    PubMed Central

    Liang, Yuting; Zhao, Huihui; Deng, Ye; Zhou, Jizhong; Li, Guanghe; Sun, Bo

    2016-01-01

    With knowledge on microbial composition and diversity, investigation of within-community interactions is a further step to elucidate microbial ecological functions, such as the biodegradation of hazardous contaminants. In this work, microbial functional molecular ecological networks were studied in both contaminated and uncontaminated soils to determine the possible influences of oil contamination on microbial interactions and potential functions. Soil samples were obtained from an oil-exploring site located in South China, and the microbial functional genes were analyzed with GeoChip, a high-throughput functional microarray. By building random networks based on null model, we demonstrated that overall network structures and properties were significantly different between contaminated and uncontaminated soils (P < 0.001). Network connectivity, module numbers, and modularity were all reduced with contamination. Moreover, the topological roles of the genes (module hub and connectors) were altered with oil contamination. Subnetworks of genes involved in alkane and polycyclic aromatic hydrocarbon degradation were also constructed. Negative co-occurrence patterns prevailed among functional genes, thereby indicating probable competition relationships. The potential “keystone” genes, defined as either “hubs” or genes with highest connectivities in the network, were further identified. The network constructed in this study predicted the potential effects of anthropogenic contamination on microbial community co-occurrence interactions. PMID:26870020

  19. Resistance Genes in Global Crop Breeding Networks.

    PubMed

    Garrett, K A; Andersen, K F; Asche, F; Bowden, R L; Forbes, G A; Kulakow, P A; Zhou, B

    2017-10-01

    Resistance genes are a major tool for managing crop diseases. The networks of crop breeders who exchange resistance genes and deploy them in varieties help to determine the global landscape of resistance and epidemics, an important system for maintaining food security. These networks function as a complex adaptive system, with associated strengths and vulnerabilities, and implications for policies to support resistance gene deployment strategies. Extensions of epidemic network analysis can be used to evaluate the multilayer agricultural networks that support and influence crop breeding networks. Here, we evaluate the general structure of crop breeding networks for cassava, potato, rice, and wheat. All four are clustered due to phytosanitary and intellectual property regulations, and linked through CGIAR hubs. Cassava networks primarily include public breeding groups, whereas others are more mixed. These systems must adapt to global change in climate and land use, the emergence of new diseases, and disruptive breeding technologies. Research priorities to support policy include how best to maintain both diversity and redundancy in the roles played by individual crop breeding groups (public versus private and global versus local), and how best to manage connectivity to optimize resistance gene deployment while avoiding risks to the useful life of resistance genes. Copyright © 2017 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.

  20. Massive-scale gene co-expression network construction and robustness testing using random matrix theory.

    PubMed

    Gibson, Scott M; Ficklin, Stephen P; Isaacson, Sven; Luo, Feng; Feltus, Frank A; Smith, Melissa C

    2013-01-01

    The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.

  1. Massive-Scale Gene Co-Expression Network Construction and Robustness Testing Using Random Matrix Theory

    PubMed Central

    Isaacson, Sven; Luo, Feng; Feltus, Frank A.; Smith, Melissa C.

    2013-01-01

    The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust. PMID:23409071

  2. Global Landscape of a Co-Expressed Gene Network in Barley and its Application to Gene Discovery in Triticeae Crops

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2011-01-01

    Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
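
    The co-expression workflow summarized above (correlate genes, keep strong links, group the result into modules) can be sketched as follows. This is a minimal illustration only: it assumes plain Pearson correlation, a hard cutoff on |r| (the `threshold` value is arbitrary), and connected components as modules. The abstract does not specify these choices, the function name `coexpression_modules` is hypothetical, and the toy matrix is made up.

```python
import numpy as np

def coexpression_modules(expr, threshold=0.8):
    """Build a hard-thresholded co-expression network and label its modules.

    expr: genes x samples array of expression values (assumed complete).
    Returns (boolean adjacency matrix, list of module labels per gene).
    """
    corr = np.corrcoef(expr)            # gene-by-gene Pearson correlation
    adj = np.abs(corr) >= threshold     # keep only strong correlations
    np.fill_diagonal(adj, False)        # drop self-edges

    n = adj.shape[0]
    labels = [-1] * n
    module = 0
    for start in range(n):              # connected components via depth-first search
        if labels[start] != -1:
            continue
        labels[start] = module
        stack = [start]
        while stack:
            gene = stack.pop()
            for neighbour in np.flatnonzero(adj[gene]):
                if labels[neighbour] == -1:
                    labels[neighbour] = module
                    stack.append(int(neighbour))
        module += 1
    return adj, labels

# Toy expression matrix (4 genes x 4 samples); values are illustrative only.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 1.0, 3.0, 2.0],
    [8.1, 2.0, 6.2, 4.0],
])
adj, labels = coexpression_modules(expr)
print(labels)
```

    In this toy case genes 0 and 1 fall into one module and genes 2 and 3 into another. Published pipelines typically use weighted or soft-thresholded networks and dedicated clustering methods rather than connected components.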

  3. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    PubMed Central

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to handle the missing values, including k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
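
    For readers unfamiliar with kNN imputation, one of the three methods compared above, the idea can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it assumes Euclidean (root-mean-square) distance over the observed samples, uses only complete genes as neighbours, and fills each gap with an unweighted neighbour mean. The function name `knn_impute` and the toy values are hypothetical.

```python
import numpy as np

def knn_impute(expr, k=2):
    """Fill NaN entries of a genes x samples matrix from the k most similar complete genes."""
    expr = np.asarray(expr, dtype=float)
    filled = expr.copy()
    for i, row in enumerate(expr):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        candidates = []
        for j, other in enumerate(expr):
            if j == i or np.isnan(other).any():
                continue  # assumption: only complete genes serve as neighbours
            # RMS distance computed over the samples observed in the target gene
            dist = np.sqrt(np.mean((row[observed] - other[observed]) ** 2))
            candidates.append((dist, j))
        candidates.sort()
        neighbours = [j for _, j in candidates[:k]]
        if neighbours:
            # unweighted mean of the neighbours' values at the missing positions
            filled[i, missing] = expr[neighbours][:, missing].mean(axis=0)
    return filled

# Toy matrix: gene 0 lacks its third measurement; values are illustrative only.
toy = [
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 9.0],
]
result = knn_impute(toy, k=2)
print(result[0])
```

    Here the two nearest complete genes contribute 3.0 and 5.0, so the gap is filled with their mean, 4.0.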

  4. Gene identification for risk of relapse in stage I lung adenocarcinoma patients: a combined methodology of gene expression profiling and computational gene network analysis.

    PubMed

    Ludovini, Vienna; Bianconi, Fortunato; Siggillino, Annamaria; Piobbico, Danilo; Vannucci, Jacopo; Metro, Giulio; Chiari, Rita; Bellezza, Guido; Puma, Francesco; Della Fazia, Maria Agnese; Servillo, Giuseppe; Crinò, Lucio

    2016-05-24

    Risk assessment and treatment choice remain a challenge in early non-small-cell lung cancer (NSCLC). The aim of this study was to identify novel genes involved in the risk of early relapse (ER) compared to no relapse (NR) in resected lung adenocarcinoma (AD) patients using a combination of high throughput technology and computational analysis. We identified 18 patients (n = 13 NR and n = 5 ER) with stage I AD. Frozen samples of patients in ER, NR and corresponding normal lung (NL) were subjected to microarray technology and quantitative PCR (Q-PCR). A gene network computational analysis was performed to select predictive genes. An independent set of 79 stage I AD samples was used to validate selected genes by Q-PCR. From microarray analysis we selected 50 genes, using the fold change ratio of ER versus NR. They were validated both in pool and individually in patient samples (ER and NR) by Q-PCR. Fourteen increased and 25 decreased genes showed a concordance between the two methods. They were used to perform a computational gene network analysis that identified 4 increased (HOXA10, CLCA2, AKR1B10, FABP3) and 6 decreased (SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1) genes. Moreover, in an independent dataset of AD samples, we showed that both high FABP3 expression and low SCGB1A1 expression were associated with a worse disease-free survival (DFS). Our results indicate that it is possible to define, through gene expression and computational analysis, a characteristic gene profiling of patients with an increased risk of relapse that may become a tool for patient selection for adjuvant therapy.

  5. NIH Researchers Identify OCD Risk Gene

    MedlinePlus

    ... News From NIH NIH Researchers Identify OCD Risk Gene Past Issues / Summer 2006 Table of Contents For ... and Alcoholism (NIAAA) have identified a previously unknown gene variant that doubles an individual's risk for obsessive- ...

  6. Gene network polymorphism is the raw material of natural selection: the selfish gene network hypothesis.

    PubMed

    Boldogköi, Zsolt

    2004-09-01

    Population genetics, the mathematical theory of modern evolutionary biology, defines evolution as the alteration of the frequency of distinct gene variants (alleles) differing in fitness over time. The major problem with this view is that in gene and protein sequences we can find little evidence concerning the molecular basis of phenotypic variance, especially those that would confer adaptive benefit to the bearers. Some novel data, however, suggest that a large amount of genetic variation exists in the regulatory region of genes within populations. In addition, comparison of homologous DNA sequences of various species shows that evolution appears to depend more strongly on gene expression than on the genes themselves. Furthermore, it has been demonstrated in several systems that genes form functional networks, whose products exhibit interrelated expression profiles. Finally, it has been found that regulatory circuits of development behave as evolutionary units. These data demonstrate that our view of evolution calls for a new synthesis. In this article I propose a novel concept, termed the selfish gene network hypothesis, which is based on an overall consideration of the above findings. The major statements of this hypothesis are as follows. (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.

  7. Networking of differentially expressed genes in human cancer cells resistant to methotrexate

    PubMed Central

    2009-01-01

    Background: The need for an integrated view of data obtained from high-throughput technologies gave rise to network analyses. These are especially useful to rationalize how external perturbations propagate through the expression of genes. To address this issue in the case of drug resistance, we constructed biological association networks of genes differentially expressed in cell lines resistant to methotrexate (MTX). Methods: Seven cell lines representative of different types of cancer, including colon cancer (HT29 and Caco2), breast cancer (MCF-7 and MDA-MB-468), pancreatic cancer (MIA PaCa-2), erythroblastic leukemia (K562) and osteosarcoma (Saos-2), were used. The differential expression pattern between sensitive and MTX-resistant cells was determined by whole human genome microarrays and analyzed with the GeneSpring GX software package. Genes deregulated in common between the different cancer cell lines served to generate biological association networks using the Pathway Architect software. Results: Dickkopf homolog-1 (DKK1) is a highly interconnected node in the network generated with genes in common between the two colon cancer cell lines, and functional validations of this target using small interfering RNAs (siRNAs) showed a chemosensitization toward MTX. Members of the UDP-glucuronosyltransferase 1A (UGT1A) family formed a network of genes differentially expressed in the two breast cancer cell lines. siRNA treatment against UGT1A also showed an increase in MTX sensitivity. Eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) was overexpressed among the pancreatic cancer, leukemia and osteosarcoma cell lines, and siRNA treatment against EEF1A1 produced a chemosensitization toward MTX. Conclusions: Biological association networks identified DKK1, UGT1As and EEF1A1 as important gene nodes in MTX-resistance. Treatments using siRNA technology against these three genes showed chemosensitization toward MTX. PMID:19732436

  8. Network Analysis Reveals Putative Genes Affecting Meat Quality in Angus Cattle.

    PubMed

    Mateescu, Raluca G; Garrick, Dorian J; Reecy, James M

    2017-01-01

    Improvements in eating satisfaction will benefit consumers and should increase beef demand, which is of interest to the beef industry. Tenderness, juiciness, and flavor are major determinants of the palatability of beef and are often used to reflect eating satisfaction. Carcass qualities are used as indicator traits for meat quality, with higher quality grade carcasses expected to relate to more tender and palatable meat. However, meat quality is a complex concept determined by many component traits, making genome-wide association studies (GWAS) on any one component challenging to interpret. Recent approaches combining traditional GWAS with gene network interactions theory could be more efficient in dissecting the genetic architecture of complex traits. Phenotypic measures of 23 traits reflecting carcass characteristics, components of meat quality, along with mineral and peptide concentrations were used along with Illumina 54k bovine SNP genotypes to derive an annotated gene network associated with meat quality in 2,110 Angus beef cattle. The efficient mixed model association (EMMAX) approach in combination with a genomic relationship matrix was used to directly estimate the associations between 54k SNP genotypes and each of the 23 component traits. Genomic correlated regions were identified by partial correlations which were further used along with an information theory algorithm to derive gene network clusters. Correlated SNP across 23 component traits were subjected to network scoring and visualization software to identify significant SNP. Significant pathways implicated in the meat quality complex through GO term enrichment analysis included angiogenesis, inflammation, transmembrane transporter activity, and receptor activity. These results suggest that network analysis using partial correlations and annotation of significant SNP can reveal the genetic architecture of complex traits and provide novel information regarding biological mechanisms

  9. Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach.

    PubMed

    Hu, Yan-Shi; Xin, Juncai; Hu, Ying; Zhang, Lei; Wang, Ju

    2017-04-27

    Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In recent years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of the human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm. We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules (a neuronal and metabolic module, a cell growth/survival and neuroendocrine pathway module, and an immune response-related module), indicating an AD-specific immune-endocrine-neuronal regulatory network. Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified. By

  10. Targeted sequencing identifies 91 neurodevelopmental disorder risk genes with autism and developmental disability biases

    PubMed Central

    Stessman, Holly A. F.; Xiong, Bo; Coe, Bradley P.; Wang, Tianyun; Hoekzema, Kendra; Fenckova, Michaela; Kvarnung, Malin; Gerdts, Jennifer; Trinh, Sandy; Cosemans, Nele; Vives, Laura; Lin, Janice; Turner, Tychele N.; Santen, Gijs; Ruivenkamp, Claudia; Kriek, Marjolein; van Haeringen, Arie; Aten, Emmelien; Friend, Kathryn; Liebelt, Jan; Barnett, Christopher; Haan, Eric; Shaw, Marie; Gecz, Jozef; Anderlid, Britt-Marie; Nordgren, Ann; Lindstrand, Anna; Schwartz, Charles; Kooy, R. Frank; Vandeweyer, Geert; Helsmoortel, Celine; Romano, Corrado; Alberti, Antonino; Vinci, Mirella; Avola, Emanuela; Giusto, Stefania; Courchesne, Eric; Pramparo, Tiziano; Pierce, Karen; Nalabolu, Srinivasa; Amaral, David; Scheffer, Ingrid E.; Delatycki, Martin B.; Lockhart, Paul J.; Hormozdiari, Fereydoun; Harich, Benjamin; Castells-Nobau, Anna; Xia, Kun; Peeters, Hilde; Nordenskjöld, Magnus; Schenck, Annette; Bernier, Raphael A.; Eichler, Evan E.

    2017-01-01

    Gene-disruptive mutations contribute to the biology of neurodevelopmental disorders (NDDs), but most pathogenic genes are not known. We sequenced 208 candidate genes from >11,730 patients and >2,867 controls. We report 91 genes with an excess of de novo mutations or private disruptive mutations in 5.7% of patients, including 38 novel NDD genes. Drosophila functional assays of a subset bolster their involvement in NDDs. We identify 25 genes that show a bias for autism versus intellectual disability and highlight a network associated with high-functioning autism (FSIQ>100). Clinical follow-up for NAA15, KMT5B, and ASH1L reveals novel syndromic and non-syndromic forms of disease. PMID:28191889

  11. Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology

    PubMed Central

    2010-01-01

    Background: In the literature, there are many algorithmic approaches for identifying functional modules in protein-protein interaction (PPI) networks. Because large-scale interaction data are accumulating for multiple organisms while many interactions remain unrecorded in the existing PPI databases, there is still an urgent need for novel computational techniques that can analyze interaction data sets correctly and scalably. Indeed, a number of large-scale biological data sets provide indirect evidence for protein-protein interaction relationships. Results: The main aim of this paper is to present a prior-knowledge-based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. A higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) how to encode the functional pairs into the existing PPI networks; and (ii) how to use these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. A topology-based modularity metric and complex annotation in MIPS are used to evaluate the functional modules identified by these two approaches. Conclusions: The experimental results on yeast PPI networks and GO show that the prior-knowledge-based learning methods perform better than the existing algorithms. PMID:21172053

  12. Gene regulatory networks in lactation: identification of global principles using bioinformatics.

    PubMed

    Lemay, Danielle G; Neville, Margaret C; Rudolph, Michael C; Pollard, Katherine S; German, J Bruce

    2007-11-27

    The molecular events underlying mammary development during pregnancy, lactation, and involution are incompletely understood. Mammary gland microarray data, cellular localization data, protein-protein interactions, and literature-mined genes were integrated and analyzed using statistics, principal component analysis, gene ontology analysis, pathway analysis, and network analysis to identify global biological principles that govern molecular events during pregnancy, lactation, and involution. Several key principles were derived: (1) nearly a third of the transcriptome fluctuates to build, run, and disassemble the lactation apparatus; (2) genes encoding the secretory machinery are transcribed prior to lactation; (3) the diversity of the endogenous portion of the milk proteome is derived from fewer than 100 transcripts; (4) while some genes are differentially transcribed near the onset of lactation, the lactation switch is primarily post-transcriptionally mediated; (5) the secretion of materials during lactation occurs not by up-regulation of novel genomic functions, but by widespread transcriptional suppression of functions such as protein degradation and cell-environment communication; (6) the involution switch is primarily transcriptionally mediated; and (7) during early involution, the transcriptional state is partially reverted to the pre-lactation state. A new hypothesis for secretory diminution is suggested - milk production gradually declines because the secretory machinery is not transcriptionally replenished. A comprehensive network of protein interactions during lactation is assembled and new regulatory gene targets are identified. Less than one fifth of the transcriptionally regulated nodes in this lactation network have been previously explored in the context of lactation. Implications for future research in mammary and cancer biology are discussed.

  13. Identifying Candidate Reprogramming Genes in Mouse Induced Pluripotent Stem Cells.

    PubMed

    Gao, Fang; Li, Jingyu; Zhang, Heng; Yang, Xu; An, Tiezhu

    2017-08-01

    Factor-based induced reprogramming approaches have tremendous potential for human regenerative medicine, but the efficiencies of these approaches are still low. In this study, we analyzed the global transcriptional profiles of mouse induced pluripotent stem cells (miPSCs) and mouse embryonic stem cells (mESCs) from seven different labs and present here the first successful clustering according to cell type rather than lab of origin. We identified 2131 differentially expressed genes (DEs) as candidate pluripotency-associated genes by comparing mESCs/miPSCs with somatic cells and 720 DEs between miPSCs and mESCs. Interestingly, there was a significant overlap between the two DE sets. Therefore, we defined the overlap DEs as "consensus DEs" including 313 miPSC-specific genes expressed at a higher level in miPSCs versus mESCs and 184 mESC-specific genes in total and reasoned that these may contribute to the differences in pluripotency between mESCs and miPSCs. A classification of "consensus DEs" according to their different expression levels between somatic cells and mESCs/miPSCs shows that 86% of the miPSC-specific genes are more highly expressed in somatic cells, while 73% of mESC-specific genes are highly expressed in mESCs/miPSCs, indicating that the miPSCs have not efficiently silenced the expression pattern of the somatic cells from which they are derived and have failed to completely induce the genes with high expression levels in mESCs. We further revealed a strong correlation between oocyte-enriched factors and insufficiently induced mESC-specific genes and identified 11 hub genes via network analysis. In light of these findings, we postulated that these key hub genes might not only drive somatic cell nuclear transfer (SCNT) reprogramming but also augment the efficiency and quality of miPSC reprogramming.

  14. Bioinformatics, interaction network analysis, and neural networks to characterize gene expression of radicular cyst and periapical granuloma.

    PubMed

    Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena

    2015-06-01

    Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used to build the interaction map among the identified genes by a significance score named weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complement to gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
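
    The leader-gene step described above (cluster genes by weighted number of links with k-means, then take the highest cluster) can be sketched as follows. This is an illustration under stated assumptions, not the validated algorithm from the study: the number of clusters `k`, the evenly spaced initial centers, the function name `leader_genes`, and the gene names and scores are all hypothetical.

```python
import numpy as np

def leader_genes(wnl, k=3, n_iter=100):
    """Return the genes in the k-means cluster with the highest mean score.

    wnl: dict mapping gene name -> weighted number of links (a 1-D score).
    """
    genes = list(wnl)
    x = np.array([wnl[g] for g in genes], dtype=float)
    centers = np.linspace(x.min(), x.max(), k)  # deterministic initialisation (assumption)
    for _ in range(n_iter):
        # assign each gene to its nearest center
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            x[assign == c].mean() if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    top = int(np.argmax(centers))
    return sorted(g for g, a in zip(genes, assign) if a == top)

# Made-up scores for placeholder genes; not data from the study.
scores = {
    "geneA": 9.5, "geneB": 9.0,
    "geneC": 4.2, "geneD": 4.0,
    "geneE": 1.0, "geneF": 0.8, "geneG": 1.2,
}
leaders = leader_genes(scores)
print(leaders)
```

    With these toy scores the highest cluster contains geneA and geneB, which would be reported as the leader genes.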

  15. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

    PubMed

    Ni, Jingchao; Koyuturk, Mehmet; Tong, Hanghang; Haines, Jonathan; Xu, Rong; Zhang, Xiang

    2016-11-10

    Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. The experimental results demonstrate that our methods

  16. Transcriptional Regulatory Network Analysis of MYB Transcription Factor Family Genes in Rice.

    PubMed

    Smita, Shuchi; Katiyar, Amit; Chinnusamy, Viswanathan; Pandey, Dev M; Bansal, Kailash C

    2015-01-01

    The MYB transcription factor (TF) family is one of the largest TF families and regulates defense responses to various stresses, hormone signaling, and many metabolic and developmental processes in plants. Understanding the regulatory hierarchies of gene expression networks in response to developmental and environmental cues is a major challenge because of the complex interactions between the genetic elements. Correlation analyses are useful for unraveling co-regulated gene pairs that govern biological processes and for identifying new candidate hub genes involved in these complex processes. High-throughput expression profiling data are highly useful for constructing co-expression networks. In the present study, we utilized transcriptome data for comprehensive regulatory network studies of MYB TFs by "top-down" and "guide-gene" approaches. More than 50% of OsMYBs were strongly correlated under 50 experimental conditions with 51 hub genes via the "top-down" approach. Further, clusters were identified using Markov Clustering (MCL). To maximize clustering performance, the MCL inflation score (I) was evaluated in terms of enriched GO categories by measuring the F-score. Comparison of the co-expressed clusters with the clades obtained from phylogenetic analysis indicates an evolutionarily conserved co-regulatory role. We used a compendium of known interactions and biological roles, together with Gene Ontology enrichment analysis, to hypothesize functions of the co-expressed OsMYBs. In the other part, the transcriptional regulatory network analysis by the "guide-gene" approach revealed 40 putative targets of 26 OsMYB TF hubs with high correlation values, utilizing data from 815 microarrays. Enrichment of MYB-binding cis-elements in the promoter regions of the putative targets, functional co-occurrence, and nuclear localization support our finding. In particular, the enrichment of MYB-binding regions involved in drought inducibility implies a regulatory role in the drought response in rice.
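
    The correlation-based "top-down" step described in this record can be illustrated with a minimal sketch. It assumes Pearson correlation, an absolute-value cutoff of 0.9, and a hub defined by a minimum number of neighbors; the abstract states none of these choices, and the gene names and expression values below are made-up toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_network(expr, threshold=0.9):
    """Link every gene pair whose |r| meets the (assumed) threshold."""
    genes = sorted(expr)
    net = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                net[g].add(h)
                net[h].add(g)
    return net


def hubs(net, min_degree):
    """Genes with at least min_degree co-expressed neighbors."""
    return sorted(g for g, nbrs in net.items() if len(nbrs) >= min_degree)


# Hypothetical expression profiles across five conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
network = coexpression_network(expr, threshold=0.9)
hub_genes = hubs(network, min_degree=2)
print(network)
print(hub_genes)
```

    Using the absolute correlation keeps strongly anti-correlated pairs connected; a signed network would drop them. Clustering the resulting graph (for example with MCL) is a separate step that this sketch does not cover.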

  17. Temporal network analysis identifies early physiological and transcriptomic indicators of mild drought in Brassica rapa

    PubMed Central

    Gehan, Malia A; Mockler, Todd C; Weinig, Cynthia; Ewers, Brent E

    2017-01-01

    The dynamics of local climates make development of agricultural strategies challenging. Yield improvement has progressed slowly, especially in drought-prone regions where annual crop production suffers from episodic aridity. Underlying drought responses are circadian and diel control of gene expression that regulate daily variations in metabolic and physiological pathways. To identify transcriptomic changes that occur in the crop Brassica rapa during initial perception of drought, we applied a co-expression network approach to associate rhythmic gene expression changes with physiological responses. Coupled analysis of transcriptome and physiological parameters over a two-day time course in control and drought-stressed plants provided temporal resolution necessary for correlation of network modules with dynamic changes in stomatal conductance, photosynthetic rate, and photosystem II efficiency. This approach enabled the identification of drought-responsive genes based on their differential rhythmic expression profiles in well-watered versus droughted networks and provided new insights into the dynamic physiological changes that occur during drought. PMID:28826479

  18. Transcriptome analysis of an apple (Malus × domestica) yellow fruit somatic mutation identifies a gene network module highly associated with anthocyanin and epigenetic regulation

    PubMed Central

    El-Sharkawy, Islam; Liang, Dong; Xu, Kenong

    2015-01-01

    Using RNA-seq, this study analysed an apple (Malus×domestica) anthocyanin-deficient yellow-skin somatic mutant ‘Blondee’ (BLO) and its red-skin parent ‘Kidd’s D-8’ (KID), the original name of ‘Gala’, to understand the molecular mechanisms underlying the mutation. A total of 3299 differentially expressed genes (DEGs) were identified between BLO and KID at four developmental stages and/or between two adjacent stages within BLO and/or KID. A weighted gene co-expression network analysis (WGCNA) of the DEGs uncovered a network module of 34 genes highly correlated (r=0.95, P=9.0×10^-13) with anthocyanin contents. Although 12 of the 34 genes in the WGCNA module are characterized and have known roles related to anthocyanin, the remaining 22 appear to be novel. Examining the expression of ten representative genes in the module in 14 diverse apples revealed that at least eight were significantly correlated with anthocyanin variation. MdMYB10 (MDP0000259614) and MdGST (MDP0000252292) were among the most suppressed module member genes in BLO despite being indistinguishable in their corresponding sequences between BLO and KID. Methylation assay of MdMYB10 and MdGST in fruit skin revealed that two regions (MR3 and MR7) in the MdMYB10 promoter exhibited remarkable differences between BLO and KID. In particular, methylation was high and progressively increased alongside fruit development in BLO, while it was correspondingly low and constant in KID. The methylation levels in both MR3 and MR7 were negatively correlated with anthocyanin content as well as the expression of MdMYB10 and MdGST. Clearly, the collective repression of the 34 genes explains the loss-of-colour in BLO while the methylation in the MdMYB10 promoter is likely causal for the mutation. PMID:26417021
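
    The module-to-trait correlation reported in this record can be sketched as follows. The soft-threshold adjacency |r|^beta is the standard unsigned WGCNA form. The module summary used here (mean of z-scored profiles) is a simplified stand-in for the module eigengene that WGCNA actually computes, and beta=6, the gene names, and all values are illustrative assumptions, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes for h in genes if g != h
    }


def zscore(values):
    """Standardize a profile (assumes it is not constant)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def module_profile(expr, module_genes):
    """Mean z-scored expression per sample: a proxy for the eigengene."""
    z = [zscore(expr[g]) for g in module_genes]
    return [sum(col) / len(col) for col in zip(*z)]


# Hypothetical module of three genes over four samples, plus a trait.
expr = {
    "g1": [1.0, 2.0, 4.0, 8.0],
    "g2": [2.0, 3.0, 5.0, 9.0],
    "g3": [0.5, 1.5, 3.0, 7.5],
}
trait = [0.1, 0.3, 0.7, 1.5]  # e.g. a pigment measurement per sample

adjacency = soft_threshold_adjacency(expr, beta=6)
profile = module_profile(expr, ["g1", "g2", "g3"])
module_trait_r = pearson(profile, trait)
print(round(module_trait_r, 3))
```

    Raising correlations to a power suppresses weak links while keeping edges continuous, which is why WGCNA modules do not depend on a hard cutoff.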

  19. Fine-tuning gene networks using simple sequence repeats

    PubMed Central

    Egbert, Robert G.; Klavins, Eric

    2012-01-01

    The parameters in a complex synthetic gene network must be extensively tuned before the network functions as designed. Here, we introduce a simple and general approach to rapidly tune gene networks in Escherichia coli using hypermutable simple sequence repeats embedded in the spacer region of the ribosome binding site. By varying repeat length, we generated expression libraries that incrementally and predictably sample gene expression levels over a 1,000-fold range. We demonstrate the utility of the approach by creating a bistable switch library that programmatically samples the expression space to balance the two states of the switch, and we illustrate the need for tuning by showing that the switch’s behavior is sensitive to host context. Further, we show that mutation rates of the repeats are controllable in vivo for stability or for targeted mutagenesis—suggesting a new approach to optimizing gene networks via directed evolution. This tuning methodology should accelerate the process of engineering functionally complex gene networks. PMID:22927382

  20. Efficient Reverse-Engineering of a Developmental Gene Regulatory Network

    PubMed Central

    Cicin-Sain, Damjan; Ashyraliyev, Maksat; Jaeger, Johannes

    2012-01-01

    Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to

  1. MyGeneFriends: A Social Network Linking Genes, Genetic Diseases, and Researchers

    PubMed Central

    Allot, Alexis; Chennen, Kirsley; Nevers, Yannis; Poidevin, Laetitia; Kress, Arnaud; Ripp, Raymond; Thompson, Julie Dawn; Poch, Olivier

    2017-01-01

    Background: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. Objective: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. Methods: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. Results: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user’s specific interests and provides an efficient way to share information with collaborators. Furthermore, the user’s behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. Conclusions: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi

  2. Co-expression network analysis of duplicate genes in maize (Zea mays L.) reveals no subgenome bias.

    PubMed

    Li, Lin; Briskine, Roman; Schaefer, Robert; Schnable, Patrick S; Myers, Chad L; Flagel, Lex E; Springer, Nathan M; Muehlbauer, Gary J

    2016-11-04

    Gene duplication is prevalent in many species and can result in coding and regulatory divergence. Gene duplications can be classified as whole genome duplication (WGD), tandem and inserted (non-syntenic). In maize, WGD resulted in the subgenomes maize1 and maize2, of which maize1 is considered the dominant subgenome. However, the landscape of co-expression network divergence of duplicate genes in maize is still largely uncharacterized. To address the consequence of gene duplication on co-expression network divergence, we developed a gene co-expression network from RNA-seq data derived from 64 different tissues/stages of the maize reference inbred B73. WGD, tandem and inserted gene duplications exhibited distinct regulatory divergence. Inserted duplicate genes were more likely to be singletons in the co-expression networks, while WGD duplicate genes were likely to be co-expressed with other genes. Tandem duplicate genes were enriched in the co-expression pattern where co-expressed genes were nearly identical for the duplicates in the network. Older gene duplications exhibit more extensive co-expression variation than younger duplications. Overall, non-syntenic genes primarily from inserted duplications show more co-expression divergence. Also, such enlarged co-expression divergence is significantly related to duplication age. Moreover, subgenome dominance was not observed in the co-expression networks - maize1 and maize2 exhibit similar levels of intra subgenome correlations. Intriguingly, the level of inter subgenome co-expression was similar to the level of intra subgenome correlations, genes from specific subgenomes were not likely to be enriched in co-expression network modules, and the hub genes were not predominantly from any specific subgenome in maize. Our work provides a comprehensive analysis of maize co-expression network divergence for three different types of gene duplications and identifies potential relationships between duplication types

  3. Functional Profiling Identifies Genes Involved in Organ-Specific Branches of the PIF3 Regulatory Network in Arabidopsis

    PubMed Central

    Sentandreu, Maria; Martín, Guiomar; González-Schain, Nahuel; Leivar, Pablo; Soy, Judit; Tepperman, James M.; Quail, Peter H.; Monte, Elena

    2011-01-01

    The phytochrome (phy)-interacting basic helix-loop-helix transcription factors (PIFs) constitutively sustain the etiolated state of dark-germinated seedlings by actively repressing deetiolation in darkness. This action is rapidly reversed upon light exposure by phy-induced proteolytic degradation of the PIFs. Here, we combined a microarray-based approach with a functional profiling strategy and identified four PIF3-regulated genes misexpressed in the dark (MIDAs) that are novel regulators of seedling deetiolation. We provide evidence that each one of these four MIDA genes regulates a specific facet of etiolation (hook maintenance, cotyledon appression, or hypocotyl elongation), indicating that there is branching in the signaling that PIF3 relays. Furthermore, combining inferred MIDA gene function from mutant analyses with their expression profiles in response to light-induced degradation of PIF3 provides evidence consistent with a model where the action of the PIF3/MIDA regulatory network enables an initial fast response to the light and subsequently prevents an overresponse to the initial light trigger, thus optimizing the seedling deetiolation process. Collectively, the data suggest that at least part of the phy/PIF system acts through these four MIDAs to initiate and optimize seedling deetiolation, and that this mechanism might allow the implementation of spatial (i.e., organ-specific) and temporal responses during the photomorphogenic program. PMID:22108407

  4. Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

    PubMed

    Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso

    2013-09-26

    Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function

  5. Trainable Gene Regulation Networks with Applications to Drosophila Pattern Formation

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric

    2000-01-01

    This chapter will very briefly introduce and review some computational experiments in using trainable gene regulation network models to simulate and understand selected episodes in the development of the fruit fly, Drosophila melanogaster. For details, the reader is referred to the papers introduced below. It will then introduce a new gene regulation network model which can describe promoter-level substructure in gene regulation. As described in chapter 2, gene regulation may be thought of as a combination of cis-acting regulation by the extended promoter of a gene (including all regulatory sequences) by way of the transcription complex, and of trans-acting regulation by the transcription factor products of other genes. If we simplify the cis-action by using a phenomenological model which can be tuned to data, such as a unit or other small portion of an artificial neural network, then the full trans-acting interaction between multiple genes during development can be modelled as a larger network which can again be tuned or trained to data. The larger network will in general need to have recurrent (feedback) connections since at least some real gene regulation networks do. This is the basic modeling approach taken, which describes how a set of recurrent neural networks can be used as a modeling language for multiple developmental processes including gene regulation within a single cell, cell-cell communication, and cell division. Such network models have been called "gene circuits", "gene regulation networks", or "genetic regulatory networks", sometimes without distinguishing the models from the actual modeled systems.
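
    As a rough illustration of the recurrent-network idea in this record, the sketch below integrates one commonly used gene-circuit form, dv_i/dt = R_i g(sum_j T_ij v_j + h_i) - lambda_i v_i, with forward Euler steps. The abstract does not give the chapter's equations, so the functional form, the sigmoid g, and every parameter value here are assumptions chosen to produce a two-gene mutual-repression switch.

```python
from math import sqrt


def g(u):
    """Sigmoid squashing function mapping any real input into (0, 1)."""
    return 0.5 * (u / sqrt(1.0 + u * u) + 1.0)


def step(v, T, h, R, lam, dt):
    """One forward-Euler step of dv_i/dt = R_i*g(sum_j T_ij*v_j + h_i) - lam_i*v_i."""
    n = len(v)
    new_v = []
    for i in range(n):
        u = sum(T[i][j] * v[j] for j in range(n)) + h[i]
        dv = R[i] * g(u) - lam[i] * v[i]
        new_v.append(v[i] + dt * dv)
    return new_v


def simulate(v0, T, h, R, lam, dt=0.01, steps=5000):
    """Integrate the circuit from initial concentrations v0."""
    v = list(v0)
    for _ in range(steps):
        v = step(v, T, h, R, lam, dt)
    return v


# Hypothetical parameters: each gene represses the other (negative T entries).
T = [[0.0, -4.0],
     [-4.0, 0.0]]
h = [2.0, 2.0]      # bias terms
R = [1.0, 1.0]      # maximum synthesis rates
lam = [1.0, 1.0]    # decay rates

state_a = simulate([0.9, 0.1], T, h, R, lam)
state_b = simulate([0.1, 0.9], T, h, R, lam)
print(state_a, state_b)
```

    The feedback connections in T are what make the network recurrent: the two runs settle into opposite states depending only on the starting concentrations. Fitting T, h, R, and lam to expression data is the "training" step and is not shown.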

  6. Listening to the Noise: Random Fluctuations Reveal Gene Network Parameters

    NASA Astrophysics Data System (ADS)

    Munsky, Brian; Trinh, Brooke; Khammash, Mustafa

    2010-03-01

    The cellular environment is abuzz with noise originating from the inherent random motion of reacting molecules in the living cell. In this noisy environment, clonal cell populations exhibit cell-to-cell variability that can manifest significant phenotypic differences. Noise-induced stochastic fluctuations in cellular constituents can be measured and their statistics quantified using flow cytometry, single molecule fluorescence in situ hybridization, time lapse fluorescence microscopy and other single cell and single molecule measurement techniques. We show that these random fluctuations carry within them valuable information about the underlying genetic network. Far from being a nuisance, the ever-present cellular noise acts as a rich source of excitation that, when processed through a gene network, carries its distinctive fingerprint that encodes a wealth of information about that network. We demonstrate that in some cases the analysis of these random fluctuations enables the full identification of network parameters, including those that may otherwise be difficult to measure. We use theoretical investigations to establish experimental guidelines for the identification of gene regulatory networks, and we apply these guidelines to experimentally identify predictive models for different regulatory mechanisms in bacteria and yeast.

  7. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an

  8. Heart morphogenesis gene regulatory networks revealed by temporal expression analysis.

    PubMed

    Hill, Jonathon T; Demarest, Bradley; Gorsi, Bushra; Smith, Megan; Yost, H Joseph

    2017-10-01

    During embryogenesis the heart forms as a linear tube that then undergoes multiple simultaneous morphogenetic events to obtain its mature shape. To understand the gene regulatory networks (GRNs) driving this phase of heart development, during which many congenital heart disease malformations likely arise, we conducted an RNA-seq timecourse in zebrafish from 30 hpf to 72 hpf and identified 5861 genes with altered expression. We clustered the genes by temporal expression pattern, identified transcription factor binding motifs enriched in each cluster, and generated a model GRN for the major gene batteries in heart morphogenesis. This approach predicted hundreds of regulatory interactions and found batteries enriched in specific cell and tissue types, indicating that the approach can be used to narrow the search for novel genetic markers and regulatory interactions. Subsequent analyses confirmed the GRN using two mutants, Tbx5 and nkx2-5, and identified sets of duplicated zebrafish genes that do not show temporal subfunctionalization. This dataset provides an essential resource for future studies on the genetic/epigenetic pathways implicated in congenital heart defects and the mechanisms of cardiac transcriptional regulation. © 2017. Published by The Company of Biologists Ltd.

  9. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.

    PubMed

    Wong, Aaron K; Krishnan, Arjun; Troyanskaya, Olga G

    2018-05-25

    GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS studies and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from now over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.

  10. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    PubMed

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  11. Reverse engineering of TLX oncogenic transcriptional networks identifies RUNX1 as tumor suppressor in T-ALL.

    PubMed

    Della Gatta, Giusy; Palomero, Teresa; Perez-Garcia, Arianne; Ambesi-Impiombato, Alberto; Bansal, Mukesh; Carpenter, Zachary W; De Keersmaecker, Kim; Sole, Xavier; Xu, Luyao; Paietta, Elisabeth; Racevskis, Janis; Wiernik, Peter H; Rowe, Jacob M; Meijerink, Jules P; Califano, Andrea; Ferrando, Adolfo A

    2012-02-26

    The TLX1 and TLX3 transcription factor oncogenes have a key role in the pathogenesis of T cell acute lymphoblastic leukemia (T-ALL). Here we used reverse engineering of global transcriptional networks to decipher the oncogenic regulatory circuit controlled by TLX1 and TLX3. This systems biology analysis defined T cell leukemia homeobox 1 (TLX1) and TLX3 as master regulators of an oncogenic transcriptional circuit governing T-ALL. Notably, a network structure analysis of this hierarchical network identified RUNX1 as a key mediator of the T-ALL induced by TLX1 and TLX3 and predicted a tumor-suppressor role for RUNX1 in T cell transformation. Consistent with these results, we identified recurrent somatic loss-of-function mutations in RUNX1 in human T-ALL. Overall, these results place TLX1 and TLX3 at the top of an oncogenic transcriptional network controlling leukemia development, show the power of network analyses to identify key elements in the regulatory circuits governing human cancer and identify RUNX1 as a tumor-suppressor gene in T-ALL.

  12. Gene function prediction with gene interaction networks: a context graph kernel approach.

    PubMed

    Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

    2010-01-01

    Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

  13. Discovering disease-associated genes in weighted protein-protein interaction networks

    NASA Astrophysics Data System (ADS)

    Cui, Ying; Cai, Meng; Stanley, H. Eugene

    2018-04-01

    Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken edge weight - which quantifies their relative strength - into consideration. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
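
    The contrast between weighted and unweighted topology in this record can be made concrete with two basic node features: degree (number of interactions) and strength (sum of edge weights). This is only a sketch of the feature-extraction idea. The protein names and weights are invented, the abstract does not list which topological properties the authors used, and the classifier step is omitted.

```python
from collections import defaultdict


def node_features(edges):
    """Return {node: (degree, strength)} from (u, v, weight) edges."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1       # unweighted view: count of partners
            strength[node] += w     # weighted view: total interaction weight
    return {n: (degree[n], strength[n]) for n in degree}


# Hypothetical weighted PPI edges (weight = confidence or interaction strength).
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.8),
    ("P4", "P2", 0.1),
    ("P4", "P3", 0.2),
    ("P4", "P5", 0.1),
]
features = node_features(edges)
top_by_degree = max(features, key=lambda n: features[n][0])
top_by_strength = max(features, key=lambda n: features[n][1])
print(top_by_degree, top_by_strength)
```

    In this toy network the best-connected node by degree (P4) differs from the best-connected node by strength (P1), which is the kind of information an unweighted analysis discards.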

  14. Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks.

    PubMed

    Muetze, Tanja; Goenawan, Ivan H; Wiencko, Heather L; Bernal-Llinares, Manuel; Bryan, Kenneth; Lynn, David J

    2016-01-01

    Highly connected nodes (hubs) in biological networks are topologically important to the structure of the network and have also been shown to be preferentially associated with a range of phenotypes of interest. The relative importance of a hub node, however, can change depending on the biological context. Here, we report a Cytoscape app, the Contextual Hub Analysis Tool (CHAT), which enables users to easily construct and visualize a network of interactions from a gene or protein list of interest, integrate contextual information, such as gene expression or mass spectrometry data, and identify hub nodes that are more highly connected to contextual nodes (e.g. genes or proteins that are differentially expressed) than expected by chance. In a case study, we use CHAT to construct a network of genes that are differentially expressed in Dengue fever, a viral infection. CHAT was used to identify and compare contextual and degree-based hubs in this network. The top 20 degree-based hubs were enriched in pathways related to the cell cycle and cancer, which is likely due to the fact that proteins involved in these processes tend to be highly connected in general. In comparison, the top 20 contextual hubs were enriched in pathways commonly observed in a viral infection including pathways related to the immune response to viral infection. This analysis shows that such contextual hubs are considerably more biologically relevant than degree-based hubs and that analyses which rely on the identification of hubs solely based on their connectivity may be biased towards nodes that are highly connected in general rather than in the specific context of interest. CHAT is available for Cytoscape 3.0+ and can be installed via the Cytoscape App Store (http://apps.cytoscape.org/apps/chat).
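
    One way to test whether a hub is "more highly connected to contextual nodes than expected by chance" is a one-sided hypergeometric test, sketched below. The abstract does not name the statistical test, so the null model, the function name, and the counts are assumptions for illustration only.

```python
from math import comb


def hypergeom_tail(x, N, K, k):
    """P(X >= x) when k neighbors are drawn from N nodes, K of them contextual."""
    upper = min(k, K)
    total = comb(N, k)
    favorable = sum(comb(K, i) * comb(N - K, k - i) for i in range(x, upper + 1))
    return favorable / total


# Hypothetical network: 100 candidate neighbors, 10 of them contextual
# (e.g. differentially expressed).
N, K = 100, 10

# Hub A: 20 neighbors, 8 contextual (expected about 2 by chance).
p_hub_a = hypergeom_tail(8, N, K, 20)
# Hub B: 40 neighbors, 4 contextual (expected about 4 by chance).
p_hub_b = hypergeom_tail(4, N, K, 40)
print(p_hub_a, p_hub_b)
```

    Hub B has the higher degree but no contextual enrichment, while hub A is a smaller node with a strong contextual signal. When many nodes are tested, the p-values would also need a multiple-testing correction.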

  15. MIR@NT@N: a framework integrating transcription factors, microRNAs and their targets to identify sub-network motifs in a meta-regulation network model

    PubMed Central

    2011-01-01

    Background: To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regulatory molecular interactions at a genome-wide scale would be of great interest to biologists, but is not available to date. Results: To identify and analyze molecular interaction networks, we developed MIR@NT@N, an integrative approach based on a meta-regulation network model and a large-scale database. MIR@NT@N uses a graph-based approach to predict novel molecular actors across multiple regulatory processes (i.e. TFs acting on protein-coding or miRNA genes, or miRNAs acting on messenger RNAs). Exploiting these predictions, the user can generate networks and further analyze them to identify sub-networks, including motifs such as feedback and feedforward loops (FBL and FFL). In addition, networks can be built from lists of molecular actors with an a priori role in a given biological process to predict novel and unanticipated interactions. Analyses can be contextualized and filtered by integrating additional information such as microarray expression data. All results, including generated graphs, can be visualized, saved and exported into various formats. The performance of MIR@NT@N was evaluated using published data, and the tool was then applied to the regulatory program underlying epithelium to mesenchyme transition (EMT), an evolutionarily conserved process that is implicated in embryonic development and disease. Conclusions: MIR@NT@N is an effective computational approach to identify novel molecular regulations and to predict gene regulatory networks and sub-networks, including conserved motifs, within a given biological context. Taking advantage of the M@IA environment, MIR@NT@N is a user-friendly web resource freely available at http://mironton.uni.lu which will be
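
    The feedforward loop (FFL) motif mentioned in this abstract can be illustrated with a minimal sketch. It is not MIR@NT@N's own implementation; the function name and the toy edge list (TF1, miR-a, GeneX, GeneY) are hypothetical. An FFL is taken here to be any triple (a, b, c) in a directed graph where a regulates b, b regulates c, and a also regulates c directly.

```python
def find_feedforward_loops(edges):
    """Return sorted (a, b, c) triples where a->b, b->c and a->c all exist."""
    # Build a map from each regulator to the set of its direct targets.
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    loops = []
    for a, a_targets in targets.items():
        for b in a_targets:
            if b == a:
                continue  # ignore self-loops
            for c in targets.get(b, ()):
                # c must be a third node that a also regulates directly
                if c != a and c != b and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


# Hypothetical toy network: a TF regulates a miRNA and a gene,
# and the miRNA also targets that gene.
edges = [("TF1", "miR-a"), ("miR-a", "GeneX"), ("TF1", "GeneX"), ("GeneX", "GeneY")]
print(find_feedforward_loops(edges))
```

    The sketch ignores edge signs (activation versus repression); a real motif search would usually keep them, since coherent and incoherent FFLs behave differently.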

  16. Pathway cross-talk network analysis identifies critical pathways in neonatal sepsis.

    PubMed

    Meng, Yu-Xiu; Liu, Quan-Hong; Chen, Deng-Hong; Meng, Ying

    2017-06-01

    Despite advances in neonatal care, sepsis remains a major cause of morbidity and mortality in neonates worldwide. Pathway cross-talk analysis might contribute to the inference of the driving forces in bacterial sepsis and facilitate a better understanding of the underlying pathogenesis of neonatal sepsis. This study aimed to explore the critical pathways associated with the progression of neonatal sepsis by pathway cross-talk analysis. By integrating neonatal transcriptome data with known pathway data and protein-protein interaction data, we systematically uncovered the disease pathway cross-talks and constructed a disease pathway cross-talk network for neonatal sepsis. Then, the attract method was employed to explore the dysregulated pathways associated with neonatal sepsis. To determine the critical pathways in neonatal sepsis, the rank product (RP) algorithm, centrality analysis and impact factor (IF) were introduced sequentially, which together considered the differential expression of genes and pathways, pathway cross-talks and pathway parameters in the network. The dysregulated pathways with the highest IF values as well as RP<0.01 were defined as critical pathways in neonatal sepsis. After integrating the three kinds of data, only 6919 common genes were included in the pathway cross-talk analysis. By statistical analysis, a total of 1249 significant pathway cross-talks were selected to construct the pathway cross-talk network. Moreover, 47 dysregulated pathways were identified via the attract method, 20 pathways were identified under RP<0.01, and the top 10 pathways with the highest IF were also screened from the pathway cross-talk network. Among them, we selected the 8 common pathways as the critical pathways. In this study, we systematically tracked 8 critical pathways involved in neonatal sepsis by integrating the attract method and the pathway cross-talk network. These pathways might be responsible for the host response in infection, and of great value for advancing
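
    The rank product statistic named in this abstract can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function name and the toy pathway scores are hypothetical, ties are ignored, and only the raw statistic (the geometric mean of an item's ranks across replicates) is computed. The RP<0.01 cut-off in the abstract presumably refers to a significance value derived from such a statistic, which this sketch does not estimate.

```python
import math


def rank_product(scores_by_replicate):
    """Geometric mean of each item's rank across replicates (rank 1 = highest score)."""
    items = list(scores_by_replicate[0])
    k = len(scores_by_replicate)
    log_sum = {item: 0.0 for item in items}
    for scores in scores_by_replicate:
        # Rank the items within this replicate, best score first.
        ordered = sorted(items, key=lambda it: scores[it], reverse=True)
        for rank, item in enumerate(ordered, start=1):
            log_sum[item] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return {item: math.exp(log_sum[item] / k) for item in items}


# Hypothetical scores for three pathways in two replicates.
replicates = [
    {"P1": 3.0, "P2": 1.0, "P3": 2.0},
    {"P1": 2.5, "P2": 0.5, "P3": 3.5},
]
rp = rank_product(replicates)
print(rp)
```

    A small rank product means an item ranks consistently near the top; here P1 and P3 tie, and P2 is last in both replicates.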

  17. Chronic Ethanol Exposure Produces Time- and Brain Region-Dependent Changes in Gene Coexpression Networks

    PubMed Central

    Osterndorff-Kahanek, Elizabeth A.; Becker, Howard C.; Lopez, Marcelo F.; Farris, Sean P.; Tiwari, Gayatri R.; Nunez, Yury O.; Harris, R. Adron; Mayfield, R. Dayne

    2015-01-01

    Repeated ethanol exposure and withdrawal in mice increases voluntary drinking and represents an animal model of physical dependence. We examined time- and brain region-dependent changes in gene coexpression networks in amygdala (AMY), nucleus accumbens (NAC), prefrontal cortex (PFC), and liver after four weekly cycles of chronic intermittent ethanol (CIE) vapor exposure in C57BL/6J mice. Microarrays were used to compare gene expression profiles at 0, 8, and 120 hours following the last ethanol exposure. Each brain region exhibited a large number of differentially expressed genes (2,000-3,000) at the 0- and 8-hour time points, but fewer changes were detected at the 120-hour time point (400-600). Within each region, there was little gene overlap across time (~20%). All brain regions were significantly enriched with differentially expressed immune-related genes at the 8-hour time point. Weighted gene correlation network analysis identified modules that were highly enriched with differentially expressed genes at the 0- and 8-hour time points with virtually no enrichment at 120 hours. Modules enriched for both ethanol-responsive and cell-specific genes were identified in each brain region. These results indicate that chronic alcohol exposure causes global ‘rewiring’ of coexpression systems involving glial and immune signaling as well as neuronal genes. PMID:25803291
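
    Weighted gene correlation network analysis typically starts from a soft-thresholded adjacency, a_ij = |cor(x_i, x_j)|^beta. The sketch below shows only that first step for an unsigned network; it is not the WGCNA R package, and the function names, the toy expression values and the choice beta = 6 are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for every gene pair in expr (gene -> profile)."""
    genes = list(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


# Hypothetical expression profiles across four samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
adj = soft_threshold_adjacency(expr)
print(adj)
```

    Raising the correlation to a power keeps strong correlations close to 1 while pushing moderate ones toward 0, so the network stays weighted rather than being cut at a hard threshold.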

  18. Preferential Allele Expression Analysis Identifies Shared Germline and Somatic Driver Genes in Advanced Ovarian Cancer

    PubMed Central

    Halabi, Najeeb M.; Martinez, Alejandra; Al-Farsi, Halema; Mery, Eliane; Puydenus, Laurence; Pujol, Pascal; Khalak, Hanif G.; McLurcan, Cameron; Ferron, Gwenael; Querleu, Denis; Al-Azwani, Iman; Al-Dous, Eman; Mohamoud, Yasmin A.; Malek, Joel A.; Rafii, Arash

    2016-01-01

    Identifying genes where a variant allele is preferentially expressed in tumors could lead to a better understanding of cancer biology and optimization of targeted therapy. However, tumor sample heterogeneity complicates standard approaches for detecting preferential allele expression. We therefore developed a novel approach combining genome and transcriptome sequencing data from the same sample that corrects for sample heterogeneity and identifies significant preferentially expressed alleles. We applied this analysis to epithelial ovarian cancer samples consisting of matched primary ovary and peritoneum and lymph node metastasis. We find that preferentially expressed variant alleles include germline and somatic variants, are shared at a relatively high frequency between patients, and are in gene networks known to be involved in cancer processes. Analysis at a patient level identifies patient-specific preferentially expressed alleles in genes that are targets for known drugs. Analysis at a site level identifies patterns of site specific preferential allele expression with similar pathways being impacted in the primary and metastasis sites. We conclude that genes with preferentially expressed variant alleles can act as cancer drivers and that targeting those genes could lead to new therapeutic strategies. PMID:26735499

  19. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

    PubMed

    Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

    2015-01-01

    Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The networks are used to model each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.

  20. Protein Interaction Networks Reveal Novel Autism Risk Genes within GWAS Statistical Noise

    PubMed Central

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M.

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical “noise” that warrant further analysis for causal variants. PMID:25409314
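
    The claim that associated proteins "directly interact more than random expectation" can be checked with a permutation test of the following general shape. This is a sketch under simplifying assumptions, not the procedure used in the study: random protein sets are drawn uniformly (a real analysis would often match node degree), and the function names and toy network are hypothetical.

```python
import random


def count_internal_edges(genes, ppi_edges):
    """Number of PPI edges whose two endpoints are both in the gene set."""
    gene_set = set(genes)
    return sum(1 for a, b in ppi_edges if a in gene_set and b in gene_set)


def permutation_p_value(genes, all_proteins, ppi_edges, n_perm=1000, seed=0):
    """Empirical p-value for observing at least this many internal edges by chance."""
    rng = random.Random(seed)
    observed = count_internal_edges(genes, ppi_edges)
    hits = 0
    for _ in range(n_perm):
        # Draw a random protein set of the same size (uniform null, an assumption).
        sample = rng.sample(all_proteins, len(genes))
        if count_internal_edges(sample, ppi_edges) >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical interactome: 20 proteins, a 4-protein clique plus disjoint pairs.
all_proteins = [f"P{i}" for i in range(20)]
clique = ["P0", "P1", "P2", "P3"]
ppi_edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
ppi_edges += [("P4", "P5"), ("P6", "P7"), ("P8", "P9"), ("P10", "P11")]

observed, p_value = permutation_p_value(clique, all_proteins, ppi_edges)
print(observed, p_value)
```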

  1. Protein interaction networks reveal novel autism risk genes within GWAS statistical noise.

    PubMed

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical "noise" that warrant further analysis for causal variants.

  2. The Reconstruction and Analysis of Gene Regulatory Networks.

    PubMed

    Zheng, Guangyong; Huang, Tao

    2018-01-01

    In the post-genomic era, an important task is to explore the function of individual biological molecules (i.e., gene, noncoding RNA, protein, metabolite) and their organization in living cells. To this end, gene regulatory networks (GRNs) are constructed to show the relationships between biological molecules, in which the vertices of the network denote biological molecules and the edges represent connections between nodes (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). By interpreting GRNs, biologists can understand not only the function of biological molecules but also the organization of the components of living cells, since a gene regulatory network is a comprehensive physiological map of living cells and reflects the influence of genetic and epigenetic factors (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). In this paper, we review the inference methods for GRN reconstruction and the analysis approaches for network structure. Because the network method is a powerful tool for studying complex diseases and biological processes, its applications in pathway analysis and disease gene identification are also introduced.

  3. In silico identification of miRNAs and their target genes and analysis of gene co-expression network in saffron (Crocus sativus L.) stigma

    PubMed Central

    Zinati, Zahra; Shamloo-Dashtpagerdi, Roohollah; Behpouri, Ali

    2016-01-01

    Saffron (Crocus sativus L.), an aromatic and colorful plant with a distinctive taste, owes these properties to a growing class of secondary metabolites derived from carotenoids, the apocarotenoids. Given the critical role of microRNAs in secondary metabolite synthesis and the limited number of miRNAs identified in C. sativus, characterizing miRNAs and their target genes in C. sativus might expand our understanding of the roles of miRNAs in the carotenoid/apocarotenoid biosynthetic pathway. A computational analysis was used to identify miRNAs and their targets using an EST (Expressed Sequence Tag) library from mature saffron stigmas. Then, a gene co-expression network was constructed to identify genes which are potentially involved in carotenoid/apocarotenoid biosynthetic pathways. EST analysis led to the identification of two putative miRNAs (miR414 and miR837-5p) along with the corresponding stem-looped precursors. To our knowledge, this is the first report on miR414 and miR837-5p in C. sativus. Co-expression network analysis indicated that miR414 and miR837-5p may play roles in C. sativus metabolic pathways and led to the identification of candidate genes, including six transcription factors and one protein kinase, probably involved in the carotenoid/apocarotenoid biosynthetic pathway. The presence of transcription factors, miRNAs and a protein kinase in the network indicated multiple layers of regulation in saffron stigma. The candidate genes from this study may help in unraveling the regulatory networks underlying carotenoid/apocarotenoid biosynthesis in saffron and in designing metabolic engineering for enhanced secondary metabolites. PMID:28261627

  4. Data identification for improving gene network inference using computational algebra.

    PubMed

    Dimitrova, Elena; Stigler, Brandilyn

    2014-11-01

    Identification of models of gene regulatory networks is sensitive to the amount of data used as input. Considering the substantial costs in conducting experiments, it is of value to have an estimate of the amount of data required to infer the network structure. To minimize wasted resources, it is also beneficial to know which data are necessary to identify the network. Knowledge of the data and knowledge of the terms in polynomial models are often required a priori in model identification. In applications, it is unlikely that the structure of a polynomial model will be known, which may force data sets to be unnecessarily large in order to identify a model. Furthermore, none of the known results provides any strategy for constructing data sets to uniquely identify a model. We provide a specialization of an existing criterion for deciding when a set of data points identifies a minimal polynomial model when its monomial terms have been specified. Then, we relax the requirement of the knowledge of the monomials and present results for model identification given only the data. Finally, we present a method for constructing data sets that identify minimal polynomial models.

  5. Diversified Control Paths: A Significant Way Disease Genes Perturb the Human Regulatory Network

    PubMed Central

    Wang, Bingbo; Gao, Lin; Zhang, Qingfang; Li, Aimin; Deng, Yue; Guo, Xingli

    2015-01-01

    Background: The complexity of biological systems motivates us to use the underlying networks to gain a deep understanding of disease etiology, and human diseases are viewed as perturbations of the dynamic properties of networks. Control theory, which deals with dynamic systems, has been successfully used to capture systems-level knowledge from large amounts of quantitative biological interactions. But from the perspective of system control, the ways by which multiple genetic factors jointly perturb a disease phenotype still remain unclear. Results: In this work, we combine tools from control theory and network science to address the diversified control paths in complex networks. The ways by which disease genes perturb biological systems are then identified and quantified by the control paths in a human regulatory network. Furthermore, as an application, prioritization of candidate genes is presented using control path analysis and gene ontology annotation to define similarities. We use leave-one-out cross-validation to evaluate the ability to find gene-disease relationships. The results show performance comparable to previous sophisticated works, especially in directed systems. Conclusions: Our results inspire a deeper understanding of the molecular mechanisms that drive pathological processes. Diversified control paths offer a basis for integrated intervention techniques which will ultimately lead to the development of novel therapeutic strategies. PMID:26284649

  6. MyGeneFriends: A Social Network Linking Genes, Genetic Diseases, and Researchers.

    PubMed

    Allot, Alexis; Chennen, Kirsley; Nevers, Yannis; Poidevin, Laetitia; Kress, Arnaud; Ripp, Raymond; Thompson, Julie Dawn; Poch, Olivier; Lecompte, Odile

    2017-06-16

    The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi.fr/mygenefriends. ©Alexis Allot, Kirsley Chennen, Yannis

  7. Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.

    PubMed

    Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia

    2017-07-24

    Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed for infection with T. vaginalis, a total of 110 sequences were identified when the T. vaginalis 18S ribosomal RNA gene was sequenced. These sequences were used to prepare a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences were in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. Low migration rate of local people in this province may contribute to a genetic conservativeness of T. vaginalis. The unique genetic feature of our isolates may suggest a different clinical presentation of trichomoniasis, including metronidazole susceptibility, T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis is of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.

  8. Stationary and structural control in gene regulatory networks: basic concepts

    NASA Astrophysics Data System (ADS)

    Dougherty, Edward R.; Pal, Ranadip; Qian, Xiaoning; Bittner, Michael L.; Datta, Aniruddha

    2010-01-01

    A major reason for constructing gene regulatory networks is to use them as models for determining therapeutic intervention strategies by deriving ways of altering their long-run dynamics in such a way as to reduce the likelihood of entering undesirable states. In general, two paradigms have been taken for gene network intervention: (1) stationary external control is based on optimally altering the status of a control gene (or genes) over time to drive network dynamics; and (2) structural intervention involves an optimal one-time change of the network structure (wiring) to beneficially alter the long-run behaviour of the network. These intervention approaches have mainly been developed within the context of the probabilistic Boolean network model for gene regulation. This article reviews both types of intervention and applies them to reducing the metastatic competence of cells via intervention in a melanoma-related network.

  9. Gene and Metabolite Regulatory Network Analysis of Early Developing Fruit Tissues Highlights New Candidate Genes for the Control of Tomato Fruit Composition and Development

    PubMed Central

    Mounet, Fabien; Moing, Annick; Garcia, Virginie; Petit, Johann; Maucourt, Michael; Deborde, Catherine; Bernillon, Stéphane; Le Gall, Gwénaëlle; Colquhoun, Ian; Defernez, Marianne; Giraudel, Jean-Luc; Rolin, Dominique; Rothan, Christophe; Lemaire-Chamley, Martine

    2009-01-01

    Variations in early fruit development and composition may have major impacts on the taste and the overall quality of ripe tomato (Solanum lycopersicum) fruit. To get insights into the networks involved in these coordinated processes and to identify key regulatory genes, we explored the transcriptional and metabolic changes in expanding tomato fruit tissues using multivariate analysis and gene-metabolite correlation networks. To this end, we demonstrated and took advantage of the existence of clear structural and compositional differences between expanding mesocarp and locular tissue during fruit development (12–35 d postanthesis). Transcriptome and metabolome analyses were carried out with tomato microarrays and analytical methods including proton nuclear magnetic resonance and liquid chromatography-mass spectrometry, respectively. Pairwise comparisons of metabolite contents and gene expression profiles detected up to 37 direct gene-metabolite correlations involving regulatory genes (e.g. the correlations between glutamine, bZIP, and MYB transcription factors). Correlation network analyses revealed the existence of major hub genes correlated with 10 or more regulatory transcripts and embedded in a large regulatory network. This approach proved to be a valuable strategy for identifying specific subsets of genes implicated in key processes of fruit development and metabolism, which are therefore potential targets for genetic improvement of tomato fruit quality. PMID:19144766

  10. Coding and non-coding gene regulatory networks underlie the immune response in liver cirrhosis

    PubMed Central

    Zhang, Xueming; Huang, Yongming; Yang, Zhengpeng; Zhang, Yuguo; Zhang, Weihui; Gao, Zu-hua; Xue, Dongbo

    2017-01-01

    Liver cirrhosis is recognized as being the consequence of immune-mediated hepatocyte damage and repair processes. However, the regulation of these immune responses underlying liver cirrhosis has not been elucidated. In this study, we used GEO datasets and bioinformatics methods to establish coding and non-coding gene regulatory networks, including transcription factor-/lncRNA-microRNA-mRNA and competing endogenous RNA interaction networks. Our results identified 2224 mRNAs, 70 lncRNAs and 46 microRNAs that were differentially expressed in liver cirrhosis. The transcription factor-/lncRNA-microRNA-mRNA network we uncovered that results in immune-mediated liver cirrhosis comprises 5 core microRNAs (e.g., miR-203; miR-219-5p), 3 transcription factors (i.e., FOXP3, ETS1 and FOS) and 7 lncRNAs (e.g., ENTS00000671336, ENST00000575137). The competing endogenous RNA interaction network we identified includes a complex immune response regulatory subnetwork that controls the entire liver cirrhosis network. Additionally, we found 10 overlapping GO terms shared by both liver cirrhosis and hepatocellular carcinoma, including “immune response”. Interestingly, the overlapping differentially expressed genes in liver cirrhosis and hepatocellular carcinoma were enriched in immune response-related functional terms. In summary, a complex gene regulatory network underlying immune response processes may play an important role in the development and progression of liver cirrhosis, and its development into hepatocellular carcinoma. PMID:28355233

  11. Coding and non-coding gene regulatory networks underlie the immune response in liver cirrhosis.

    PubMed

    Gao, Bo; Zhang, Xueming; Huang, Yongming; Yang, Zhengpeng; Zhang, Yuguo; Zhang, Weihui; Gao, Zu-Hua; Xue, Dongbo

    2017-01-01

    Liver cirrhosis is recognized as being the consequence of immune-mediated hepatocyte damage and repair processes. However, the regulation of these immune responses underlying liver cirrhosis has not been elucidated. In this study, we used GEO datasets and bioinformatics methods to establish coding and non-coding gene regulatory networks, including transcription factor-/lncRNA-microRNA-mRNA and competing endogenous RNA interaction networks. Our results identified 2224 mRNAs, 70 lncRNAs and 46 microRNAs that were differentially expressed in liver cirrhosis. The transcription factor-/lncRNA-microRNA-mRNA network we uncovered that results in immune-mediated liver cirrhosis comprises 5 core microRNAs (e.g., miR-203; miR-219-5p), 3 transcription factors (i.e., FOXP3, ETS1 and FOS) and 7 lncRNAs (e.g., ENTS00000671336, ENST00000575137). The competing endogenous RNA interaction network we identified includes a complex immune response regulatory subnetwork that controls the entire liver cirrhosis network. Additionally, we found 10 overlapping GO terms shared by both liver cirrhosis and hepatocellular carcinoma, including "immune response". Interestingly, the overlapping differentially expressed genes in liver cirrhosis and hepatocellular carcinoma were enriched in immune response-related functional terms. In summary, a complex gene regulatory network underlying immune response processes may play an important role in the development and progression of liver cirrhosis, and its development into hepatocellular carcinoma.

  12. Detection of type 2 diabetes related modules and genes based on epigenetic networks

    PubMed Central

    2014-01-01

    Background: Type 2 diabetes (T2D) is one of the most common chronic metabolic diseases, characterized by insulin resistance and decreased insulin secretion. Genetic variation can only explain part of the heritability of T2D, so new methods are needed to detect the susceptibility genes of the disease. Epigenetics could establish the interface between environmental factors and the pathological mechanism of T2D. Results: Based on network theory and by combining epigenetic characteristics with the human interactome, the weighted human DNA methylation network (WMPN) was constructed, and a T2D-related subnetwork (TMSN) was obtained through T2D-related differentially methylated genes. TMSN was found to have a T2D-specific network structure in which non-fatal metabolic disease-causing genes were often located in the topological and functional periphery of the network. Combined with chromatin modifications, the weighted chromatin modification network (WCPN) was built, and a T2D-related chromatin modification pattern subnetwork (TCSN) was obtained from the TMSN gene set. TCSN had a densely connected network community, indicating that TMSN and TCSN could represent a collection of T2D-related epigenetically dysregulated sub-pathways. Using the cumulative hypergeometric test, 24 interplay modules of DNA methylation and chromatin modifications were identified. Analysis of gene expression in human T2D islet tissue showed that some genes had altered expression levels caused by aberrant DNA methylation and/or chromatin modifications, which might affect and promote the development of T2D. Conclusions: Here we have detected the potential interplay modules of DNA methylation and chromatin modifications for T2D. The study of T2D epigenetic networks provides a new way of understanding the pathogenic mechanism of T2D caused by epigenetic disorders. PMID:24565181
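
    The cumulative hypergeometric test mentioned in this abstract asks how likely an overlap of at least k genes is when n genes are drawn from a background of N genes, K of which belong to the category of interest. A minimal sketch follows; the function name and the numbers in the example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for n draws without replacement from N items, K of them 'successes'."""
    total = comb(N, n)
    upper = min(K, n)  # the overlap cannot exceed either set size
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical example: background of 10 genes, 4 in the category,
# a module of 3 genes, at least 2 of which fall in the category.
p = hypergeom_upper_tail(N=10, K=4, n=3, k=2)
print(p)
```

    For this toy case the sum is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120, so the tail probability is one third.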

  13. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    PubMed

    Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

    2014-01-01

    The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
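
    Graph-based semi-supervised learning of the kind described here can be illustrated with a simple label propagation scheme, in which each node's score is repeatedly pulled toward the weighted average of its neighbours while labeled nodes stay anchored to their known label. This is a generic sketch, not the authors' C++ algorithm; the function name, the update rule f <- alpha * (neighbour average) + (1 - alpha) * y, and the four-sample chain graph are assumptions.

```python
def propagate_labels(adj, labels, alpha=0.8, n_iter=200):
    """Iteratively smooth labels over a weighted graph.

    adj: node -> {neighbour: weight}; labels: node -> +1 / -1 for labeled nodes.
    Unlabeled nodes start at 0 and receive a score from their neighbours.
    """
    nodes = list(adj)
    y = {v: float(labels.get(v, 0.0)) for v in nodes}
    f = dict(y)
    for _ in range(n_iter):
        new_f = {}
        for v in nodes:
            total_w = sum(adj[v].values())
            # Weighted average of the neighbours' current scores.
            neigh = sum(w * f[u] for u, w in adj[v].items()) / total_w if total_w else 0.0
            new_f[v] = alpha * neigh + (1 - alpha) * y[v]
        f = new_f
    return f


# Hypothetical chain of four samples: s1 (recurrence, +1) ... s4 (no recurrence, -1).
adj = {
    "s1": {"s2": 1.0},
    "s2": {"s1": 1.0, "s3": 1.0},
    "s3": {"s2": 1.0, "s4": 1.0},
    "s4": {"s3": 1.0},
}
scores = propagate_labels(adj, {"s1": 1, "s4": -1})
print(scores)
```

    The sign of each unlabeled node's final score gives its predicted class: s2 sits next to the positive sample and ends up positive, s3 ends up negative.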

  14. Modeling gene regulatory networks: A network simplification algorithm

    NASA Astrophysics Data System (ADS)

    Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.

    2016-12-01

    Boolean networks have been used for some time to model Gene Regulatory Networks (GRNs), which describe cell functions. Those models can help biologists make predictions and prognoses, and even design specialized treatments, when a disturbance in the GRN leads to a disease condition. However, the amount of information related to a GRN can be huge, making the task of inferring its Boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the subsequent inference process to focus only on the main protein hubs, the ones with the most potential to interfere in overall network behavior.

  15. Topology association analysis in weighted protein interaction network for gene prioritization

    NASA Astrophysics Data System (ADS)

    Wu, Shunyao; Shao, Fengjing; Zhang, Qi; Ji, Jun; Xu, Shaojie; Sun, Rencheng; Sun, Gengxin; Du, Xiangjun; Sui, Yi

    2016-11-01

    Although many algorithms for disease gene prediction have been proposed, edge weights are rarely taken into account. In this paper, the strengths of topology associations between disease and essential genes are analyzed in a weighted protein interaction network. Empirical analysis demonstrates that, compared to other genes, disease genes are weakly connected with essential genes in the protein interaction network. Based on this finding, a novel global distance measurement for gene prioritization with a weighted protein interaction network is proposed. Positive and negative flow is allocated to disease and essential genes, respectively. Additionally, the network propagation model is extended to weighted networks. Experimental results on 110 diseases verify the effectiveness and potential of the proposed measurement. Moreover, weak links play a more important role than strong links in gene prioritization, which is meaningful for a deeper understanding of protein interaction networks.

  16. Stability Depends on Positive Autoregulation in Boolean Gene Regulatory Networks

    PubMed Central

    Pinho, Ricardo; Garcia, Victor; Irimia, Manuel; Feldman, Marcus W.

    2014-01-01

    Network motifs have been identified as building blocks of regulatory networks, including gene regulatory networks (GRNs). The most basic motif, autoregulation, has been associated with bistability (when positive) and with homeostasis and robustness to noise (when negative), but its general importance in network behavior is poorly understood. Moreover, how specific autoregulatory motifs are selected during evolution and how this relates to robustness is largely unknown. Here, we used a class of GRN models, Boolean networks, to investigate the relationship between autoregulation and network stability and robustness under various conditions. We ran evolutionary simulation experiments for different models of selection, including mutation and recombination. Each generation simulated the development of a population of organisms modeled by GRNs. We found that stability and robustness positively correlate with autoregulation; in all investigated scenarios, stable networks had mostly positive autoregulation. Assuming biological networks correspond to stable networks, these results suggest that biological networks should often be dominated by positive autoregulatory loops. This seems to be the case for most studied eukaryotic transcription factor networks, including those in yeast, flies and mammals. PMID:25375153
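
    To make the link between positive autoregulation and bistability concrete, here is a minimal sketch of a synchronous Boolean network. The two-gene wiring, the rule functions and the helper names are hypothetical and chosen only for illustration; they are not taken from the study's evolutionary simulations.

```python
from itertools import product


def step(state, rules):
    """Synchronously update every gene using its Boolean rule."""
    return tuple(rule(state) for rule in rules)


def attractor(state, rules):
    """Iterate from `state` until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return seen[seen.index(state):]


def fixed_points(rules):
    """All states that map onto themselves (stable expression patterns)."""
    n = len(rules)
    return [s for s in product((0, 1), repeat=n) if step(s, rules) == s]


# Hypothetical network with positive autoregulation:
# gene 0 stays on if it is already on or if gene 1 is on; gene 1 copies gene 0.
with_autoregulation = [lambda s: s[0] or s[1], lambda s: s[0]]

# Same wiring without the self-loop on gene 0.
without_autoregulation = [lambda s: s[1], lambda s: s[0]]

stable_states = fixed_points(with_autoregulation)
cycle = attractor((0, 1), without_autoregulation)
```

    With the self-loop, every start state ends in one of two fixed points (all off or all on). Without it, the start state (0, 1) never settles and oscillates between two states.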

  17. Statistical identification of gene association by CID in application of constructing ER regulatory network

    PubMed Central

    Liu, Li-Yu D; Chen, Chien-Yu; Chen, Mei-Ju M; Tsai, Ming-Shian; Lee, Cho-Han S; Phang, Tzu L; Chang, Li-Yun; Kuo, Wen-Hung; Hwa, Hsiao-Lin; Lien, Huang-Chun; Jung, Shih-Ming; Lin, Yi-Shing; Chang, King-Jen; Hsieh, Fon-Jou

    2009-01-01

    Background A variety of high-throughput techniques are now available for constructing comprehensive gene regulatory networks in systems biology. In this study, we report a new statistical approach for facilitating in silico inference of regulatory network structure. The new measure of association, coefficient of intrinsic dependence (CID), is model-free and can be applied to both continuous and categorical distributions. When given two variables X and Y, CID answers whether Y is dependent on X by examining the conditional distribution of Y given X. In this paper, we apply CID to analyze the regulatory relationships between transcription factors (TFs) (X) and their downstream genes (Y) based on clinical data. More specifically, we use estrogen receptor α (ERα) as the variable X, and the analyses are based on 48 clinical breast cancer gene expression arrays (48A). Results The analytical utility of CID was evaluated in comparison with four commonly used statistical methods, Galton-Pearson's correlation coefficient (GPCC), Student's t-test (STT), coefficient of determination (CoD), and mutual information (MI). When being compared to GPCC, CoD, and MI, CID reveals its preferential ability to discover the regulatory association where distribution of the mRNA expression levels on X and Y does not fit linear models. On the other hand, when CID is used to measure the association of a continuous variable (Y) against a discrete variable (X), it shows similar performance as compared to STT, and appears to outperform CoD and MI. In addition, this study established a two-layer transcriptional regulatory network to exemplify the usage of CID, in combination with GPCC, in deciphering gene networks based on gene expression profiles from patient arrays. Conclusion CID is shown to provide useful information for identifying associations between genes and transcription factors of interest in patient arrays. When coupled with the relationships detected by GPCC, the association

  18. Network-Based Integration of Disparate Omic Data To Identify "Silent Players" in Cancer

    PubMed Central

    Ruffalo, Matthew

    2015-01-01

    Development of high-throughput monitoring technologies enables interrogation of cancer samples at various levels of cellular activity. Capitalizing on these developments, various public efforts such as The Cancer Genome Atlas (TCGA) generate disparate omic data for large patient cohorts. As demonstrated by recent studies, these heterogeneous data sources provide the opportunity to gain insights into the molecular changes that drive cancer pathogenesis and progression. However, these insights are limited by the vast search space and as a result low statistical power to make new discoveries. In this paper, we propose methods for integrating disparate omic data using molecular interaction networks, with a view to gaining mechanistic insights into the relationship between molecular changes at different levels of cellular activity. Namely, we hypothesize that genes that play a role in cancer development and progression may be implicated by neither frequent mutation nor differential expression, and that network-based integration of mutation and differential expression data can reveal these “silent players”. For this purpose, we utilize network-propagation algorithms to simulate the information flow in the cell at a sample-specific resolution. We then use the propagated mutation and expression signals to identify genes that are not necessarily mutated or differentially expressed genes, but have an essential role in tumor development and patient outcome. We test the proposed method on breast cancer and glioblastoma multiforme data obtained from TCGA. Our results show that the proposed method can identify important proteins that are not readily revealed by molecular data, providing insights beyond what can be gleaned by analyzing different types of molecular data in isolation. PMID:26683094

  19. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation.

    PubMed

    Rund, Samuel S C; Yoo, Boyoung; Alam, Camille; Green, Taryn; Stephens, Melissa T; Zeng, Erliang; George, Gary F; Sheppard, Aaron D; Duffield, Giles E; Milenković, Tijana; Pfrender, Michael E

    2016-08-18

    Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel

  20. Transcriptional dynamics of a conserved gene expression network associated with craniofacial divergence in Arctic charr.

    PubMed

    Ahi, Ehsan Pashay; Kapralova, Kalina Hristova; Pálsson, Arnar; Maier, Valerie Helene; Gudbrandsson, Jóhannes; Snorrason, Sigurdur S; Jónsson, Zophonías O; Franzdóttir, Sigrídur Rut

    2014-01-01

    Understanding the molecular basis of craniofacial variation can provide insights into key developmental mechanisms of adaptive changes and their role in trophic divergence and speciation. Arctic charr (Salvelinus alpinus) is a polymorphic fish species, and, in Lake Thingvallavatn in Iceland, four sympatric morphs have evolved distinct craniofacial structures. We conducted a gene expression study on candidates from a conserved gene coexpression network, focusing on the development of craniofacial elements in embryos of two contrasting Arctic charr morphotypes (benthic and limnetic). Four Arctic charr morphs were studied: one limnetic and two benthic morphs from Lake Thingvallavatn and a limnetic reference aquaculture morph. The presence of morphological differences at developmental stages before the onset of feeding was verified by morphometric analysis. Following up on our previous findings that Mmp2 and Sparc were differentially expressed between morphotypes, we identified a network of genes with conserved coexpression across diverse vertebrate species. A comparative expression study of candidates from this network in developing heads of the four Arctic charr morphs verified the coexpression relationship of these genes and revealed distinct transcriptional dynamics strongly correlated with contrasting craniofacial morphologies (benthic versus limnetic). A literature review and Gene Ontology analysis indicated that a significant proportion of the network genes play a role in extracellular matrix organization and skeletogenesis, and motif enrichment analysis of conserved noncoding regions of network candidates predicted a handful of transcription factors, including Ap1 and Ets2, as potential regulators of the gene network. The expression of Ets2 itself was also found to associate with network gene expression. Genes linked to glucocorticoid signalling were also studied, as both Mmp2 and Sparc are responsive to this pathway. Among those, several transcriptional

  1. Transcriptome analysis of an apple (Malus × domestica) yellow fruit somatic mutation identifies a gene network module highly associated with anthocyanin and epigenetic regulation.

    PubMed

    El-Sharkawy, Islam; Liang, Dong; Xu, Kenong

    2015-12-01

    Using RNA-seq, this study analysed an apple (Malus × domestica) anthocyanin-deficient yellow-skin somatic mutant 'Blondee' (BLO) and its red-skin parent 'Kidd's D-8' (KID), the original name of 'Gala', to understand the molecular mechanisms underlying the mutation. A total of 3299 differentially expressed genes (DEGs) were identified between BLO and KID at four developmental stages and/or between two adjacent stages within BLO and/or KID. A weighted gene co-expression network analysis (WGCNA) of the DEGs uncovered a network module of 34 genes highly correlated (r=0.95, P=9.0×10^-13) with anthocyanin contents. Although 12 of the 34 genes in the WGCNA module were characterized and known to have roles in anthocyanin, the remaining 22 appear to be novel. Examining the expression of ten representative genes in the module in 14 diverse apples revealed that at least eight were significantly correlated with anthocyanin variation. MdMYB10 (MDP0000259614) and MdGST (MDP0000252292) were among the most suppressed module member genes in BLO despite being indistinguishable in their corresponding sequences between BLO and KID. Methylation assay of MdMYB10 and MdGST in fruit skin revealed that two regions (MR3 and MR7) in the MdMYB10 promoter exhibited remarkable differences between BLO and KID. In particular, methylation was high and progressively increased alongside fruit development in BLO, while it was correspondingly low and constant in KID. The methylation levels in both MR3 and MR7 were negatively correlated with anthocyanin content as well as the expression of MdMYB10 and MdGST. Clearly, the collective repression of the 34 genes explains the loss-of-colour in BLO while the methylation in the MdMYB10 promoter is likely causal for the mutation. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  2. Memory functions reveal structural properties of gene regulatory networks

    PubMed Central

    Perez-Carrasco, Ruben

    2018-01-01

    Gene regulatory networks (GRNs) control cellular function and decision making during tissue development and homeostasis. Mathematical tools based on dynamical systems theory are often used to model these networks, but the size and complexity of these models mean that their behaviour is not always intuitive and the underlying mechanisms can be difficult to decipher. For this reason, methods that simplify and aid exploration of complex networks are necessary. To this end we develop a broadly applicable form of the Zwanzig-Mori projection. By first converting a thermodynamic state ensemble model of gene regulation into mass action reactions we derive a general method that produces a set of time evolution equations for a subset of components of a network. The influence of the rest of the network, the bulk, is captured by memory functions that describe how the subnetwork reacts to its own past state via components in the bulk. These memory functions provide probes of near-steady state dynamics, revealing information not easily accessible otherwise. We illustrate the method on a simple cross-repressive transcriptional motif to show that memory functions not only simplify the analysis of the subnetwork but also have a natural interpretation. We then apply the approach to a GRN from the vertebrate neural tube, a well characterised developmental transcriptional network composed of four interacting transcription factors. The memory functions reveal the function of specific links within the neural tube network and identify features of the regulatory structure that specifically increase the robustness of the network to initial conditions. Taken together, the study provides evidence that Zwanzig-Mori projections offer powerful and effective tools for simplifying and exploring the behaviour of GRNs. PMID:29470492

  3. Continuous time Bayesian networks identify Prdm1 as a negative regulator of TH17 cell differentiation in humans

    PubMed Central

    Acerbi, Enzo; Viganò, Elena; Poidinger, Michael; Mortellaro, Alessandra; Zelante, Teresa; Stella, Fabio

    2016-01-01

    T helper 17 (TH17) cells represent a pivotal adaptive cell subset involved in multiple immune disorders in mammalian species. Deciphering the molecular interactions regulating TH17 cell differentiation is particularly critical for novel drug target discovery designed to control maladaptive inflammatory conditions. Using continuous time Bayesian networks over a time-course gene expression dataset, we inferred the global regulatory network controlling TH17 differentiation. From the network, we identified the Prdm1 gene encoding the B lymphocyte-induced maturation protein 1 as a crucial negative regulator of human TH17 cell differentiation. The results have been validated by perturbing Prdm1 expression on freshly isolated CD4+ naïve T cells: reduction of Prdm1 expression leads to augmentation of IL-17 release. These data unravel a possible novel target to control TH17 polarization in inflammatory disorders. Furthermore, this study represents the first in vitro validation of continuous time Bayesian networks as a gene network reconstruction method and as a hypothesis generation tool for wet-lab biological experiments. PMID:26976045

  4. A Modularity-Based Method Reveals Mixed Modules from Chemical-Gene Heterogeneous Network

    PubMed Central

    Song, Jianglong; Tang, Shihuan; Liu, Xi; Gao, Yibo; Yang, Hongjun; Lu, Peng

    2015-01-01

    For a multicomponent therapy, a molecular network is essential to uncover its specific mode of action from a holistic perspective. The molecular system of a Traditional Chinese Medicine (TCM) formula can be represented by a 2-class heterogeneous network (2-HN), which typically includes chemical similarities, chemical-target interactions and gene interactions. An important premise of uncovering the molecular mechanism is to identify mixed modules from the complex chemical-gene heterogeneous network of a TCM formula. We thus proposed a novel method (MixMod) based on mixed modularity to detect accurate mixed modules from 2-HNs. First, we compared MixMod with the Clauset-Newman-Moore algorithm (CNM), the Markov Cluster algorithm (MCL), Infomap and Louvain on benchmark 2-HNs with known module structure. Results showed that MixMod was superior to the other methods when 2-HNs had promiscuous module structure. Then these methods were tested on a real drug-target network, in which 88 disease clusters were regarded as real modules. MixMod could identify the most accurate mixed modules from the drug-target 2-HN (normalized mutual information 0.62 and classification accuracy 0.4524). Finally, MixMod was applied to the 2-HN of Buchang naoxintong capsule (BNC) and detected 49 mixed modules. By using enrichment analysis, we investigated five mixed modules that contained primary constituents of BNC intestinal absorption liquid. The findings of in vitro experiments using BNC intestinal absorption liquid were found to accord closely with the previous analysis. Therefore, MixMod is an effective method to detect accurate mixed modules from chemical-gene heterogeneous networks and further uncover the molecular mechanism of multicomponent therapies, especially TCM formulae. PMID:25927435
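
    For orientation, the sketch below computes the standard Newman modularity that methods such as CNM and Louvain optimize on an ordinary (single-class) network. It does not reproduce MixMod's mixed modularity, whose definition is not given in the abstract; the function name and the toy edge list are assumptions for illustration.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its module label.
    Q = sum over modules c of  L_c / m - (d_c / (2 m))^2,
    where m is the edge count, L_c the edges inside c, d_c the degree sum of c.
    """
    m = len(edges)
    intra = {}       # module -> number of edges with both ends inside it
    degree_sum = {}  # module -> total degree of its nodes
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            intra[c] = intra.get(c, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "a" for n in range(6)}

q_split = modularity(toy_edges, two_modules)
q_merged = modularity(toy_edges, one_module)
```

    Splitting the toy network into its two triangles scores higher than lumping all nodes together, which is the signal a modularity-based detector follows.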

  5. An analysis of the gene interaction networks identifying the role of PARP1 in metastasis of non-small cell lung cancer.

    PubMed

    Chen, Kai; Li, Yajie; Xu, Hui; Zhang, Chunfeng; Li, Zhiqiang; Wang, Wei; Wang, Baofeng

    2017-10-20

    Although many studies have examined the effects of cancer cells on non-small cell lung cancer (NSCLC), a complete account of oncogenes and their mechanisms in tumors has rarely been reported so far. Here, we used biological methods, starting from known NSCLC oncogenes, to find a new oncogene and explore its functional mechanism in NSCLC. The study first built an NSCLC gene interaction network using bioinformatics methods and then combined a shortest-path algorithm with a significance test to identify core genes that were closely connected to the given genes; real-time qPCR was conducted to compare expression levels between patients with NSCLC and normal individuals; additionally, PARP1's role in migration and invasion was examined by transwell and wound-healing assays. The gene interaction network showed that core genes such as PARP1, EGFR and ALK interact directly. The TCGA database showed that PARP1 was strongly expressed in NSCLC and that its expression level in metastatic NSCLC was significantly higher than in non-metastatic NSCLC. In the scratch test, NSCLC cell migration was suppressed by PARP1 silencing but stimulated noticeably by PARP1 overexpression. According to the Kaplan-Meier survival curve, the higher the PARP1 expression, the poorer the patient survival rate and prognosis. Thus, PARP1 expression was negatively correlated with patient survival rate and prognosis. The new oncogene PARP1 was found from known NSCLC oncogenes by means of the gene interaction network, demonstrating PARP1's impact on NSCLC cell migration.
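
    The shortest-path step can be pictured with the following sketch, which counts how often each gene lies on a shortest path between pairs of seed genes. The abstract does not give the authors' exact procedure or significance test, so the breadth-first search, the counting scheme, the function names and the placeholder gene labels are all assumptions.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Breadth-first search for one shortest path in an unweighted graph."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # target not reachable from source


def intermediate_counts(graph, seeds):
    """Count how often each non-seed gene sits on a seed-to-seed shortest path."""
    counts = Counter()
    for a, b in combinations(seeds, 2):
        path = shortest_path(graph, a, b)
        if path:
            counts.update(path[1:-1])  # drop the two seed endpoints
    return counts


# Hypothetical interaction network; S1..S3 stand for known oncogenes.
network = {
    "S1": ["X"],
    "S2": ["X"],
    "S3": ["X", "Y"],
    "X": ["S1", "S2", "S3"],
    "Y": ["S3"],
}
counts = intermediate_counts(network, ["S1", "S2", "S3"])
```

    Genes with high counts are candidates for a follow-up significance test, for example against counts obtained from randomly chosen seed sets.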

  6. Identification of critical regulatory genes in cancer signaling network using controllability analysis

    NASA Astrophysics Data System (ADS)

    Ravindran, Vandana; Sunitha, V.; Bagler, Ganesh

    2017-05-01

    Cancer is characterized by a complex web of regulatory mechanisms which makes it difficult to identify features that are central to its control. Molecular integrative models of cancer, generated with the help of data from experimental assays, facilitate use of control theory to probe for ways of controlling the state of such a complex dynamic network. We modeled the human cancer signaling network as a directed graph and analyzed it for its controllability, identification of driver nodes and their characterization. We identified the driver nodes using the maximum matching algorithm and classified them as backbone, peripheral and ordinary based on their role in regulatory interactions and control of the network. We found that the backbone driver nodes were key to driving the regulatory network into cancer phenotype (via mutations) as well as for steering into healthy phenotype (as drug targets). This implies that while backbone genes could lead to cancer by virtue of mutations, they are also therapeutic targets of cancer. Further, based on their impact on the size of the set of driver nodes, genes were characterized as indispensable, dispensable and neutral. Indispensable nodes within backbone of the network emerged as central to regulatory mechanisms of control of cancer. In addition to probing the cancer signaling network from the perspective of control, our findings suggest that indispensable backbone driver nodes could be potentially leveraged as therapeutic targets. This study also illustrates the application of structural controllability for studying the mechanisms underlying the regulation of complex diseases.
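
    A minimal sketch of driver-node identification by maximum matching follows. It assumes the usual structural-controllability recipe, in which a node is a driver if no matched edge points into it, and uses a simple augmenting-path matching; the abstract does not say which matching algorithm the authors used. Function names and the toy edge lists are hypothetical.

```python
def maximum_matching(edges, nodes):
    """Augmenting-path matching on the bipartite out-copy / in-copy graph.

    Each directed edge u -> v links the out-copy of u to the in-copy of v.
    Returns a dict: matched target node -> source node of its matched edge.
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_by = {}

    def augment(u, visited):
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in matched_by or augment(matched_by[v], visited):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    return matched_by


def driver_nodes(edges, nodes):
    """Nodes with no matched incoming edge; one arbitrary node if all are matched."""
    matched_by = maximum_matching(edges, nodes)
    unmatched = [v for v in nodes if v not in matched_by]
    return unmatched or [nodes[0]]


# Hypothetical toy networks.
chain = driver_nodes([("A", "B"), ("B", "C")], ["A", "B", "C"])
star = driver_nodes([("A", "B"), ("A", "C")], ["A", "B", "C"])
loop = driver_nodes([("A", "B"), ("B", "A")], ["A", "B"])
```

    A chain needs a single driver at its head, whereas a star needs two, because one hub can only be matched to one of its targets. A maximum matching is generally not unique, so the identity of some drivers can change between runs of different algorithms even though their number does not.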

  7. Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks

    PubMed Central

    Lähdesmäki, Harri; Hautaniemi, Sampsa; Shmulevich, Ilya; Yli-Harja, Olli

    2006-01-01

    A significant amount of attention has recently been focused on modeling of gene regulatory networks. Two frequently used large-scale modeling frameworks are Bayesian networks (BNs) and Boolean networks, the latter one being a special case of its recent stochastic extension, probabilistic Boolean networks (PBNs). PBN is a promising model class that generalizes the standard rule-based interactions of Boolean networks into the stochastic setting. Dynamic Bayesian networks (DBNs) is a general and versatile model class that is able to represent complex temporal stochastic processes and has also been proposed as a model for gene regulatory systems. In this paper, we concentrate on these two model classes and demonstrate that PBNs and a certain subclass of DBNs can represent the same joint probability distribution over their common variables. The major benefit of introducing the relationships between the models is that it opens up the possibility of applying the standard tools of DBNs to PBNs and vice versa. Hence, the standard learning tools of DBNs can be applied in the context of PBNs, and the inference methods give a natural way of handling the missing values in PBNs which are often present in gene expression measurements. Conversely, the tools for controlling the stationary behavior of the networks, tools for projecting networks onto sub-networks, and efficient learning schemes can be used for DBNs. In other words, the introduced relationships between the models extend the collection of analysis tools for both model classes. PMID:17415411

  8. Reveal genes functionally associated with ACADS by a network study.

    PubMed

    Chen, Yulong; Su, Zhiguang

    2015-09-15

    Establishing a systematic network aims at finding essential human gene-gene/gene-disease pathways by means of network inter-connecting patterns and functional annotation analysis. In the present study, we have analyzed functional gene interactions of the short-chain acyl-coenzyme A dehydrogenase gene (ACADS). ACADS plays a vital role in free fatty acid β-oxidation and regulates energy homeostasis. Modules of highly inter-connected genes in the disease-specific ACADS network are derived by integrating gene function and protein interaction data. Among the 8 genes in the ACADS web retrieved from both STRING and GeneMANIA, ACADS is effectively conjoined with 4 genes: HADHA, HADHB, ECHS1 and ACAT1. The functional analysis is done via ontological briefing and candidate disease identification. We observed that the genes most efficiently interlinked with ACADS are HADHA, HADHB, ECHS1 and ACAT1. Interestingly, the ontological aspect of genes in the ACADS network reveals that ACADS, HADHA and HADHB play equally vital roles in fatty acid metabolism. The gene ACAT1, together with ACADS, is involved in ketone metabolism. Our computational gene web analysis also predicts potential candidate diseases, indicating the involvement of ACADS, HADHA, HADHB, ECHS1 and ACAT1 not only in lipid metabolism but also in infant death syndrome, skeletal myopathy, acute hepatic encephalopathy, Reye-like syndrome, episodic ketosis, and metabolic acidosis. The current study presents a comprehensible layout of the ACADS network, its functional strategies and the candidate disease approach associated with the ACADS network. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. NDRC: A Disease-Causing Genes Prioritized Method Based on Network Diffusion and Rank Concordance.

    PubMed

    Fang, Minghong; Hu, Xiaohua; Wang, Yan; Zhao, Junmin; Shen, Xianjun; He, Tingting

    2015-07-01

    Prioritization of disease-causing genes is very important for understanding disease mechanisms and for biomedical applications such as drug design. Previous studies have shown that promising candidate genes are mostly ranked according to their relatedness to known disease genes or closely related disease genes. Therefore, a dangling gene (isolated gene) with no edges in the network cannot be effectively prioritized. These approaches tend to prioritize genes that are highly connected in the PPI network, while performing poorly when applied to loosely connected disease genes. To address these problems, we propose a new method for prioritizing disease-causing genes that is based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross validation on 1931 diseases in which at least one gene is known to be involved, and it is able to rank the true causal gene first in 849 of all 2542 cases. The experimental results suggest that NDRC significantly outperforms other existing methods such as RWR, VAVIEN, DADA and PRINCE in identifying loosely connected disease genes and successfully puts dangling genes forward as potential candidate disease genes. Furthermore, we apply the NDRC method to study three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study has also found that certain complex disease-causing genes can be divided into several modules that are closely associated with different disease phenotypes.
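
    The dangling-gene problem described above is easy to see in random walk with restart (RWR), one of the baselines named in the abstract. The sketch below is a generic RWR, not NDRC itself; the function name, the `restart` and `iterations` parameters, the handling of dangling nodes and the toy network are assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """Score every node by its steady-state visit probability from the seeds.

    adjacency: dict node -> list of neighbours (symmetric for undirected graphs).
    seeds:     known disease genes; every seed must be a key of `adjacency`.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            walk_mass = (1 - restart) * p[u]
            neighbours = adjacency[u]
            if neighbours:
                for v in neighbours:
                    nxt[v] += walk_mass / len(neighbours)
            else:
                # A dangling node has no edge to walk along;
                # return its mass to the seeds so probabilities still sum to 1.
                for s in seeds:
                    nxt[s] += walk_mass / len(seeds)
        p = nxt
    return p


# Hypothetical PPI fragment: a path A - B - C plus an isolated gene D.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
rwr_scores = random_walk_with_restart(ppi, {"A"})
```

    The isolated gene D receives a score of exactly zero whatever its true relevance, which is the limitation that motivates combining diffusion with additional evidence.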

  10. Identification of neuron-related genes for cell therapy of neurological disorders by network analysis.

    PubMed

    Su, Li-Ning; Song, Xiao-Qing; Wei, Hui-Ping; Yin, Hai-Feng

    Bone mesenchymal stem cells (BMSCs) differentiated into neurons have been widely proposed for use in cell therapy of many neurological disorders. It is therefore important to understand the molecular mechanisms underlying this differentiation. We screened differentially expressed genes between immature neural tissues and untreated BMSCs to identify the genes responsible for neuronal differentiation from BMSCs. GSE68243 gene microarray data of rat BMSCs and GSE18860 gene microarray data of rat neurons were obtained from the Gene Expression Omnibus database. Transcriptome Analysis Console software showed that 1248 genes were up-regulated and 1273 were down-regulated in neurons compared with BMSCs. Gene Ontology functional enrichment, protein-protein interaction networks, functional modules, and hub genes were analyzed using DAVID, STRING 10, BiNGO tool, and Network Analyzer software, revealing that nine hub genes, Nrcam, Sema3a, Mapk8, Dlg4, Slit1, Creb1, Ntrk2, Cntn2, and Pax6, may play a pivotal role in neuronal differentiation from BMSCs. Seven genes, Dcx, Nrcam, Sema3a, Cntn2, Slit1, Ephb1, and Pax6, were shown to be hub nodes within the neuronal development network, while six genes, Fgf2, Tgfβ1, Vegfa, Serpine1, Il6, and Stat1, appeared to play an important role in suppressing neuronal differentiation. However, additional studies are required to confirm these results.

  11. Identifying PHM market and network opportunities.

    PubMed

    Grube, Mark E; Krishnaswamy, Anand; Poziemski, John; York, Robert W

    2015-11-01

    Two key processes for healthcare organizations seeking to assume a financially sustainable role in population health management (PHM), after laying the groundwork for the effort, are to identify potential PHM market opportunities and determine the scope of the PHM network. Key variables organizations should consider with respect to market opportunities include the patient population, the overall insurance/employer market, and available types of insurance products. Regarding the network's scope, organizations should consider both traditional strategic criteria for a viable network and at least five additional criteria: network essentiality and PHM care continuum, network adequacy, service distribution right-sizing, network growth strategy, and organizational agility.

  12. Integrated Analysis of Mutation Data from Various Sources Identifies Key Genes and Signaling Pathways in Hepatocellular Carcinoma

    PubMed Central

    Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Background Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. Principal Findings In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes, and damaging genes and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. Five classes of pathways that were mutated most frequently included (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53, CTNNB1 and recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Conclusions Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers. PMID:24988079

  13. Integrated analysis of mutation data from various sources identifies key genes and signaling pathways in hepatocellular carcinoma.

    PubMed

    Zhang, Yuannv; Qiu, Zhaoping; Wei, Lin; Tang, Ruqi; Lian, Baofeng; Zhao, Yingjun; He, Xianghuo; Xie, Lu

    2014-01-01

    Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored. In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes and damaging genes, and also demonstrated high conservation scores, indicating their important roles in liver tumorigenesis. The five classes of pathways that were mutated most frequently were (a) proliferation and apoptosis related pathways, (b) tumor microenvironment related pathways, (c) neural signaling related pathways, (d) metabolic related pathways, and (e) circadian related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53 and CTNNB1 and the recently identified novel mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features. Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever-increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers.

  14. Utilizing Gene Tree Variation to Identify Candidate Effector Genes in Zymoseptoria tritici

    PubMed Central

    McDonald, Megan C.; McGinness, Lachlan; Hane, James K.; Williams, Angela H.; Milgate, Andrew; Solomon, Peter S.

    2016-01-01

    Zymoseptoria tritici is a host-specific, necrotrophic pathogen of wheat. Infection by Z. tritici is characterized by its extended latent period, which typically lasts 2 weeks and is followed by extensive host cell death and rapid proliferation of fungal biomass. This work characterizes the level of genomic variation in 13 isolates, for which we have measured virulence on 11 wheat cultivars with differential resistance genes. Between the reference isolate, IPO323, and the 13 Australian isolates we identified over 800,000 single nucleotide polymorphisms, of which ∼10% had an effect on the coding regions of the genome. Furthermore, we identified over 1700 probable presence/absence polymorphisms in genes across the Australian isolates using de novo assembly. Finally, we developed a gene tree sorting method that quickly identifies groups of isolates within a single gene alignment whose sequence haplotypes correspond with virulence scores on a single wheat cultivar. Using this method, we have identified < 100 candidate effector genes whose gene sequence correlates with virulence toward a wheat cultivar carrying a major resistance gene. PMID:26837952

  15. VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

    PubMed Central

    Jia, Peilin; Zhao, Zhongming

    2014-01-01

    A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. PMID:24516372
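
The abstract above names the Random Walk with Restart algorithm as the way interactors are selected in a protein-protein interaction network. The following is a minimal Python sketch of a generic random walk with restart on an undirected graph, not the VarWalker implementation. The function name, the restart probability of 0.5, and the four-node toy graph are assumptions made for illustration only.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    seeds:     set of nodes the walker jumps back to on a restart.
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # Restart component: a fixed share of the mass returns to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        # Walk component: the remaining mass is split evenly among neighbours.
        for n in nodes:
            degree = len(adjacency[n])
            if degree == 0:
                updated[n] += (1.0 - restart) * current[n]  # isolated node keeps its mass
                continue
            share = (1.0 - restart) * current[n] / degree
            for neighbour in adjacency[n]:
                updated[neighbour] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical toy network: a simple path A - B - C - D with A as the only seed.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_scores = random_walk_with_restart(toy_network, seeds={"A"})
```

In this toy path the scores fall off with distance from the seed, which is the property that lets such a walk rank close interactors above distant ones.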

  16. VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

    PubMed

    Jia, Peilin; Zhao, Zhongming

    2014-02-01

    A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.

  17. A network analysis of the Chinese medicine Lianhua-Qingwen formula to identify its main effective components.

    PubMed

    Wang, Chun-Hua; Zhong, Yi; Zhang, Yan; Liu, Jin-Ping; Wang, Yue-Fei; Jia, Wei-Na; Wang, Guo-Cai; Li, Zheng; Zhu, Yan; Gao, Xiu-Mei

    2016-02-01

    Chinese medicine is known to treat complex diseases with multiple components and multiple targets. However, the main effective components and their related key targets and functions remain to be identified. Herein, a network analysis method was developed to identify the main effective components and key targets of a Chinese medicine, Lianhua-Qingwen Formula (LQF). The LQF is commonly used for the prevention and treatment of viral influenza in China. It is composed of 11 herbs, gypsum and menthol, with 61 compounds having been identified in our previous work. In this paper, these 61 candidate compounds were used to find their related targets and construct the predicted-target (PT) network. An influenza-related protein-protein interaction (PPI) network was constructed and integrated with the PT network. Then the compound-effective target (CET) network and the compound-ineffective target (CIT) network were extracted. A novel approach was developed to identify effective components by comparing the CET and CIT networks. As a result, 15 main effective components were identified along with 61 corresponding targets. Seven of these main effective components were further experimentally validated to have antiviral efficacy in vitro. The main effective component-target (MECT) network was further constructed with the main effective components and their key targets. Gene Ontology (GO) analysis of the MECT network predicted key functions, such as NO production, being modulated by the LQF. Interestingly, five effective components were experimentally tested and exhibited inhibitory effects on NO production in LPS-induced RAW 264.7 cells. In summary, we have developed a novel approach to identify the main effective components in a Chinese medicine, LQF, and experimentally validated some of the predictions.

  18. Reverse engineering and analysis of large genome-scale gene networks

    PubMed Central

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-01

    Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
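
The abstract above scores gene pairs by mutual information (MI). As a rough illustration of that quantity, the Python sketch below estimates MI between two expression profiles with a plain equal-width histogram. This is a deliberately simpler estimator swapped in for clarity. It is not the B-spline formulation the abstract describes, and the function names and the default of five bins are assumptions.

```python
from collections import Counter
from math import log


def discretize(values, bins):
    """Map each value to an equal-width bin index in the range 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant profile falls into a single bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=5):
    """Histogram estimate of MI (in nats) between two equally long profiles."""
    binned_x, binned_y = discretize(x, bins), discretize(y, bins)
    n = len(binned_x)
    joint = Counter(zip(binned_x, binned_y))
    marginal_x, marginal_y = Counter(binned_x), Counter(binned_y)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * log(count * n / (marginal_x[i] * marginal_y[j]))
    return mi


# Hypothetical profiles: one gene compared with itself and with a flat profile.
profile = list(range(100))
mi_identical = mutual_information(profile, profile)
mi_flat = mutual_information(profile, [3.0] * 100)
```

A profile paired with itself gives the largest value the binning allows, while a flat profile carries no information about any other gene and gives zero.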

  19. Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

    PubMed

    Gong, Wuming; Koyano-Nakagawa, Naoko; Li, Tongbin; Garry, Daniel J

    2015-03-07

    Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p < 10^-100), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2

  20. Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules.

    PubMed

    Xiao, Xiaolin; Moreno-Moral, Aida; Rotival, Maxime; Bottolo, Leonardo; Petretto, Enrico

    2014-01-01

    Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. Analyses of seven rat tissues identified a multi-tissue subnetwork of co

  1. Prioritizing chronic obstructive pulmonary disease (COPD) candidate genes in COPD-related networks

    PubMed Central

    Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina

    2017-01-01

    Chronic obstructive pulmonary disease (COPD) is a multi-factor disease, which could be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each of these weighted COPD-related networks were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested a correlation between COPD and these genes. The performance of our gene prioritization method was superior to that of ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network, as assessed using leave-one-out cross-validation, literature validation and functional enrichment analysis. The top-ranked genes prioritized from the COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic network or the COPD-related PPI network might be potential markers for the diagnosis and treatment of COPD. PMID:29262568

  2. Prioritizing chronic obstructive pulmonary disease (COPD) candidate genes in COPD-related networks.

    PubMed

    Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina

    2017-11-28

    Chronic obstructive pulmonary disease (COPD) is a multi-factor disease, which could be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each of these weighted COPD-related networks were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested a correlation between COPD and these genes. The performance of our gene prioritization method was superior to that of ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network, as assessed using leave-one-out cross-validation, literature validation and functional enrichment analysis. The top-ranked genes prioritized from the COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic network or the COPD-related PPI network might be potential markers for the diagnosis and treatment of COPD.

  3. Broad Integration of Expression Maps and Co-Expression Networks Compassing Novel Gene Functions in the Brain

    PubMed Central

    Okamura-Oho, Yuko; Shimokawa, Kazuro; Nishimura, Masaomi; Takemoto, Satoko; Sato, Akira; Furuichi, Teiichi; Yokota, Hideo

    2014-01-01

    Using a recently invented technique for gene expression mapping in the whole-anatomy context, termed transcriptome tomography, we have generated a dataset of 36,000 maps of overall gene expression in the adult mouse brain. Here, using an informatics approach, we identified a broad co-expression network that follows an inverse power law and is rich in functional interaction and gene-ontology terms. Our framework for the integrated analysis of expression maps and graphs of co-expression networks revealed that groups of combinatorially expressed genes, which regulate cell differentiation during development, were present in the adult brain and that each of these groups was associated with a discrete cell type. These groups included non-coding genes of unknown function. We found that these genes specifically linked developmentally conserved groups in the network. A previously unrecognized robust expression pattern covering the whole brain was related to the molecular anatomy of key biological processes occurring in particular areas. PMID:25382412

  4. Systems Nutrigenomics Reveals Brain Gene Networks Linking Metabolic and Brain Disorders.

    PubMed

    Meng, Qingying; Ying, Zhe; Noble, Emily; Zhao, Yuqi; Agrawal, Rahul; Mikhail, Andrew; Zhuang, Yumei; Tyagi, Ethika; Zhang, Qing; Lee, Jae-Hyung; Morselli, Marco; Orozco, Luz; Guo, Weilong; Kilts, Tina M; Zhu, Jun; Zhang, Bin; Pellegrini, Matteo; Xiao, Xinshu; Young, Marian F; Gomez-Pinilla, Fernando; Yang, Xia

    2016-05-01

    Nutrition plays a significant role in the increasing prevalence of metabolic and brain disorders. Here we employ systems nutrigenomics to scrutinize the genomic bases of nutrient-host interaction underlying disease predisposition or therapeutic potential. We conducted transcriptome and epigenome sequencing of hypothalamus (metabolic control) and hippocampus (cognitive processing) from a rodent model of fructose consumption, and identified significant reprogramming of DNA methylation, transcript abundance, alternative splicing, and gene networks governing cell metabolism, cell communication, inflammation, and neuronal signaling. These signals converged with genetic causal risks of metabolic, neurological, and psychiatric disorders revealed in humans. Gene network modeling uncovered the extracellular matrix genes Bgn and Fmod as main orchestrators of the effects of fructose, as validated using two knockout mouse models. We further demonstrate that an omega-3 fatty acid, DHA, reverses the genomic and network perturbations elicited by fructose, providing molecular support for nutritional interventions to counteract diet-induced metabolic and brain disorders. Our integrative approach complementing rodent and human studies supports the applicability of nutrigenomics principles to predict disease susceptibility and to guide personalized medicine. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  5. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, which enables studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution for the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
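
The abstract above evaluates the distribution of its test statistic by permutation. The Python sketch below shows the general idea with a generic two-group permutation test. The statistic (absolute difference in group means), the function name, and the toy values are assumptions for illustration and are not the network-activation statistic of maigesPack.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Estimate a p-value for the absolute difference in means by label shuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size_a = len(group_a)

    def statistic(values):
        first, second = values[:size_a], values[size_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = statistic(pooled)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if statistic(pooled) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical expression values for one gene in two sample groups.
p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

Because the null distribution is built from the data themselves, no parametric form has to be assumed for the statistic, which is the usual motivation for a permutation-based evaluation.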

  6. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods.

    PubMed

    Valentini, Giorgio; Paccanaro, Alberto; Caniza, Horacio; Romero, Alfonso E; Re, Matteo

    2014-06-01

    In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Network integration is necessary to boost the performances of gene prioritization methods. Moreover the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both

  7. Genome-wide gene by lead exposure interaction analysis identifies UNC5D as a candidate gene for neurodevelopment.

    PubMed

    Wang, Zhaoxi; Claus Henn, Birgit; Wang, Chaolong; Wei, Yongyue; Su, Li; Sun, Ryan; Chen, Han; Wagner, Peter J; Lu, Quan; Lin, Xihong; Wright, Robert; Bellinger, David; Kile, Molly; Mazumdar, Maitreyi; Tellez-Rojo, Martha Maria; Schnaas, Lourdes; Christiani, David C

    2017-07-28

    Neurodevelopment is a complex process involving both genetic and environmental factors. Prenatal exposure to lead (Pb) has been associated with lower performance on neurodevelopmental tests. Adverse neurodevelopmental outcomes are more frequent and/or more severe when toxic exposures interact with genetic susceptibility. To explore possible loci associated with increased susceptibility to prenatal Pb exposure, we performed a genome-wide gene-environment interaction study (GWIS) in young children from Mexico (n = 390) and Bangladesh (n = 497). Prenatal Pb exposure was estimated by cord blood Pb concentration. Neurodevelopment was assessed using the Bayley Scales of Infant Development. We identified a locus on chromosome 8, containing UNC5D, and demonstrated evidence of its genome-wide significance with mental composite scores (rs9642758, p_meta = 4.35 × 10^-6). Within this locus, the joint effects of two independent single nucleotide polymorphisms (SNPs, rs9642758 and rs10503970) had a p-value of 4.38 × 10^-9 for mental composite scores. Correlating GWIS results with in vitro transcriptomic profiles identified one common gene, SLC1A5, which is involved in synaptic function, neuronal development, and excitotoxicity. Further analysis revealed interconnected interactions that formed a large network of 52 genes enriched with oxidative stress genes and neurodevelopmental genes. Our findings suggest that certain genetic polymorphisms within/near genes relevant to neurodevelopment might modify the toxic effects of Pb exposure via oxidative stress.

  8. Gene essentiality and the topology of protein interaction networks

    PubMed Central

    Coulomb, Stéphane; Bauer, Michel; Bernard, Denis; Marsolier-Kergoat, Marie-Claude

    2005-01-01

    The mechanistic bases for gene essentiality and for cell mutational resistance have long been disputed. The recent availability of large protein interaction databases has fuelled the analysis of protein interaction networks and several authors have proposed that gene dispensability could be strongly related to some topological parameters of these networks. However, many results were based on protein interaction data whose biases were not taken into account. In this article, we show that the essentiality of a gene in yeast is poorly related to the number of interactants (or degree) of the corresponding protein and that the physiological consequences of gene deletions are unrelated to several other properties of proteins in the interaction networks, such as the average degrees of their nearest neighbours, their clustering coefficients or their relative distances. We also found that yeast protein interaction networks lack degree correlation, i.e. a propensity for their vertices to associate according to their degrees. Gene essentiality and more generally cell resistance against mutations thus seem largely unrelated to many parameters of protein network topology. PMID:16087428

  9. The Double-Stranded DNA Virosphere as a Modular Hierarchical Network of Gene Sharing

    PubMed Central

    Iranzo, Jaime

    2016-01-01

    Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses by using the bipartite network approach and found a robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order “Megavirales” with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic tool kit with a double jelly roll major capsid protein. Herpesviruses and tailed bacteriophages comprise another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identify a set of 14 viral hallmark genes that comprise the hubs of the network and account for most of the intermodule connections. PMID:27486193

  10. A stele-enriched gene regulatory network in the Arabidopsis root

    PubMed Central

    Brady, Siobhan M; Zhang, Lifang; Megraw, Molly; Martinez, Natalia J; Jiang, Eric; Yi, Charles S; Liu, Weilin; Zeng, Anna; Taylor-Teeples, Mallorie; Kim, Dahae; Ahnert, Sebastian; Ohler, Uwe; Ware, Doreen; Walhout, Albertha J M; Benfey, Philip N

    2011-01-01

    Tightly controlled gene expression is a hallmark of multicellular development and is accomplished by transcription factors (TFs) and microRNAs (miRNAs). Although many studies have focused on identifying downstream targets of these molecules, less is known about the factors that regulate their differential expression. We used data from high spatial resolution gene expression experiments and yeast one-hybrid (Y1H) and two-hybrid (Y2H) assays to delineate a subset of interactions occurring within a gene regulatory network (GRN) that determines tissue-specific TF and miRNA expression in plants. We find that upstream TFs are expressed in more diverse cell types than their targets and that promoters that are bound by a relatively large number of TFs correspond to key developmental regulators. The regulatory consequence of many TFs for their target was experimentally determined using genetic analysis. Remarkably, molecular phenotypes were identified for 65% of the TFs, but morphological phenotypes were associated with only 16%. This indicates that the GRN is robust, and that gene expression changes may be canalized or buffered. PMID:21245844

  11. Coalitional game theory as a promising approach to identify candidate autism genes.

    PubMed

    Gupta, Anika; Sun, Min Woo; Paskov, Kelley Marie; Stockham, Nate Tyler; Jung, Jae-Yoon; Wall, Dennis Paul

    2018-01-01

    Despite mounting evidence for the strong role of genetics in the phenotypic manifestation of Autism Spectrum Disorder (ASD), the specific genes responsible for the variable forms of ASD remain undefined. ASD may be best explained by a combinatorial genetic model with varying epistatic interactions across many small-effect mutations. Coalitional or cooperative game theory is a technique that studies the combined effects of groups of players, known as coalitions, seeking to identify players who tend to improve the performance--the relationship to a specific disease phenotype--of any coalition they join. This method has been previously shown to boost biologically informative signal in gene expression data but to date has not been applied to the search for cooperative mutations among putative ASD genes. We describe our approach to highlight genes relevant to ASD using coalitional game theory on alteration data of 1,965 fully sequenced genomes from 756 multiplex families. Alterations were encoded into binary matrices for ASD (case) and unaffected (control) samples, indicating likely gene-disrupting, inherited mutations in altered genes. To determine individual gene contributions given an ASD phenotype, a "player" metric, referred to as the Shapley value, was calculated for each gene in the case and control cohorts. Sixty-seven genes were found to have significantly elevated player scores and likely represent significant contributors to the genetic coordination underlying ASD. Using network and cross-study analysis, we found that these genes are involved in biological pathways known to be affected in the autism cases and that a subset directly interact with several genes known to have strong associations to autism. These findings suggest that coalitional game theory can be applied to large-scale genomic data to identify hidden yet influential players in complex polygenic disorders such as autism.
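
    To make the Shapley value mentioned in this abstract concrete, here is a minimal Python sketch that computes exact Shapley values for a tiny binary alteration data set. The sample data, the gene names, and the characteristic function (the fraction of case samples whose altered genes all fall inside the coalition) are assumptions chosen for illustration; the abstract does not specify the game the authors used.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical binary alteration data: one set of altered genes per case sample.
case_samples = [
    {"G1"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2"},
]
genes = ["G1", "G2", "G3"]

def coalition_value(coalition, samples):
    """Assumed game: fraction of samples whose altered genes all lie in the coalition."""
    hits = sum(1 for altered in samples if altered and altered <= coalition)
    return Fraction(hits, len(samples))

def shapley_values(players, samples):
    """Exact Shapley values: average marginal contribution over all player orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for player in order:
            before = coalition_value(coalition, samples)
            coalition.add(player)
            totals[player] += coalition_value(coalition, samples) - before
    return {p: total / len(orderings) for p, total in totals.items()}

scores = shapley_values(genes, case_samples)
```

    Enumerating every ordering grows factorially with the number of genes, so a genome-scale analysis would need a closed-form or sampled estimate instead; the exhaustive version is used here only because it states the definition directly.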

  12. Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways.

    PubMed

    Obayashi, Takeshi; Kinoshita, Kengo

    2010-05-01

    Gene coexpression analyses are a powerful method to predict the function of genes and/or to identify genes that are functionally related to query genes. The basic idea of gene coexpression analyses is that genes with similar functions should have similar expression patterns under many different conditions. This approach is now widely used by many experimental researchers, especially in the field of plant biology. In this review, we will summarize recent successful examples obtained by using our gene coexpression database, ATTED-II. Specifically, the examples will describe the identification of new genes, such as the subunits of a complex protein, the enzymes in a metabolic pathway and transporters. In addition, we will discuss the discovery of a new intercellular signaling factor and new regulatory relationships between transcription factors and their target genes. In ATTED-II, we provide two basic views of gene coexpression, a gene list view and a gene network view, which can be used as guide gene approach and narrow-down approach, respectively. In addition, we will discuss the coexpression effectiveness for various types of gene sets.

  13. Building gene co-expression networks using transcriptomics data for systems biology investigations: Comparison of methods using microarray data

    PubMed Central

    Kadarmideen, Haja N; Watson-haigh, Nathan S

    2012-01-01

    Gene co-expression networks (GCN), built using high-throughput gene expression data, are fundamental aspects of systems biology. The main aims of this study were to compare two popular approaches to building and analysing GCN. We use real ovine microarray transcriptomics datasets representing four different treatments with Metyrapone, an inhibitor of cortisol biosynthesis. We conducted several microarray quality control checks before applying GCN methods to filtered datasets. Then we compared the outputs of two methods using connectivity as a criterion, as it measures how well a node (gene) is connected within a network. The two GCN construction methods used were Weighted Gene Co-expression Network Analysis (WGCNA) and Partial Correlation and Information Theory (PCIT). Nodes were ranked based on their connectivity measures in each of the four different networks created by WGCNA and PCIT, and node ranks in the two methods were compared to identify those nodes which are highly differentially ranked (HDR). A total of 1,017 HDR nodes were identified across one or more of four networks. We investigated HDR nodes by gene enrichment analyses in relation to their biological relevance to phenotypes. We observed that, in contrast to the WGCNA method, the PCIT algorithm removes many of the edges of the most highly interconnected nodes. Removal of edges of the most highly connected nodes or hub genes will have consequences for downstream analyses and biological interpretations. In general, for large GCN construction (with > 20000 genes), access to large computer clusters, particularly those with larger amounts of shared memory, is recommended. PMID:23144540
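
    The connectivity criterion used in this comparison can be sketched in a few lines of Python. The example below follows the usual unsigned WGCNA-style definition, in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power and a gene's connectivity is the sum of its adjacencies. The power of 6, the random toy data, and the function name are assumptions for illustration, and the sketch does not reproduce the PCIT edge-filtering step.

```python
import numpy as np

def soft_threshold_connectivity(expr, beta=6):
    """expr: samples x genes matrix. Returns (adjacency, connectivity per gene)."""
    corr = np.corrcoef(expr, rowvar=False)      # gene-by-gene Pearson correlation
    adjacency = np.abs(corr) ** beta            # unsigned soft-thresholded adjacency
    np.fill_diagonal(adjacency, 0.0)            # ignore self-connections
    connectivity = adjacency.sum(axis=0)        # weighted degree of each gene
    return adjacency, connectivity

# Hypothetical toy data: 30 samples, 5 genes; genes 0-2 share a common signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 1))
expr = rng.normal(size=(30, 5))
expr[:, :3] += 3.0 * signal

adjacency, connectivity = soft_threshold_connectivity(expr)
ranking = np.argsort(-connectivity)             # most connected genes first
```

    Ranking genes by this weighted degree under two different network constructions, and then comparing the ranks, is the kind of comparison the abstract describes when it identifies highly differentially ranked nodes.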

  14. Integration of multi-omics data for integrative gene regulatory network inference.

    PubMed

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon

    2017-01-01

    Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN's capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.

  15. Integration of multi-omics data for integrative gene regulatory network inference

    PubMed Central

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun

    2017-01-01

    Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called ‘multi-omics data’, that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN’s capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189

  16. Multi-Dimensional Prioritization of Dental Caries Candidate Genes and Its Enriched Dense Network Modules

    PubMed Central

    Wang, Quan; Jia, Peilin; Cuenco, Karen T.; Feingold, Eleanor; Marazita, Mary L.; Wang, Lily; Zhao, Zhongming

    2013-01-01

    A number of genetic studies have suggested numerous susceptibility genes for dental caries over the past decade with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network relevant genes, the matrix metalloproteinases (MMPs) family, and the transforming growth factor-beta (TGF-β) family, all of which have been previously implicated in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provided insights into the molecular mechanisms underlying dental caries. The framework we proposed in this work can be applied to other complex diseases. PMID:24146904

  17. Paper-based Synthetic Gene Networks

    PubMed Central

    Pardee, Keith; Green, Alexander A.; Ferrante, Tom; Cameron, D. Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J.

    2014-01-01

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides a new venue for synthetic biologists to operate, and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze-dried onto paper, enabling the inexpensive, sterile and abiotic distribution of synthetic biology-based technologies for the clinic, global health, industry, research and education. For field use, we create circuits with colorimetric outputs for detection by eye, and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors. PMID:25417167

  18. Paper-based synthetic gene networks.

    PubMed

    Pardee, Keith; Green, Alexander A; Ferrante, Tom; Cameron, D Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J

    2014-11-06

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides an alternate, versatile venue for synthetic biologists to operate and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze dried onto paper, enabling the inexpensive, sterile, and abiotic distribution of synthetic-biology-based technologies for the clinic, global health, industry, research, and education. For field use, we create circuits with colorimetric outputs for detection by eye and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small-molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors.

  19. Random walks on mutual microRNA-target gene interaction network improve the prediction of disease-associated microRNAs.

    PubMed

    Le, Duc-Hau; Verbeke, Lieven; Son, Le Hoang; Chu, Dinh-Toi; Pham, Van-Huy

    2017-11-14

    MicroRNAs (miRNAs) have been shown to play an important role in pathological initiation, progression and maintenance. Because identification in the laboratory of disease-related miRNAs is not straightforward, numerous network-based methods have been developed to predict novel miRNAs in silico. Homogeneous networks (in which every node is a miRNA) based on the targets shared between miRNAs have been widely used to predict their role in disease phenotypes. Although such homogeneous networks can predict potential disease-associated miRNAs, they do not consider the roles of the target genes of the miRNAs. Here, we introduce a novel method based on a heterogeneous network that not only considers miRNAs but also the corresponding target genes in the network model. Instead of constructing homogeneous miRNA networks, we built heterogeneous miRNA networks consisting of both miRNAs and their target genes, using databases of known miRNA-target gene interactions. In addition, as recent studies demonstrated reciprocal regulatory relations between miRNAs and their target genes, we considered these heterogeneous miRNA networks to be undirected, assuming mutual miRNA-target interactions. Next, we introduced a novel method (RWRMTN) operating on these mutual heterogeneous miRNA networks to rank candidate disease-related miRNAs using a random walk with restart (RWR) based algorithm. Using both known disease-associated miRNAs and their target genes as seed nodes, the method can identify additional miRNAs involved in the disease phenotype. Experiments indicated that RWRMTN outperformed two existing state-of-the-art methods: RWRMDA, a network-based method that also uses a RWR on homogeneous (rather than heterogeneous) miRNA networks, and RLSMDA, a machine learning-based method. Interestingly, we could relate this performance gain to the emergence of "disease modules" in the heterogeneous miRNA networks used as input for the algorithm. Moreover, we could demonstrate that RWRMTN is stable
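
    The random walk with restart at the core of this kind of ranking can be sketched as follows. This is a generic RWR on an undirected network, not the RWRMTN implementation: the five-node toy network, the restart probability of 0.5, and the function name are assumptions made for illustration.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the scores stop changing."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0               # avoid dividing by zero for isolated nodes
    W = adjacency / col_sums                    # column-normalized transition matrix
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)          # restart distribution over seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical heterogeneous network: nodes 0-1 are miRNAs, nodes 2-4 are target genes.
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]        # undirected miRNA-target interactions
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

scores = random_walk_with_restart(A, seeds=[2])  # gene 2 plays the known disease gene
```

    With gene 2 as the only seed, the miRNA that targets it directly (node 0) receives a higher steady-state score than the miRNA reachable only through a shared target (node 1), which is the ranking behaviour the abstract relies on.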

  20. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes

    PubMed Central

    2013-01-01

    Background: MicroRNAs (miRNAs) are important post-transcriptional regulators that have been demonstrated to play an important role in human diseases. Elucidating the associations between miRNAs and diseases at the systematic level will deepen our understanding of the molecular mechanisms of diseases. However, miRNA-disease associations identified by previous computational methods are far from complete and more effort is needed. Results: We developed a computational framework to identify miRNA-disease associations by performing random walk analysis, and focused on the functional link between miRNA targets and disease genes in protein-protein interaction (PPI) networks. Furthermore, a bipartite miRNA-disease network was constructed, from which several miRNA-disease co-regulated modules were identified by hierarchical clustering analysis. Our approach achieved satisfactory performance in identifying known cancer-related miRNAs for nine human cancers with an area under the ROC curve (AUC) ranging from 71.3% to 91.3%. By systematically analyzing the global properties of the miRNA-disease network, we found that only a small number of miRNAs regulated genes involved in various diseases, that genes associated with neurological diseases were preferentially regulated by miRNAs, and that some immunological diseases were associated with several specific miRNAs. We also observed that most diseases in the same co-regulated module tended to belong to the same disease category, indicating that these diseases might share similar miRNA regulatory mechanisms. Conclusions: In this study, we present a computational framework to identify miRNA-disease associations, and further construct a bipartite miRNA-disease network for systematically analyzing the global properties of miRNA regulation of disease genes. Our findings provide a broad perspective on the relationships between miRNAs and diseases and could potentially aid future research efforts concerning miRNA involvement in disease pathogenesis.

  1. Linking gene regulation and the exo-metabolome: A comparative transcriptomics approach to identify genes that impact on the production of volatile aroma compounds in yeast

    PubMed Central

    Rossouw, Debra; Næs, Tormod; Bauer, Florian F

    2008-01-01

    Background: 'Omics' tools provide novel opportunities for system-wide analysis of complex cellular functions. Secondary metabolism is an example of a complex network of biochemical pathways, which, although well mapped from a biochemical point of view, is not well understood with regards to its physiological roles and genetic and biochemical regulation. Many of the metabolites produced by this network such as higher alcohols and esters are significant aroma impact compounds in fermentation products, and different yeast strains are known to produce highly divergent aroma profiles. Here, we investigated whether we can predict the impact of specific genes of known or unknown function on this metabolic network by combining whole transcriptome and partial exo-metabolome analysis. Results: For this purpose, the gene expression levels of five different industrial wine yeast strains that produce divergent aroma profiles were established at three different time points of alcoholic fermentation in synthetic wine must. A matrix of gene expression data was generated and integrated with the concentrations of volatile aroma compounds measured at the same time points. This relatively unbiased approach to the study of volatile aroma compounds enabled us to identify candidate genes for aroma profile modification. Five of these genes, namely YMR210W, BAT1, AAD10, AAD14 and ACS1 were selected for overexpression in commercial wine yeast, VIN13. Analysis of the data shows a statistically significant correlation between the changes in the exo-metabolome of the overexpressing strains and the changes that were predicted based on the unbiased alignment of transcriptomic and exo-metabolomic data. Conclusion: The data suggest that a comparative transcriptomics and metabolomics approach can be used to identify the metabolic impacts of the expression of individual genes in complex systems, and the amenability of transcriptomic data to direct applications of biotechnological relevance. PMID:18990252

  2. Inferring Time-Varying Network Topologies from Gene Expression Data

    PubMed Central

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster—to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence. PMID:18309363

  3. Inferring time-varying network topologies from gene expression data.

    PubMed

    Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.

  4. Identifying Cancer Driver Genes Using Replication-Incompetent Retroviral Vectors

    PubMed Central

    Bii, Victor M.; Trobridge, Grant D.

    2016-01-01

    Identifying novel genes that drive tumor metastasis and drug resistance has significant potential to improve patient outcomes. High-throughput sequencing approaches have identified cancer genes, but distinguishing driver genes from passengers remains challenging. Insertional mutagenesis screens using replication-incompetent retroviral vectors have emerged as a powerful tool to identify cancer genes. Unlike replicating retroviruses and transposons, replication-incompetent retroviral vectors lack additional mutagenesis events that can complicate the identification of driver mutations from passenger mutations. They can also be used for almost any human cancer due to the broad tropism of the vectors. Replication-incompetent retroviral vectors have the ability to dysregulate nearby cancer genes via several mechanisms including enhancer-mediated activation of gene promoters. The integrated provirus acts as a unique molecular tag for nearby candidate driver genes which can be rapidly identified using well established methods that utilize next generation sequencing and bioinformatics programs. Recently, retroviral vector screens have been used to efficiently identify candidate driver genes in prostate, breast, liver and pancreatic cancers. Validated driver genes can be potential therapeutic targets and biomarkers. In this review, we describe the emergence of retroviral insertional mutagenesis screens using replication-incompetent retroviral vectors as a novel tool to identify cancer driver genes in different cancer types. PMID:27792127

  5. Weighted gene co-expression network analysis reveals potential genes involved in early metamorphosis process in sea cucumber Apostichopus japonicus.

    PubMed

    Li, Yongxin; Kikuchi, Mani; Li, Xueyan; Gao, Qionghua; Xiong, Zijun; Ren, Yandong; Zhao, Ruoping; Mao, Bingyu; Kondo, Mariko; Irie, Naoki; Wang, Wen

    2018-01-01

    Sea cucumbers, one main class of Echinoderms, have a very fast and drastic metamorphosis process during their development. However, the molecular basis under this process remains largely unknown. Here we systematically examined the gene expression profiles of Japanese common sea cucumber (Apostichopus japonicus) for the first time by RNA sequencing across 16 developmental time points from fertilized egg to juvenile stage. Based on the weighted gene co-expression network analysis (WGCNA), we identified 21 modules. Among them, MEdarkmagenta was highly expressed and correlated with the early metamorphosis process from late auricularia to doliolaria larva. Furthermore, gene enrichment and differentially expressed gene analysis identified several genes in the module that may play key roles in the metamorphosis process. Our results not only provide a molecular basis for experimentally studying the development and morphological complexity of sea cucumber, but also lay a foundation for improving its emergence rate. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Network-based Analysis of Genome Wide Association Data Provides Novel Candidate Genes for Lipid and Lipoprotein Traits*

    PubMed Central

    Sharma, Amitabh; Gulbahce, Natali; Pevzner, Samuel J.; Menche, Jörg; Ladenvall, Claes; Folkersen, Lasse; Eriksson, Per; Orho-Melander, Marju; Barabási, Albert-László

    2013-01-01

    Genome wide association studies (GWAS) identify susceptibility loci for complex traits, but do not identify particular genes of interest. Integration of functional and network information may help in overcoming this limitation and identifying new susceptibility loci. Using GWAS and comorbidity data, we present a network-based approach to predict candidate genes for lipid and lipoprotein traits. We apply a prediction pipeline incorporating interactome, co-expression, and comorbidity data to Global Lipids Genetics Consortium (GLGC) GWAS for four traits of interest, identifying phenotypically coherent modules. These modules provide insights regarding gene involvement in complex phenotypes with multiple susceptibility alleles and low effect sizes. To experimentally test our predictions, we selected four candidate genes and genotyped representative SNPs in the Malmö Diet and Cancer Cardiovascular Cohort. We found significant associations with LDL-C and total-cholesterol levels for a synonymous SNP (rs234706) in the cystathionine beta-synthase (CBS) gene (p = 1 × 10^-5 and adjusted-p = 0.013, respectively). Further, liver samples taken from 206 patients revealed that patients with the minor allele of rs234706 had significant dysregulation of CBS (p = 0.04). Despite the known biological role of CBS in lipid metabolism, SNPs within the locus have not yet been identified in GWAS of lipoprotein traits. Thus, the GWAS-based Comorbidity Module (GCM) approach identifies candidate genes missed by GWAS studies, serving as a broadly applicable tool for the investigation of other complex disease phenotypes. PMID:23882023

  7. Relaxation rates of gene expression kinetics reveal the feedback signs of autoregulatory gene networks

    NASA Astrophysics Data System (ADS)

    Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.

    2018-03-01

    The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decaying rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.
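
    The no-feedback statement in this abstract (the relaxation rate equals the protein decay rate) can be checked numerically with a short Python sketch. It builds the master equation of a simple birth-death model with constant synthesis and first-order decay, truncated at a maximum copy number, and reads the relaxation rate off the spectral gap. The parameter values, the truncation level, and the function name are assumptions; the sketch covers only the no-feedback case and is not the authors' model with nonlinear feedback.

```python
import numpy as np

def relaxation_rate(synthesis, decay, max_copies=60):
    """Spectral gap of a truncated birth-death master equation without feedback."""
    n = np.arange(max_copies + 1)
    birth = np.full(max_copies + 1, float(synthesis))
    birth[-1] = 0.0                              # no synthesis out of the truncation boundary
    death = decay * n                            # first-order protein decay
    # Symmetrized generator: same eigenvalues as the master equation, numerically stable.
    diagonal = -(birth + death)
    off_diagonal = np.sqrt(birth[:-1] * death[1:])
    generator = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    eigenvalues = np.sort(np.linalg.eigvalsh(generator))[::-1]   # 0, -gap, ...
    return -eigenvalues[1]

gap = relaxation_rate(synthesis=5.0, decay=0.7)
```

    The symmetrization uses the detailed-balance structure of a birth-death chain, so a symmetric eigenvalue solver can be applied; the truncation is harmless here because the stationary probability near the boundary is negligible for the chosen parameters.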

  8. Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice.

    PubMed

    Ficklin, Stephen P; Feltus, F Alex

    2011-07-01

    One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.

  9. Identification of T1D susceptibility genes within the MHC region by combining protein interaction networks and SNP genotyping data

    PubMed Central

    Brorsson, C.; Hansen, N. T.; Lage, K.; Bergholdt, R.; Brunak, S.; Pociot, F.

    2009-01-01

    Aim: To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. Methods: We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein–protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein modules were statistically evaluated using permutation. Results: A total of 151 genes could be mapped to nodes within the protein interaction network and their interaction partners were identified. Five protein interaction modules reached statistical significance using this approach. The identified proteins are well known in the pathogenesis of T1D, but the modules also contain additional candidates that have been implicated in β-cell development and diabetic complications. Conclusions: The extensive LD within the MHC region makes it important to develop new methods for analysing genotyping data for identification of additional risk genes for T1D. Combining genetic data with knowledge about functional pathways provides new insight into mechanisms underlying T1D. PMID:19143816

  10. Identification of Linkages between EDCs in Personal Care Products and Breast Cancer through Data Integration Combined with Gene Network Analysis.

    PubMed

    Jeong, Hyeri; Kim, Jongwoon; Kim, Youngjun

    2017-09-30

    Approximately 1000 chemicals have been reported to possibly have endocrine disrupting effects, some of which are used in consumer products, such as personal care products (PCPs) and cosmetics. We conducted data integration combined with gene network analysis to: (i) identify causal molecular mechanisms between endocrine disrupting chemicals (EDCs) used in PCPs and breast cancer; and (ii) screen candidate EDCs associated with breast cancer. Among EDCs used in PCPs, four EDCs having correlation with breast cancer were selected, and we curated 27 common interacting genes between those EDCs and breast cancer to perform the gene network analysis. Based on the gene network analysis, ESR1, TP53, NCOA1, AKT1, and BCL6 were found to be key genes to demonstrate the molecular mechanisms of EDCs in the development of breast cancer. Using GeneMANIA, we additionally predicted 20 genes which could interact with the 27 common genes. In total, 47 genes combining the common and predicted genes were functionally grouped with the gene ontology and KEGG pathway terms. With those genes, we finally screened candidate EDCs for their potential to increase breast cancer risk. This study highlights that our approach can provide insights to understand mechanisms of breast cancer and identify potential EDCs which are in association with breast cancer.

  11. Identification of Linkages between EDCs in Personal Care Products and Breast Cancer through Data Integration Combined with Gene Network Analysis

    PubMed Central

    Kim, Jongwoon

    2017-01-01

    Approximately 1000 chemicals have been reported to possibly have endocrine disrupting effects, some of which are used in consumer products, such as personal care products (PCPs) and cosmetics. We conducted data integration combined with gene network analysis to: (i) identify causal molecular mechanisms between endocrine disrupting chemicals (EDCs) used in PCPs and breast cancer; and (ii) screen candidate EDCs associated with breast cancer. Among EDCs used in PCPs, four EDCs having correlation with breast cancer were selected, and we curated 27 common interacting genes between those EDCs and breast cancer to perform the gene network analysis. Based on the gene network analysis, ESR1, TP53, NCOA1, AKT1, and BCL6 were found to be key genes to demonstrate the molecular mechanisms of EDCs in the development of breast cancer. Using GeneMANIA, we additionally predicted 20 genes which could interact with the 27 common genes. In total, 47 genes combining the common and predicted genes were functionally grouped with the gene ontology and KEGG pathway terms. With those genes, we finally screened candidate EDCs for their potential to increase breast cancer risk. This study highlights that our approach can provide insights to understand mechanisms of breast cancer and identify potential EDCs which are in association with breast cancer. PMID:28973975

  12. Temporal profiling of gene networks associated with the late phase of long-term potentiation in vivo.

    PubMed

    Ryan, Margaret M; Ryan, Brigid; Kyrke-Smith, Madeleine; Logan, Barbara; Tate, Warren P; Abraham, Wickliffe C; Williams, Joanna M

    2012-01-01

    Long-term potentiation (LTP) is widely accepted as a cellular mechanism underlying memory processes. It is well established that LTP persistence is strongly dependent on activation of constitutive and inducible transcription factors, but there is limited information regarding the downstream gene networks and controlling elements that coalesce to stabilise LTP. To identify these gene networks, we used Affymetrix RAT230.2 microarrays to detect genes regulated 5 h and 24 h (n = 5) after LTP induction at perforant path synapses in the dentate gyrus of awake adult rats. The functional relationships of the differentially expressed genes were examined using DAVID and Ingenuity Pathway Analysis, and compared with our previous data derived 20 min post-LTP induction in vivo. This analysis showed that LTP-related genes are predominantly upregulated at 5 h but that there is pronounced downregulation of gene expression at 24 h after LTP induction. Analysis of the structure of the networks and canonical pathways predicted a regulation of calcium dynamics via G-protein coupled receptors, dendritogenesis and neurogenesis at the 5 h time-point. By 24 h neurotrophin-NFKB driven pathways of neuronal growth were identified. The temporal shift in gene expression appears to be mediated by regulation of protein synthesis, ubiquitination and time-dependent regulation of specific microRNA and histone deacetylase expression. Together this programme of genomic responses, marked by both homeostatic and growth pathways, is likely to be critical for the consolidation of LTP in vivo.

  13. System Biology Approach: Gene Network Analysis for Muscular Dystrophy.

    PubMed

    Censi, Federica; Calcagnini, Giovanni; Mattei, Eugenio; Giuliani, Alessandro

    2018-01-01

    Phenotypic changes at different organization levels, from cell to entire organism, are associated with changes in the pattern of gene expression. These changes involve the entire genome expression pattern and heavily rely upon correlation patterns among genes. The classical approach used to analyze gene expression data builds upon the application of supervised statistical techniques to detect genes differentially expressed among two or more phenotypes (e.g., normal vs. disease). The use of an a posteriori, unsupervised approach based on principal component analysis (PCA) and the subsequent construction of gene correlation networks can shed light on unexpected behaviour of the gene regulation system while maintaining a more naturalistic view of the studied system. In this chapter we applied an unsupervised method to discriminate DMD patients and controls. The genes having the highest absolute scores in the discrimination between the groups were then analyzed in terms of gene expression networks, on the basis of their mutual correlation in the two groups. The correlation network structures suggest two different modes of gene regulation in the two groups, reminiscent of important aspects of DMD pathogenesis.

  14. Generation of oscillating gene regulatory network motifs

    NASA Astrophysics Data System (ADS)

    van Dorp, M.; Lannoo, B.; Carlon, E.

    2013-07-01

    Using an improved version of an evolutionary algorithm originally proposed by François and Hakim [Proc. Natl. Acad. Sci. USA 101, 580 (2004)], we generated small gene regulatory networks in which the concentration of a target protein oscillates in time. These networks may serve as candidates for oscillatory modules to be found in larger regulatory networks and protein interaction networks. The algorithm was run for 10^5 times to produce a large set of oscillating modules, which were systematically classified and analyzed. The robustness of the oscillations against variations of the kinetic rates was also determined, to filter out the least robust cases. Furthermore, we show that the set of evolved networks can serve as a database of models whose behavior can be compared to experimentally observed oscillations. The algorithm found three smallest (core) oscillators in which nonlinearities and number of components are minimal. Two of those are two-gene modules: the mixed feedback loop, already discussed in the literature, and an autorepressed gene coupled with a heterodimer. The third one is a single gene module which is competitively regulated by a monomer and a dimer. The evolutionary algorithm also generated larger oscillating networks, which are in part extensions of the three core modules and in part genuinely new modules. The latter includes oscillators which do not rely on feedback induced by transcription factors, but are purely of post-transcriptional type. Analysis of post-transcriptional mechanisms of oscillation may provide useful information for circadian clock research, as recent experiments showed that circadian rhythms are maintained even in the absence of transcription.

  15. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  16. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    PubMed Central

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381

  17. Identification of co-expression gene networks, regulatory genes and pathways for obesity based on adipose tissue RNA Sequencing in a porcine model.

    PubMed

    Kogelman, Lisette J A; Cirera, Susanna; Zhernakova, Daria V; Fredholm, Merete; Franke, Lude; Kadarmideen, Haja N

    2014-09-30

    Obesity is a complex metabolic condition in strong association with various diseases, like type 2 diabetes, resulting in major public health and economic implications. Obesity is the result of environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identification of co-expressed and regulatory genes in RNA extracted from relevant tissues representing lean and obese individuals provides an entry point for the identification of genes and pathways of importance to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study in-depth organ-level transcriptomic regulations of obesity, unfeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulations of obesity using RNA Sequencing based systems biology approaches in a porcine model. We selected 36 animals for RNA Sequencing from a previously created F2 pig population representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. WGCNA revealed five modules which were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways illuminating the association between obesity and other diseases, like osteoporosis (osteoclast differentiation, P = 1.4E-7), and immune-related complications (e.g. Natural killer cell mediated cytotoxicity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5). Lemon-Tree identified three potential regulator genes, using confidence scores, for the WGCNA module which was associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores respectively 95.30, 62.28, and 34.58).
Moreover, detection

  18. Transcriptome analysis of genes and gene networks involved in aggressive behavior in mouse and zebrafish.

    PubMed

    Malki, Karim; Du Rietz, Ebba; Crusio, Wim E; Pain, Oliver; Paya-Cano, Jose; Karadaghi, Rezhaw L; Sluyter, Frans; de Boer, Sietse F; Sandnabba, Kenneth; Schalkwyk, Leonard C; Asherson, Philip; Tosto, Maria Grazia

    2016-09-01

    Despite moderate heritability estimates, the molecular architecture of aggressive behavior remains poorly characterized. This study compared gene expression profiles from a genetic mouse model of aggression with zebrafish, an animal model traditionally used to study aggression. A meta-analytic, cross-species approach was used to identify genomic variants associated with aggressive behavior. The Rankprod algorithm was used to evaluate mRNA differences from prefrontal cortex tissues of three sets of mouse lines (N = 18) selectively bred for low and high aggressive behavior (SAL/LAL, TA/TNA, and NC900/NC100). The same approach was used to evaluate mRNA differences in zebrafish (N = 12) exposed to aggressive or non-aggressive social encounters. Results were compared to uncover genes consistently implicated in aggression across both studies. Seventy-six genes were differentially expressed (PFP < 0.05) in aggressive compared to non-aggressive mice. Seventy genes were differentially expressed in zebrafish exposed to a fight encounter compared to isolated zebrafish. Seven genes (Fos, Dusp1, Hdac4, Ier2, Bdnf, Btg2, and Nr4a1) were differentially expressed across both species, five of which belong to a gene network centred on the c-Fos gene hub. Network analysis revealed an association with the MAPK signaling cascade. In human studies HDAC4 haploinsufficiency is a key genetic mechanism associated with brachydactyly mental retardation syndrome (BDMR), which is associated with aggressive behaviors. Moreover, the HDAC4 receptor is a drug target for valproic acid, which is being employed as an effective pharmacological treatment for aggressive behavior in geriatric, psychiatric, and brain-injury patients. © 2016 Wiley Periodicals, Inc.

  19. Systems biology approach to late-onset Alzheimer's disease genome-wide association study identifies novel candidate genes validated using brain expression data and Caenorhabditis elegans experiments.

    PubMed

    Mukherjee, Shubhabrata; Russell, Joshua C; Carr, Daniel T; Burgess, Jeremy D; Allen, Mariet; Serie, Daniel J; Boehme, Kevin L; Kauwe, John S K; Naj, Adam C; Fardo, David W; Dickson, Dennis W; Montine, Thomas J; Ertekin-Taner, Nilufer; Kaeberlein, Matt R; Crane, Paul K

    2017-10-01

    We sought to determine whether a systems biology approach may identify novel late-onset Alzheimer's disease (LOAD) loci. We performed gene-wide association analyses and integrated results with human protein-protein interaction data using network analyses. We performed functional validation on novel genes using a transgenic Caenorhabditis elegans Aβ proteotoxicity model and evaluated novel genes using brain expression data from people with LOAD and other neurodegenerative conditions. We identified 13 novel candidate LOAD genes outside chromosome 19. Of those, RNA interference knockdowns of the C. elegans orthologs of UBC, NDUFS3, EGR1, and ATP5H were associated with Aβ toxicity, and NDUFS3, SLC25A11, ATP5H, and APP were differentially expressed in the temporal cortex. Network analyses identified novel LOAD candidate genes. We demonstrated a functional role for four of these in a C. elegans model and found enrichment of differentially expressed genes in the temporal cortex. Copyright © 2017 the Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  20. Meta-review of protein network regulating obesity between validated obesity candidate genes in the white adipose tissue of high-fat diet-induced obese C57BL/6J mice.

    PubMed

    Kim, Eunjung; Kim, Eun Jung; Seo, Seung-Won; Hur, Cheol-Goo; McGregor, Robin A; Choi, Myung-Sook

    2014-01-01

    Worldwide obesity and related comorbidities are increasing, but identifying new therapeutic targets remains a challenge. A plethora of microarray studies in diet-induced obesity models has provided large datasets of obesity associated genes. In this review, we describe an approach to examine the underlying molecular network regulating obesity, and we discuss interactions between obesity candidate genes. We conducted network analysis on functional protein-protein interactions associated with 25 obesity candidate genes identified in a literature-driven approach based on published microarray studies of diet-induced obesity. The obesity candidate genes were closely associated with lipid metabolism and inflammation. Peroxisome proliferator activated receptor gamma (Pparg) appeared to be a core obesity gene, and obesity candidate genes were highly interconnected, suggesting a coordinately regulated molecular network in adipose tissue. In conclusion, the current network analysis approach may help elucidate the underlying molecular network regulating obesity and identify anti-obesity targets for therapeutic intervention.

  1. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression.

    PubMed

    Catto, James W F; Abbod, Maysam F; Wild, Peter J; Linkens, Derek A; Pilarsky, Christian; Rehman, Ishtiaq; Rosario, Derek J; Denzinger, Stefan; Burger, Maximilian; Stoehr, Robert; Knuechel, Ruth; Hartmann, Arndt; Hamdy, Freddie C

    2010-03-01

    New methods for identifying bladder cancer (BCa) progression are required. Gene expression microarrays can reveal insights into disease biology and identify novel biomarkers. However, these experiments produce large datasets that are difficult to interpret. To develop a novel method of microarray analysis combining two forms of artificial intelligence (AI): neurofuzzy modelling (NFM) and artificial neural networks (ANN) and validate it in a BCa cohort. We used AI and statistical analyses to identify progression-related genes in a microarray dataset (n=66 tumours, n=2800 genes). The AI-selected genes were then investigated in a second cohort (n=262 tumours) using immunohistochemistry. We compared the accuracy of AI and statistical approaches to identify tumour progression. AI identified 11 progression-associated genes (odds ratio [OR]: 0.70; 95% confidence interval [CI], 0.56-0.87; p=0.0004), and these were more discriminate than genes chosen using statistical analyses (OR: 1.24; 95% CI, 0.96-1.60; p=0.09). The expression of six AI-selected genes (LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2) was determined using commercial antibodies and successfully identified tumour progression (concordance index: 0.66; log-rank test: p=0.01). AI-selected genes were more discriminate than pathologic criteria at determining progression (Cox multivariate analysis: p=0.01). Limitations include the use of statistical correlation to identify 200 genes for AI analysis and that we did not compare regression identified genes with immunohistochemistry. AI and statistical analyses use different techniques of inference to determine gene-phenotype associations and identify distinct prognostic gene signatures that are equally valid. We have identified a prognostic gene signature whose members reflect a variety of carcinogenic pathways that could identify progression in non-muscle-invasive BCa. 2009 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  2. Reverse engineering highlights potential principles of large gene regulatory network design and learning.

    PubMed

    Carré, Clément; Mas, André; Krouk, Gabriel

    2017-01-01

    Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. There are several techniques used presently to experimentally assay transcription factor-to-target relationships, defining important information about real gene regulatory network connections. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10^4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). In combination with a supervised (accepting prior knowledge) support vector machine algorithm we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to perform learning able to solve gene regulatory networks in the future. By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network

  3. CoGAPS matrix factorization algorithm identifies transcriptional changes in AP-2alpha target genes in feedback from therapeutic inhibition of the EGFR network

    PubMed Central

    Thakar, Manjusha; Howard, Jason D.; Kagohara, Luciane T.; Krigsfeld, Gabriel; Ranaweera, Ruchira S.; Hughes, Robert M.; Perez, Jimena; Jones, Siân; Favorov, Alexander V.; Carey, Jacob; Stein-O'Brien, Genevieve; Gaykalova, Daria A.; Ochs, Michael F.; Chung, Christine H.

    2016-01-01

    Patients with oncogene driven tumors are treated with targeted therapeutics including EGFR inhibitors. Genomic data from The Cancer Genome Atlas (TCGA) demonstrates molecular alterations to EGFR, MAPK, and PI3K pathways in previously untreated tumors. Therefore, this study uses bioinformatics algorithms to delineate interactions resulting from EGFR inhibitor use in cancer cells with these genetic alterations. We modify the HaCaT keratinocyte cell line model to simulate cancer cells with constitutive activation of EGFR, HRAS, and PI3K in a controlled genetic background. We then measure gene expression after treating modified HaCaT cells with gefitinib, afatinib, and cetuximab. The CoGAPS algorithm distinguishes a gene expression signature associated with the anticipated silencing of the EGFR network. It also infers a feedback signature with EGFR gene expression itself increasing in cells that are responsive to EGFR inhibitors. This feedback signature has increased expression of several growth factor receptors regulated by the AP-2 family of transcription factors. The gene expression signatures for AP-2alpha are further correlated with sensitivity to cetuximab treatment in HNSCC cell lines and changes in EGFR expression in HNSCC tumors with low CDKN2A gene expression. In addition, the AP-2alpha gene expression signatures are also associated with inhibition of MEK, PI3K, and mTOR pathways in the Library of Integrated Network-Based Cellular Signatures (LINCS) data. These results suggest that AP-2 transcription factors are activated as feedback from EGFR network inhibition and may mediate EGFR inhibitor resistance. PMID:27650546

  4. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    PubMed

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online.
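
    The precision-recall evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not GeneNetWeaver code: the function names, the edge representation as (regulator, target) tuples, and the toy gold-standard network are all assumptions made for illustration.

```python
def precision_recall_curve(ranked_edges, gold_edges):
    """Return a (precision, recall) pair after each prediction.

    ranked_edges: predicted edges sorted from most to least confident
                  (hypothetical format: (regulator, target) tuples).
    gold_edges:   edges of the known benchmark network.
    """
    gold = set(gold_edges)
    if not gold:
        raise ValueError("gold standard must contain at least one edge")
    points = []
    true_positives = 0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            true_positives += 1
        # precision = correct among the top k; recall = correct among all gold edges
        points.append((true_positives / k, true_positives / len(gold)))
    return points


def average_precision(ranked_edges, gold_edges):
    """Mean of the precision values at the ranks where a gold edge is recovered."""
    gold = set(gold_edges)
    if not gold:
        raise ValueError("gold standard must contain at least one edge")
    hits = 0
    total = 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            total += hits / k
    return total / len(gold)


# Toy example (made-up gene names, not benchmark data).
gold = [("A", "B"), ("B", "C")]
ranked = [("A", "B"), ("A", "C"), ("B", "C")]
curve = precision_recall_curve(ranked, gold)
ap = average_precision(ranked, gold)
```

    A ranked edge list is used because most inference methods output a confidence score per edge rather than a fixed network, so the curve summarizes performance across all possible cutoffs.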

  5. Comprehensive Analysis of Gene Expression Profiles of Sepsis-Induced Multiorgan Failure Identified Its Valuable Biomarkers.

    PubMed

    Wang, Yumei; Yin, Xiaoling; Yang, Fang

    2018-02-01

    Sepsis is an inflammatory-related disease, and severe sepsis can induce multiorgan dysfunction, which is the most common cause of death of patients in noncoronary intensive care units. Progression of novel therapeutic strategies has proven to be of little impact on the mortality of severe sepsis, and unfortunately, its mechanisms still remain poorly understood. In this study, we analyzed gene expression profiles of severe sepsis with failure of lung, kidney, and liver for the identification of potential biomarkers. We first downloaded the gene expression profiles from the Gene Expression Omnibus and performed preprocessing of raw microarray data sets and identification of differentially expressed genes (DEGs) through the R programming software; then, significantly enriched functions of DEGs in lung, kidney, and liver failure sepsis samples were obtained from the Database for Annotation, Visualization, and Integrated Discovery; finally, a protein-protein interaction network was constructed for DEGs based on the STRING database, and network modules were also obtained through the MCODE cluster method. As a result, lung failure sepsis has the highest number of DEGs, 859, whereas the numbers of DEGs in kidney and liver failure sepsis samples are 178 and 175, respectively. In addition, 17 overlaps were obtained among the three lists of DEGs. Biological processes related to immune and inflammatory response were found to be significantly enriched in DEGs. Network and module analysis identified four gene clusters in which all or most of the genes were upregulated. The expression changes of Icam1 and Socs3 were further validated through quantitative PCR analysis. This study should shed light on the development of sepsis and provide potential therapeutic targets for sepsis-induced multiorgan failure.

  6. A statistically inferred microRNA network identifies breast cancer target miR-940 as an actin cytoskeleton regulator

    NASA Astrophysics Data System (ADS)

    Bhajun, Ricky; Guyon, Laurent; Pitaval, Amandine; Sulpice, Eric; Combe, Stéphanie; Obeid, Patricia; Haguet, Vincent; Ghorbel, Itebeddine; Lajaunie, Christian; Gidrol, Xavier

    2015-02-01

    MiRNAs are key regulators of gene expression. By binding to many genes, they create a complex network of gene co-regulation. Here, using a network-based approach, we identified miRNA hub groups by their close connections and common targets. In one cluster containing three miRNAs, miR-612, miR-661 and miR-940, the annotated functions of the co-regulated genes suggested a role in small GTPase signalling. Although the three members of this cluster targeted the same subset of predicted genes, we showed that their overexpression impacted cell fates differently. miR-661 demonstrated enhanced phosphorylation of myosin II and an increase in cell invasion, indicating a possible oncogenic miRNA. On the contrary, miR-612 and miR-940 inhibit phosphorylation of myosin II and cell invasion. Finally, expression profiling in human breast tissues showed that miR-940 was consistently downregulated in breast cancer tissues

  7. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    PubMed

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these

  8. Gene network analysis: from heart development to cardiac therapy.

    PubMed

    Ferrazzi, Fulvia; Bellazzi, Riccardo; Engel, Felix B

    2015-03-01

    Networks offer a flexible framework to represent and analyse the complex interactions between components of cellular systems. In particular gene networks inferred from expression data can support the identification of novel hypotheses on regulatory processes. In this review we focus on the use of gene network analysis in the study of heart development. Understanding heart development will promote the elucidation of the aetiology of congenital heart disease and thus possibly improve diagnostics. Moreover, it will help to establish cardiac therapies. For example, understanding cardiac differentiation during development will help to guide stem cell differentiation required for cardiac tissue engineering or to enhance endogenous repair mechanisms. We introduce different methodological frameworks to infer networks from expression data such as Boolean and Bayesian networks. Then we present currently available temporal expression data in heart development and discuss the use of network-based approaches in published studies. Collectively, our literature-based analysis indicates that gene network analysis constitutes a promising opportunity to infer therapy-relevant regulatory processes in heart development. However, the use of network-based approaches has so far been limited by the small amount of samples in available datasets. Thus, we propose to acquire high-resolution temporal expression data to improve the mathematical descriptions of regulatory processes obtained with gene network inference methodologies. Especially probabilistic methods that accommodate the intrinsic variability of biological systems have the potential to contribute to a deeper understanding of heart development.

  9. Shared molecular pathways and gene networks for cardiovascular disease and type 2 diabetes mellitus in women across diverse ethnicities.

    PubMed

    Chan, Kei Hang K; Huang, Yen-Tsung; Meng, Qingying; Wu, Chunyuan; Reiner, Alexander; Sobel, Eric M; Tinker, Lesley; Lusis, Aldons J; Yang, Xia; Liu, Simin

    2014-12-01

    Although cardiovascular disease (CVD) and type 2 diabetes mellitus (T2D) share many common risk factors, potential molecular mechanisms that may also be shared by these 2 disorders remain unknown. Using an integrative pathway and network analysis, we performed genome-wide association studies in 8155 black, 3494 Hispanic American, and 3697 Caucasian American women who participated in the national Women's Health Initiative single-nucleotide polymorphism (SNP) Health Association Resource and the Genomics and Randomized Trials Network. Eight top pathways and gene networks related to cardiomyopathy, calcium signaling, axon guidance, cell adhesion, and extracellular matrix seemed to be commonly shared between CVD and T2D across all 3 ethnic groups. We also identified ethnicity-specific pathways, such as cell cycle (specific for Hispanic American and Caucasian American) and tight junction (CVD and combined CVD and T2D in Hispanic American). In network analysis of gene-gene or protein-protein interactions, we identified key drivers that included COL1A1, COL3A1, and ELN in the shared pathways for both CVD and T2D. These key driver genes were cross-validated in multiple mouse models of diabetes mellitus and atherosclerosis. Our integrative analysis of American women of 3 ethnicities identified multiple shared biological pathways and key regulatory genes for the development of CVD and T2D. These prospective findings also support the notion that ethnicity-specific susceptibility genes and processes are involved in the pathogenesis of CVD and T2D. © 2014 American Heart Association, Inc.

  10. Identification of genes related to proliferative diabetic retinopathy through RWR algorithm based on protein-protein interaction network.

    PubMed

    Zhang, Jian; Suo, Yan; Liu, Min; Xu, Xun

    2018-06-01

    Proliferative diabetic retinopathy (PDR) is one of the most common complications of diabetes and can lead to blindness. Proteomic studies have provided insight into the pathogenesis of PDR, and a series of PDR-related genes have been identified, but they are far from fully characterized because the experimental methods are expensive and time consuming. In our previous study, we identified 35 candidate PDR-related genes through the shortest-path algorithm. In the current study, we developed a computational method using the random walk with restart (RWR) algorithm and the protein-protein interaction (PPI) network to identify potential PDR-related genes. After possible genes were obtained by the RWR algorithm, a three-stage filtration strategy, comprising a permutation test, an interaction test and an enrichment test, was applied to exclude potential false positives caused by the structure of the PPI network, poor interaction strength, and limited similarity in gene ontology (GO) terms and biological pathways. As a result, the method discovered 36 candidate genes, which were different from the 35 genes reported in our previous study. A literature review showed that 21 of these 36 genes are supported by previous experiments. These findings suggest the robustness and complementary effects of our two efforts using different computational methods, thus providing an alternative method to study PDR pathogenesis. Copyright © 2017 Elsevier B.V. All rights reserved.
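
    The random walk with restart step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the restart probability of 0.8, the convergence tolerance, and the toy five-node network are all assumptions, and the three-stage filtration (permutation, interaction and enrichment tests) is not shown.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adjacency: dict mapping node -> dict of neighbor -> edge weight (undirected).
    seeds: set of known disease genes (hypothetical input) used as restart targets.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total edge weight leaving each node, used to normalize transitions.
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * p[u]
            if out_weight[u] == 0:
                nxt[u] += walk_mass  # isolated node: the walker stays put
                continue
            for v, w in adjacency[u].items():
                nxt[v] += walk_mass * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy chain A - B - C - D - E with a single seed gene "A" (illustrative only).
toy_network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0, "E": 1.0},
    "E": {"D": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

    In this toy network the scores fall off with distance from the seed, which is the property that lets RWR rank non-seed genes as candidates.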

  11. Semi-Supervised Multi-View Learning for Gene Network Reconstruction

    PubMed Central

    Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo

    2015-01-01

    The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091

  12. Regulation of behaviorally associated gene networks in worker honey bee ovaries

    PubMed Central

    Wang, Ying; Kocher, Sarah D.; Linksvayer, Timothy A.; Grozinger, Christina M.; Page, Robert E.; Amdam, Gro V.

    2012-01-01

    SUMMARY Several lines of evidence support genetic links between ovary size and division of labor in worker honey bees. However, it is largely unknown how ovaries influence behavior. To address this question, we first performed transcriptional profiling on worker ovaries from two genotypes that differ in social behavior and ovary size. Then, we contrasted the differentially expressed ovarian genes with six sets of available brain transcriptomes. Finally, we probed behavior-related candidate gene networks in wild-type ovaries of different sizes. We found differential expression in 2151 ovarian transcripts in these artificially selected honey bee strains, corresponding to approximately 20.3% of the predicted gene set of honey bees. Differences in gene expression overlapped significantly with changes in the brain transcriptomes. Differentially expressed genes were associated with neural signal transmission (tyramine receptor, TYR) and ecdysteroid signaling; two independently tested nuclear hormone receptors (HR46 and ftz-f1) were also significantly correlated with ovary size in wild-type bees. We suggest that the correspondence between ovary and brain transcriptomes identified here indicates systemic regulatory networks among hormones (juvenile hormone and ecdysteroids), pheromones (queen mandibular pheromone), reproductive organs and nervous tissues in worker honey bees. Furthermore, robust correlations between ovary size and neural and endocrine response genes are consistent with the hypothesized roles of the ovaries in honey bee behavioral regulation. PMID:22162860

  13. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain

    PubMed Central

    Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.

    2016-01-01

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559

  14. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain.

    PubMed

    Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C

    2016-01-26

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.

  15. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks

    DOE PAGES

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; ...

    2015-02-14

    Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.

  16. Candidate genes for panhypopituitarism identified by gene expression profiling

    PubMed Central

    Mortensen, Amanda H.; MacDonald, James W.; Ghosh, Debashis

    2011-01-01

    Mutations in the transcription factors PROP1 and PIT1 (POU1F1) lead to pituitary hormone deficiency and hypopituitarism in mice and humans. The dysmorphology of developing Prop1 mutant pituitaries readily distinguishes them from those of Pit1 mutants and normal mice. This and other features suggest that Prop1 controls the expression of genes besides Pit1 that are important for pituitary cell migration, survival, and differentiation. To identify genes involved in these processes we used microarray analysis of gene expression to compare pituitary RNA from newborn Prop1 and Pit1 mutants and wild-type littermates. Significant differences in gene expression were noted between each mutant and their normal littermates, as well as between Prop1 and Pit1 mutants. Otx2, a gene critical for normal eye and pituitary development in humans and mice, exhibited elevated expression specifically in Prop1 mutant pituitaries. We report the spatial and temporal regulation of Otx2 in normal mice and Prop1 mutants, and the results suggest Otx2 could influence pituitary development by affecting signaling from the ventral diencephalon and regulation of gene expression in Rathke's pouch. The discovery that Otx2 expression is affected by Prop1 deficiency provides support for our hypothesis that identifying molecular differences in mutants will contribute to understanding the molecular mechanisms that control pituitary organogenesis and lead to human pituitary disease. PMID:21828248

  17. Gene expression links functional networks across cortex and striatum.

    PubMed

    Anderson, Kevin M; Krienen, Fenna M; Choi, Eun Young; Reinen, Jenna M; Yeo, B T Thomas; Holmes, Avram J

    2018-04-12

    The human brain is comprised of a complex web of functional networks that link anatomically distinct regions. However, the biological mechanisms supporting network organization remain elusive, particularly across cortical and subcortical territories with vastly divergent cellular and molecular properties. Here, using human and primate brain transcriptional atlases, we demonstrate that spatial patterns of gene expression show strong correspondence with limbic and somato/motor cortico-striatal functional networks. Network-associated expression is consistent across independent human datasets and evolutionarily conserved in non-human primates. Genes preferentially expressed within the limbic network (encompassing nucleus accumbens, orbital/ventromedial prefrontal cortex, and temporal pole) relate to risk for psychiatric illness, chloride channel complexes, and markers of somatostatin neurons. Somato/motor associated genes are enriched for oligodendrocytes and markers of parvalbumin neurons. These analyses indicate that parallel cortico-striatal processing channels possess dissociable genetic signatures that recapitulate distributed functional networks, and nominate molecular mechanisms supporting cortico-striatal circuitry in health and disease.

  18. Comparison of gene co-networks reveals the molecular mechanisms of the rice (Oryza sativa L.) response to Rhizoctonia solani AG1 IA infection.

    PubMed

    Zhang, Jinfeng; Zhao, Wenjuan; Fu, Rong; Fu, Chenglin; Wang, Lingxia; Liu, Huainian; Li, Shuangcheng; Deng, Qiming; Wang, Shiquan; Zhu, Jun; Liang, Yueyang; Li, Ping; Zheng, Aiping

    2018-05-05

    Rhizoctonia solani causes rice sheath blight, an important disease affecting the growth of rice (Oryza sativa L.). Attempts to control the disease have met with little success. Based on transcriptional profiling, we previously identified more than 11,947 common differentially expressed genes (TPM > 10) between the rice genotypes TeQing and Lemont. In the current study, we extended these findings by focusing on an analysis of gene co-expression in response to R. solani AG1 IA and identified gene modules within the networks through weighted gene co-expression network analysis (WGCNA). We compared the different genes assigned to each module and the biological interpretations of gene co-expression networks at early and later modules in the two rice genotypes to reveal differential responses to AG1 IA. Our results show that different changes occurred in the two rice genotypes and that the modules in the two groups contain a number of candidate genes possibly involved in pathogenesis, such as the VQ protein. Furthermore, these gene co-expression networks provide comprehensive transcriptional information regarding gene expression in rice in response to AG1 IA. The co-expression networks derived from our data offer ideas for follow-up experimentation that will help advance our understanding of the translational regulation of rice gene expression changes in response to AG1 IA.

  19. Chaotic Motifs in Gene Regulatory Networks

    PubMed Central

    Zhang, Zhaoyang; Ye, Weiming; Qian, Yu; Zheng, Zhigang; Huang, Xuhui; Hu, Gang

    2012-01-01

    Chaos should occur often in gene regulatory networks (GRNs) which have been widely described by nonlinear coupled ordinary differential equations, if their dimensions are no less than 3. It is therefore puzzling that chaos has never been reported in GRNs in nature and is also extremely rare in models of GRNs. On the other hand, the topic of motifs has attracted great attention in studying biological networks, and network motifs are suggested to be elementary building blocks that carry out some key functions in the network. In this paper, chaotic motifs (subnetworks with chaos) in GRNs are systematically investigated. The conclusion is that: (i) chaos can only appear through competitions between different oscillatory modes with rivaling intensities. Conditions required for chaotic GRNs are found to be very strict, which make chaotic GRNs extremely rare. (ii) Chaotic motifs are explored as the simplest few-node structures capable of producing chaos, and serve as the intrinsic source of chaos of random few-node GRNs. Several optimal motifs causing chaos with atypically high probability are figured out. (iii) Moreover, we discovered that a number of special oscillators can never produce chaos. These structures bring some advantages on rhythmic functions and may help us understand the robustness of diverse biological rhythms. (iv) The methods of dominant phase-advanced driving (DPAD) and DPAD time fraction are proposed to quantitatively identify chaotic motifs and to explain the origin of chaotic behaviors in GRNs. PMID:22792171

  20. Identifying Differences in Abiotic Stress Gene Networks between Lowland and Upland Ecotypes of Switchgrass (DE-SC0008338)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childs, Kevin; Buell, Robin; Zhao, Bingyu

    stress (e.g., transmembrane pumps that partition Na+) and mitigate the effects of the stress (e.g., synthesis of osmoprotectant metabolites and stress-related signaling compounds). Prior to the start of this project, no gene expression analysis had been performed on switchgrass under conditions of drought or salt stress, and therefore, relevant gene networks responding to drought and salt stress were unknown in switchgrass. In this project, we performed drought, salt and alkali-salt screens on 49 switchgrass cultivars (Liu et al 2014; Liu et al 2015; Hu et al 2015; Kim et al 2016). These experiments demonstrated that a wide range of variation exists within switchgrass for drought, salt and alkali-salt tolerance and that, while the lowland ecotype of switchgrass is often considered more tolerant of abiotic stresses, there are some upland switchgrass lines that are also very tolerant of drought, salt and alkali-salt stress. We also conducted drought and salt time course experiments with Alamo and Dacotah. We have identified modules of coexpressed genes that differentiate Alamo and Dacotah drought responses. We are continuing to analyze these results and plan to submit manuscripts describing this work in early 2017. In an effort to show how drought- and salt-related gene modules could be dissected, we generated transgenic switchgrass overexpressing either PvGTγ-1 or ZmDREB2. Increased expression of PvGTγ-1 does confer increased salt tolerance, and we were able to identify genes that are induced and suppressed by PvGTγ-1. Overexpression of ZmDREB2 increases drought tolerance in switchgrass. Analysis of the PvGTγ-1 and ZmDREB2 overexpression work is ongoing, and we plan to prepare manuscripts about these experiments for submission in early 2017.

  1. Construction of an integrated gene regulatory network link to stress-related immune system in cattle.

    PubMed

    Behdani, Elham; Bakhtiarizadeh, Mohammad Reza

    2017-10-01

    The immune system is an important biological system that is negatively impacted by stress. This study constructed an integrated regulatory network to enhance our understanding of the regulatory gene network used in the stress-related immune system. Module inference was used to construct modules of co-expressed genes with bovine leukocyte RNA-Seq data. Transcription factors (TFs) were then assigned to these modules using Lemon-Tree algorithms. In addition, the TFs assigned to each module were confirmed using the promoter analysis and protein-protein interactions data. Therefore, our integrated method identified three TFs which include one TF that is previously known to be involved in immune response (MYBL2) and two TFs (E2F8 and FOXS1) that had not been recognized previously and were identified for the first time in this study as novel regulatory candidates in immune response. This study provides valuable insights on the regulatory programs of genes involved in the stress-related immune system.

  2. Identification of lethal cluster of genes in the yeast transcription network

    NASA Astrophysics Data System (ADS)

    Rho, K.; Jeong, H.; Kahng, B.

    2006-05-01

    Identification of essential or lethal genes would be one of the ultimate goals in drug designs. Here we introduce an in silico method to select the cluster with a high population of lethal genes, called lethal cluster, through microarray assay. We construct a gene transcription network based on the microarray expression level. Links are added one by one in the descending order of the Pearson correlation coefficients between two genes. As the link density p increases, two meaningful link densities pm and ps are observed. At pm, which is smaller than the percolation threshold, the number of disconnected clusters is maximum, and the lethal genes are highly concentrated in a certain cluster that needs to be identified. Thus the deletion of all genes in that cluster could efficiently lead to a lethal inviable mutant. This lethal cluster can be identified by an in silico method. As p increases further beyond the percolation threshold, the power law behavior in the degree distribution of a giant cluster appears at ps. We measure the degree of each gene at ps. With the information pertaining to the degrees of each gene at ps, we return to the point pm and calculate the mean degree of genes of each cluster. We find that the lethal cluster has the largest mean degree.
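
    The link-adding procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and toy expression profiles are hypothetical, the link density p is taken to be the fraction of all possible gene pairs already linked, and a "cluster" is counted only when it contains at least two genes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_counts(expression):
    """Add links in descending order of correlation and record, after each
    link, the pair (link density p, number of clusters with >= 2 genes)."""
    genes = sorted(expression)
    pairs = sorted(
        combinations(genes, 2),
        key=lambda gh: pearson(expression[gh[0]], expression[gh[1]]),
        reverse=True,
    )
    # Union-find bookkeeping for connected clusters.
    parent = {g: g for g in genes}
    size = {g: 1 for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    multi = 0  # clusters containing at least two genes
    history = []
    for k, (g, h) in enumerate(pairs, start=1):
        rg, rh = find(g), find(h)
        if rg != rh:
            if size[rg] == 1 and size[rh] == 1:
                multi += 1  # two isolated genes form a new cluster
            elif size[rg] > 1 and size[rh] > 1:
                multi -= 1  # two existing clusters merge into one
            parent[rh] = rg
            size[rg] += size[rh]
        history.append((k / len(pairs), multi))
    return history


# Toy expression profiles (illustrative values only).
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.2],
    "g4": [8.0, 6.0, 4.0, 2.0],
}
history = cluster_counts(toy_expression)
# Density at which the number of clusters peaks (pm in the abstract's notation).
p_m = max(history, key=lambda item: item[1])[0]
```

    In this toy example the cluster count peaks once the two tightly correlated pairs are linked and before any cross-pair link merges them.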

  3. Genetic associations with micronutrient levels identified in immune and gastrointestinal networks.

    PubMed

    Morine, Melissa J; Monteiro, Jacqueline Pontes; Wise, Carolyn; Teitel, Candee; Pence, Lisa; Williams, Anna; Ning, Baitang; McCabe-Sellers, Beverly; Champagne, Catherine; Turner, Jerome; Shelby, Beatrice; Bogle, Margaret; Beger, Richard D; Priami, Corrado; Kaput, Jim

    2014-07-01

    The discovery of vitamins and clarification of their role in preventing frank essential nutrient deficiencies occurred in the early 1900s. Much vitamin research has understandably focused on public health and the effects of single nutrients to alleviate acute conditions. The physiological processes for maintaining health, however, are complex systems that depend upon interactions between multiple nutrients, environmental factors, and genetic makeup. To analyze the relationship between these factors and nutritional health, data were obtained from an observational, community-based participatory research program of children and teens (age 6-14) enrolled in a summer day camp in the Delta region of Arkansas. Assessments of erythrocyte S-adenosylmethionine (SAM) and S-adenosylhomocysteine (SAH), plasma homocysteine (Hcy) and 6 organic micronutrients (retinol, 25-hydroxy vitamin D3, pyridoxal, thiamin, riboflavin, and vitamin E), and 1,129 plasma proteins were performed at 3 time points in each of 2 years. Genetic makeup was analyzed with 1 M SNP genotyping arrays, and nutrient status was assessed with 24-h dietary intake questionnaires. A pattern of metabolites (met_PC1) that included the ratio of erythrocyte SAM/SAH, Hcy, and 5 vitamins were identified by principal component analysis. Met_PC1 levels were significantly associated with (1) single-nucleotide polymorphisms, (2) levels of plasma proteins, and (3) multilocus genotypes coding for gastrointestinal and immune functions, as identified in a global network of metabolic/protein-protein interactions. Subsequent mining of data from curated pathway, network, and genome-wide association studies identified genetic and functional relationships that may be explained by gene-nutrient interactions. The systems nutrition strategy described here has thus associated a multivariate metabolite pattern in blood with genes involved in immune and gastrointestinal functions.

  4. Chromosome Gene Orientation Inversion Networks (GOINs) of Plasmodium Proteome.

    PubMed

    Quevedo-Tumailli, Viviana F; Ortega-Tenezaca, Bernabé; González-Díaz, Humbert

    2018-03-02

    The spatial distribution of genes in chromosomes seems not to be random. For instance, only 10% of genes are transcribed from bidirectional promoters in humans, and many more are organized into larger clusters. This raises intriguing questions previously asked by different authors. We would like to add a few more questions in this context, related to gene orientation inversions. Does gene orientation (inversion) follow a random pattern? Is it relevant to biological activity somehow? We define a new kind of network, termed the gene orientation inversion network (GOIN). The GOIN complex network encodes short- and long-range patterns of inversion of the orientation of pairs of genes in the chromosome. We selected Plasmodium falciparum as a case study because of the high relevance of this parasite to public health (causal agent of malaria). We constructed here for the first time all of the GOINs for the genome of this parasite. These networks have an average of 383 nodes (genes in one chromosome) and 1314 links (pairs of genes with inverse orientation). We calculated node centralities and other parameters of these networks. These numerical parameters were used to study different properties of gene inversion patterns, for example, distribution, local communities, similarity to Erdős-Rényi random networks, randomness, and so on. We find clues that seem to indicate that gene orientation inversion does not follow a random pattern. We noted that some gene communities in the GOINs tend to group genes encoding RIFIN-related proteins in the proteome of the parasite. RIFIN-like proteins are a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Consequently, we used these centralities as input of machine learning (ML) models to predict the RIFIN-like activity of 5365 proteins in the proteome of Plasmodium sp. The best linear ML model found discriminates RIFIN-like from other proteins with sensitivity and

  5. A pathway-based network analysis of hypertension-related genes

    NASA Astrophysics Data System (ADS)

    Wang, Huan; Hu, Jing-Bo; Xu, Chuan-Yun; Zhang, De-Hai; Yan, Qian; Xu, Ming; Cao, Ke-Fei; Zhang, Xu-Sheng

    2016-02-01

    The complex network approach has become an effective way to describe interrelationships among large amounts of biological data, and is especially useful for finding core functions and global behavior of biological systems. Hypertension is a complex disease caused by many factors, including genetic, physiological, psychological and even social ones. In this paper, based on the information of biological pathways, we construct a network model of hypertension-related genes of the salt-sensitive rat to explore the interrelationship between genes. Statistical and topological characteristics show that the network has the small-world but not the scale-free property, and exhibits a modular structure, revealing compact and complex connections among these genes. Using a threshold of integrated centrality larger than 0.71, seven key hub genes are found: Jun, Rps6kb1, Cycs, Creb312, Cdk4, Actg1 and RT1-Da. These genes should play an important role in hypertension, suggesting that the treatment of hypertension should focus on combinations of drugs acting on multiple genes.

  6. Analysis of gene network robustness based on saturated fixed point attractors

    PubMed Central

    2014-01-01

    The analysis of gene network robustness to noise and mutation is important for fundamental and practical reasons. Robustness refers to the stability of the equilibrium expression state of a gene network to variations of the initial expression state and network topology. Numerical simulation of these variations is commonly used for the assessment of robustness. Since there exists a great number of possible gene network topologies and initial states, even millions of simulations may be still too small to give reliable results. When the initial and equilibrium expression states are restricted to being saturated (i.e., their elements can only take values 1 or −1 corresponding to maximum activation and maximum repression of genes), an analytical gene network robustness assessment is possible. We present this analytical treatment based on determination of the saturated fixed point attractors for sigmoidal function models. The analysis can determine (a) for a given network, which and how many saturated equilibrium states exist and which and how many saturated initial states converge to each of these saturated equilibrium states and (b) for a given saturated equilibrium state or a given pair of saturated equilibrium and initial states, which and how many gene networks, referred to as viable, share this saturated equilibrium state or the pair of saturated equilibrium and initial states. We also show that the viable networks sharing a given saturated equilibrium state must follow certain patterns. These capabilities of the analytical treatment make it possible to properly define and accurately determine robustness to noise and mutation for gene networks. Previous network research conclusions drawn from performing millions of simulations follow directly from the results of our analytical treatment. Furthermore, the analytical results provide criteria for the identification of model validity and suggest modified models of gene network dynamics. The yeast cell-cycle network
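
    As a minimal sketch of what a saturated fixed point is, the snippet below enumerates all states in {-1, +1}^n for a small network and keeps those that a steep sigmoidal update would map back onto themselves. The update rule (each gene takes the sign of its weighted input), the weight matrix W, and the function name are assumptions made for illustration; the abstract does not give the authors' exact model, and brute-force enumeration is not their analytical treatment.

```python
from itertools import product

def saturated_fixed_points(W):
    """Return all states s in {-1, +1}^n with sign(W s) == s.

    W[i][j] is the (hypothetical) regulatory effect of gene j on gene i.
    A state is kept only if every gene's weighted input has the same
    sign as the gene's current value, so a saturating sigmoid would
    leave the state unchanged.
    """
    n = len(W)
    fixed = []
    for s in product((-1, 1), repeat=n):
        inputs = [sum(W[i][j] * s[j] for j in range(n)) for i in range(n)]
        if all(s[i] * inputs[i] > 0 for i in range(n)):
            fixed.append(s)
    return fixed

# Hypothetical two-gene toggle switch: each gene represses the other.
toggle = [[0, -1],
          [-1, 0]]
print(saturated_fixed_points(toggle))
```

    For the toggle switch, only the two states in which exactly one gene is fully active satisfy the condition. Enumeration grows as 2^n, which is one reason an analytical characterization is attractive for larger networks.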

  7. Enhancing gene regulatory network inference through data integration with markov random fields

    DOE PAGES

    Banf, Michael; Rhee, Seung Y.

    2017-02-01

    Here, a gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remains challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high-confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization scheme to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high-confidence regulatory networks compared to state-of-the-art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.

  8. Enhancing gene regulatory network inference through data integration with markov random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Banf, Michael; Rhee, Seung Y.

    Here, a gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remains challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high-confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization scheme to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high-confidence regulatory networks compared to state-of-the-art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.

  9. The Caenorhabditis elegans vulva: A post-embryonic gene regulatory network controlling organogenesis

    PubMed Central

    Ririe, Ted O.; Fernandes, Jolene S.; Sternberg, Paul W.

    2008-01-01

    The Caenorhabditis elegans vulva is an elegant model for dissecting a gene regulatory network (GRN) that directs postembryonic organogenesis. The mature vulva comprises seven cell types (vulA, vulB1, vulB2, vulC, vulD, vulE, and vulF), each with its own unique pattern of spatial and temporal gene expression. The mechanisms that specify these cell types in a precise spatial pattern are not well understood. Using reverse genetic screens, we identified novel components of the vulval GRN, including nhr-113 in vulA. Several transcription factors (lin-11, lin-29, cog-1, egl-38, and nhr-67) interact with each other and act in concert to regulate target gene expression in the diverse vulval cell types. For example, egl-38 (Pax2/5/8) stabilizes the vulF fate by positively regulating vulF characteristics and by inhibiting characteristics associated with the neighboring vulE cells. nhr-67 and egl-38 regulate cog-1, helping restrict its expression to vulE. Computational approaches have been successfully used to identify functional cis-regulatory motifs in the zmp-1 (zinc metalloproteinase) promoter. These results provide an overview of the regulatory network architecture for each vulval cell type. PMID:19104047

  10. On construction of stochastic genetic networks based on gene expression sequences.

    PubMed

    Ching, Wai-Ki; Ng, Michael M; Fung, Eric S; Akutsu, Tatsuya

    2005-08-01

    Reconstruction of genetic regulatory networks from time series data of gene expression patterns is an important research topic in bioinformatics. Probabilistic Boolean Networks (PBNs) have been proposed as an effective model for gene regulatory networks. PBNs are able to cope with uncertainty, incorporate rule-based dependencies between genes and discover the sensitivity of genes in their interactions with other genes. However, PBNs are unlikely to be used directly in practice because of the huge computational cost of obtaining predictors and their corresponding probabilities. In this paper, we propose a multivariate Markov model for approximating PBNs and describing the dynamics of a genetic network for gene expression sequences. The main contribution of the new model is to preserve the strength of PBNs and reduce the complexity of the networks. The number of parameters of our proposed model is O(n^2), where n is the number of genes involved. We also develop efficient estimation methods for solving the model parameters. Numerical examples on synthetic data sets and practical yeast data sequences are given to demonstrate the effectiveness of the proposed model.

  11. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks.

    PubMed

    Ficklin, Stephen P; Luo, Feng; Feltus, F Alex

    2010-09-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.

  12. GENIUS: web server to predict local gene networks and key genes for biological functions.

    PubMed

    Puelma, Tomas; Araus, Viviana; Canales, Javier; Vidal, Elena A; Cabello, Juan M; Soto, Alvaro; Gutiérrez, Rodrigo A

    2017-03-01

    GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods. GENIUS currently supports eight model organisms and is freely available for public use at http://networks.bio.puc.cl/genius. Contact: genius.psbl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  13. Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

    PubMed

    Adriaens, M E; Bezzina, C R

    2018-06-22

    Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a locus associated with a cardiovascular trait to a candidate gene or set of candidate genes, in order to prioritize follow-up mechanistic studies, is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.

  14. Hybrid stochastic simplifications for multiscale gene networks.

    PubMed

    Crudu, Alina; Debussche, Arnaud; Radulescu, Ovidiu

    2009-09-07

    Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene networks dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion [1-3] which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Hybrid simplifications can be used for onion-like (multi-layered) approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach.

  15. Hybrid stochastic simplifications for multiscale gene networks

    PubMed Central

    Crudu, Alina; Debussche, Arnaud; Radulescu, Ovidiu

    2009-01-01

    Background: Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. Results: We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene networks dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion [1-3] which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Conclusion: Hybrid simplifications can be used for onion-like (multi-layered) approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach. PMID:19735554

  16. Efficient experimental design for uncertainty reduction in gene regulatory networks.

    PubMed

    Dehghannasiri, Roozbeh; Yoon, Byung-Jun; Dougherty, Edward R

    2015-01-01

    An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. A MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.

  17. System analysis identifies distinct and common functional networks governed by transcription factor ASCL1, in glioma and small cell lung cancer.

    PubMed

    Donakonda, Sainitin; Sinha, Swati; Dighe, Shrinivas Nivrutti; Rao, Manchanahalli R Satyanarayana

    2017-07-25

    ASCL1 is a basic Helix-Loop-Helix transcription factor (TF), which is involved in various cellular processes like neuronal development and signaling pathways. Transcriptome profiling has shown that ASCL1 overexpression plays an important role in the development of glioma and Small Cell Lung Carcinoma (SCLC), but distinct and common molecular mechanisms regulated by ASCL1 in these cancers are unknown. In order to understand how it drives the cellular functional network in these two tumors, we generated a gene expression profile in a glioma cell line (U87MG) to identify ASCL1 gene targets by an siRNA silencing approach and then compared this with a publicly available dataset of similarly silenced SCLC (NCI-H1618 cells). We constructed TF-TF and gene-gene interactions, as well as protein interaction networks of ASCL1 regulated genes in glioma and SCLC cells. Detailed network analysis uncovered various biological processes governed by ASCL1 target genes in these two tumor cell lines. We find that novel ASCL1 functions related to mitosis and signaling pathways influencing development and tumor growth are affected in both glioma and SCLC cells. In addition, we also observed ASCL1 governed functional networks that are distinct to glioma and SCLC.

  18. Identifying potential maternal genes of Bombyx mori using digital gene expression profiling

    PubMed Central

    Xu, Pingzhen

    2018-01-01

    Maternal genes present in mature oocytes play a crucial role in the early development of silkworm. Although maternal genes have been widely studied in many other species, there has been limited research in Bombyx mori. High-throughput next generation sequencing provides a practical method for gene discovery on a genome-wide level. Herein, a transcriptome study was used to identify maternal-related genes from silkworm eggs. Unfertilized eggs from five different stages of early development were used to track changes in gene expression. The expressed genes showed different patterns over time. Seventy-six maternal genes were annotated according to homology analysis with Drosophila melanogaster. More than half of the differentially expressed maternal genes fell into four expression patterns, while the expression patterns showed a downward trend over time. The functional annotation of these maternal genes was mainly related to transcription factor activity, growth factor activity, nucleic acid binding, RNA binding, ATP binding, and ion binding. Additionally, twenty-two gene clusters including maternal genes were identified from 18 scaffolds. Altogether, we plotted a profile for the maternal genes of Bombyx mori using a digital gene expression profiling method. This will provide the basis for maternal-specific signature research and improve the understanding of the early development of silkworm. PMID:29462160

  19. VISIONET: intuitive visualisation of overlapping transcription factor networks, with applications in cardiogenic gene discovery.

    PubMed

    Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Rosenthal, Nadia A; Kitano, Hiroaki; Boyd, Sarah E

    2015-05-01

    Existing de novo software platforms have largely overlooked a valuable resource, the expertise of the intended biologist users. Typical data representations such as long gene lists, or highly dense and overlapping transcription factor networks often hinder biologists from relating these results to their expertise. VISIONET, a streamlined visualisation tool built from experimental needs, enables biologists to transform large and dense overlapping transcription factor networks into sparse human-readable graphs via numerical filtering. The VISIONET interface allows users without a computing background to interactively explore and filter their data, and empowers them to apply their specialist knowledge on far more complex and substantial data sets than is currently possible. Applying VISIONET to the Tbx20-Gata4 transcription factor network led to the discovery and validation of Aldh1a2, an essential developmental gene associated with various important cardiac disorders, as a healthy adult cardiac fibroblast gene co-regulated by cardiogenic transcription factors Gata4 and Tbx20. We demonstrate with experimental validations the utility of VISIONET for expertise-driven gene discovery that opens new experimental directions that would not otherwise have been identified.

  20. Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review

    PubMed Central

    Zhang, Xue; Acencio, Marcio Luis; Lemke, Ney

    2016-01-01

    Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins will result in lethality or infertility. The identification of essential genes is very important not only for understanding the minimal requirements for survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genomes data and high-throughput experimental data, many computational methods for identifying essential proteins are proposed, which are useful complements to experimental methods. In this review, we show the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research. PMID:27014079

  1. Synchronous versus asynchronous modeling of gene regulatory networks.

    PubMed

    Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni

    2008-09-01

    In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
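
    The synchronous transition model mentioned above can be illustrated with a short sketch that finds attractors of a tiny Boolean network by exhaustive enumeration of states. This is only a toy: the abstract's algorithms are based on ROBDDs, which are not used here, and the two-gene mutual-inhibition rules and function names are hypothetical.

```python
from itertools import product

def sync_step(state, rules):
    """Synchronous update: every gene reads the same current state."""
    return tuple(rule(state) for rule in rules)

def attractors(rules):
    """Find all attractors (fixed points and cycles) by following the
    deterministic synchronous trajectory from every possible state."""
    n = len(rules)
    found = set()
    for start in product((0, 1), repeat=n):
        seen = {}                      # state -> position in trajectory
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = sync_step(state, rules)
        first = seen[state]            # trajectory re-entered here
        cycle = frozenset(s for s, i in seen.items() if i >= first)
        found.add(cycle)
    return found

# Hypothetical toggle: gene 0 is on iff gene 1 is off, and vice versa.
toggle_rules = [lambda s: 1 - s[1],
                lambda s: 1 - s[0]]

for a in sorted(attractors(toggle_rules), key=len):
    print(sorted(a))
```

    For this toy network the synchronous model has two steady states, (0, 1) and (1, 0), plus a two-state cycle between (0, 0) and (1, 1). Under asynchronous updating, where one gene changes at a time, the states (0, 0) and (1, 1) lead into one of the steady states instead, which is one reason the choice of update scheme matters.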

  2. Reverse engineering gene regulatory networks from measurement with missing values.

    PubMed

    Ogundijo, Oyetunji E; Elmas, Abdulkadir; Wang, Xiaodong

    2016-12-01

    Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: for either the expression values of some genes at some time points or the entire expression values of a single time point or some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take the complete matrix of gene expression measurements as input. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values. We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and the real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of yeast Saccharomyces cerevisiae. PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence

  3. A quantitative proteomics approach identifies ETV6 and IKZF1 as new regulators of an ERG-driven transcriptional network

    PubMed Central

    Unnikrishnan, Ashwin; Guan, Yi F.; Huang, Yizhou; Beck, Dominik; Thoms, Julie A. I.; Peirs, Sofie; Knezevic, Kathy; Ma, Shiyong; de Walle, Inge V.; de Jong, Ineke; Ali, Zara; Zhong, Ling; Raftery, Mark J.; Taghon, Tom; Larsson, Jonas; MacKenzie, Karen L.; Van Vlierberghe, Pieter; Wong, Jason W. H.; Pimanda, John E.

    2016-01-01

    Aberrant stem cell-like gene regulatory networks are a feature of leukaemogenesis. The ETS-related gene (ERG), an important regulator of normal haematopoiesis, is also highly expressed in T-ALL and acute myeloid leukaemia (AML). However, the transcriptional regulation of ERG in leukaemic cells remains poorly understood. In order to discover transcriptional regulators of ERG, we employed a quantitative mass spectrometry-based method to identify factors binding the 321 bp ERG +85 stem cell enhancer region in MOLT-4 T-ALL and KG-1 AML cells. Using this approach, we identified a number of known binders of the +85 enhancer in leukaemic cells along with previously unknown binders, including ETV6 and IKZF1. We confirmed that ETV6 and IKZF1 were also bound at the +85 enhancer in both leukaemic cells and in healthy human CD34+ haematopoietic stem and progenitor cells. Knockdown experiments confirmed that ETV6 and IKZF1 are transcriptional regulators not just of ERG, but also of a number of genes regulated by a densely interconnected network of seven transcription factors. Finally, we show that ETV6 and IKZF1 expression levels are positively correlated with expression of a number of heptad genes in AML and that high expression of all nine genes confers poorer overall prognosis. PMID:27604872

  4. Expression profiling during ocular development identifies 2 Nlz genes with a critical role in optic fissure closure.

    PubMed

    Brown, Jacob D; Dutta, Sunit; Bharti, Kapil; Bonner, Robert F; Munson, Peter J; Dawid, Igor B; Akhtar, Amana L; Onojafe, Ighovie F; Alur, Ramakrishna P; Gross, Jeffrey M; Hejtmancik, J Fielding; Jiao, Xiaodong; Chan, Wai-Yee; Brooks, Brian P

    2009-02-03

    The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. Here, we profile global gene expression during optic fissure closure using laser capture microdissected (LCM) tissue from the margins of the fissure. From these data, we identify a unique role for the C2H2 zinc finger proteins Nlz1 and Nlz2 in normal fissure closure. Gene knockdown of nlz1 and/or nlz2 in zebrafish leads to a failure of the optic fissure to close, a phenotype which closely resembles that seen in human uveal coloboma. We also identify misregulation of pax2 in the developing eye of morphant fish, suggesting that Nlz1 and Nlz2 act upstream of the Pax2 pathway in directing proper closure of the optic fissure.

  5. Rapid identifying high-influence nodes in complex networks

    NASA Astrophysics Data System (ADS)

    Song, Bo; Jiang, Guo-Ping; Song, Yu-Rong; Xia, Ling-Ling

    2015-10-01

    A tiny fraction of influential individuals plays a critical role in the dynamics of complex systems. Identifying the influential nodes in complex networks has theoretical and practical significance. Considering the uncertainties of network scale and topology, and the timeliness of dynamic behaviors in real networks, we propose a rapid identifying method (RIM) to find the fraction of high-influential nodes. Instead of ranking all nodes, our method aims only at ranking a small number of nodes in the network. We set the high-influential nodes as initial spreaders, and evaluate the performance of RIM by the susceptible-infected-recovered (SIR) model. The simulations show that in different networks, RIM performs well at rapidly identifying high-influential nodes, which is verified by typical ranking methods, such as degree, closeness, betweenness, and eigenvector centrality methods. Project supported by the National Natural Science Foundation of China (Grant Nos. 61374180 and 61373136), the Ministry of Education Research in the Humanities and Social Sciences Planning Fund Project, China (Grant No. 12YJAZH120), and the Six Projects Sponsoring Talent Summits of Jiangsu Province, China (Grant No. RLD201212).
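
    The evaluation idea in this abstract (seed an SIR epidemic with the top-ranked nodes and measure how far it spreads) can be sketched as follows. This is a minimal illustration, not the RIM algorithm itself, whose details the abstract does not give; degree centrality stands in as the ranking, and the names `degree_ranking`, `sir_outbreak_size`, `beta` and `gamma` are assumptions made for this example.

```python
import random


def degree_ranking(adj, k):
    """Return the k nodes with the highest degree (ties broken by node id)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


def sir_outbreak_size(adj, seeds, beta, gamma, rng):
    """Run one discrete-time SIR simulation; return how many nodes were ever infected.

    beta  : per-contact infection probability per step (assumed parameter)
    gamma : per-step recovery probability; must be > 0 so the loop terminates
    """
    infected = set(seeds)
    recovered = set()
    while infected:
        newly_infected = set()
        for node in infected:
            for nb in adj[node]:
                susceptible = nb not in infected and nb not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(nb)
        newly_recovered = {n for n in infected if rng.random() < gamma}
        recovered |= newly_recovered
        infected = (infected - newly_recovered) | newly_infected
    return len(recovered)


# Toy graph: a star around node 0 plus a detached edge 5-6.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0], 5: [6], 6: [5]}
seeds = degree_ranking(adj, 1)
size = sir_outbreak_size(adj, seeds, beta=0.5, gamma=1.0, rng=random.Random(0))
```

    Averaging `size` over many runs for each candidate ranking gives the kind of spreading-power comparison the abstract describes.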

  6. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
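
    As a rough illustration of the information-theoretic core of such methods, the sketch below scores gene pairs by the mutual information of their discretized expression profiles and keeps pairs above a threshold. It is a plain single-machine relevance-network sketch, not the authors' MapReduce algorithms; the equal-width binning, the `threshold` parameter and all function names are assumptions. In a MapReduce setting the pairwise scoring would presumably be the part distributed across mappers.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, n_bins):
    """Map continuous expression values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def infer_edges(expr, n_bins, threshold):
    """Return gene pairs whose mutual information is at least `threshold`."""
    binned = {gene: discretize(vals, n_bins) for gene, vals in expr.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(binned), 2)
        if mutual_information(binned[a], binned[b]) >= threshold
    ]


# Hypothetical time series for three genes (four time points each).
expr = {"g1": [0.0, 1.0, 2.0, 3.0], "g2": [0.0, 1.0, 2.0, 3.0], "g3": [0.0, 3.0, 0.0, 3.0]}
edges = infer_edges(expr, n_bins=2, threshold=0.5)
```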

  7. Integrated microarray and ChIP analysis identifies multiple Foxa2 dependent target genes in the notochord.

    PubMed

    Tamplin, Owen J; Cox, Brian J; Rossant, Janet

    2011-12-15

    The node and notochord are key tissues required for patterning of the vertebrate body plan. Understanding the gene regulatory network that drives their formation and function is therefore important. Foxa2 is a key transcription factor at the top of this genetic hierarchy and finding its targets will help us to better understand node and notochord development. We performed an extensive microarray-based gene expression screen using sorted embryonic notochord cells to identify early notochord-enriched genes. We validated their specificity to the node and notochord by whole mount in situ hybridization. This provides the largest available resource of notochord-expressed genes, and therefore candidate Foxa2 target genes in the notochord. Using existing Foxa2 ChIP-seq data from adult liver, we were able to identify a set of genes expressed in the notochord that had associated regions of Foxa2-bound chromatin. Given that Foxa2 is a pioneer transcription factor, we reasoned that these sites might represent notochord-specific enhancers. Candidate Foxa2-bound regions were tested for notochord specific enhancer function in a zebrafish reporter assay and 7 novel notochord enhancers were identified. Importantly, sequence conservation or predictive models could not have readily identified these regions. Mutation of putative Foxa2 binding elements in two of these novel enhancers abrogated reporter expression and confirmed their Foxa2 dependence. The combination of highly specific gene expression profiling and genome-wide ChIP analysis is a powerful means of understanding developmental pathways, even for small cell populations such as the notochord. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. Gene regulatory networks and the underlying biology of developmental toxicity

    EPA Science Inventory

    Embryonic cells are specified by large-scale networks of functionally linked regulatory genes. Knowledge of the relevant gene regulatory networks is essential for understanding phenotypic heterogeneity that emerges from disruption of molecular functions, cellular processes or sig...

  9. JRmGRN: Joint reconstruction of multiple gene regulatory networks with common hub genes using data from multiple tissues or conditions.

    PubMed

    Deng, Wenping; Zhang, Kui; Liu, Sanzhen; Zhao, Patrick; Xu, Shizhong; Wei, Hairong

    2018-04-30

    Joint reconstruction of multiple gene regulatory networks (GRNs) using gene expression data from multiple tissues/conditions is very important for understanding common and tissue/condition-specific regulation. However, there are currently no computational models and methods available for directly constructing such multiple GRNs that not only share some common hub genes but also possess tissue/condition-specific regulatory edges. In this paper, we proposed a new graphic Gaussian model for joint reconstruction of multiple gene regulatory networks (JRmGRN), which highlighted hub genes, using gene expression data from several tissues/conditions. Under the framework of the Gaussian graphical model, the JRmGRN method constructs the GRNs by maximizing a penalized log-likelihood function. We formulated it as a convex optimization problem, and then solved it with an alternating direction method of multipliers (ADMM) algorithm. The performance of JRmGRN was first evaluated with synthetic data, and the results showed that JRmGRN outperformed several other methods for reconstruction of GRNs. We also applied our method to real Arabidopsis thaliana RNA-seq data from two light regime conditions in comparison with other methods, and both common hub genes and some condition-specific hub genes were identified with higher accuracy and precision. JRmGRN is available as an R program from: https://github.com/wenpingd. Contact: hairong@mtu.edu. Proof of theorem, derivation of algorithm and supplementary data are available at Bioinformatics online.
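
    For orientation, the generic shape of a penalized log-likelihood for Gaussian graphical models can be written as below. This is the standard textbook form, not the specific JRmGRN objective: the abstract does not state the penalty, so it is left abstract as $P$, and the symbols ($\Theta^{(k)}$ for the precision matrix of condition $k$, $S^{(k)}$ for its sample covariance, $n_k$ for its sample size, $\lambda$ for a tuning parameter) are assumptions introduced here.

```latex
% Single condition: sparse precision-matrix estimation (graphical lasso form)
\hat{\Theta} = \arg\max_{\Theta \succ 0}\;
  \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1

% K conditions estimated jointly, coupled only through a penalty P
\max_{\Theta^{(1)},\dots,\Theta^{(K)} \succ 0}\;
  \sum_{k=1}^{K} n_k \left[ \log\det\Theta^{(k)}
  - \operatorname{tr}\!\left(S^{(k)}\Theta^{(k)}\right) \right]
  - P\!\left(\Theta^{(1)},\dots,\Theta^{(K)}\right)
```

    When $P$ is convex the joint problem is convex, which is what makes an ADMM solver of the kind mentioned in the abstract applicable.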

  10. Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization.

    PubMed

    Zhao, Zhi-Qin; Han, Guo-Sheng; Yu, Zu-Guo; Li, Jinyan

    2015-08-01

    Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique capable of normalizing the weight of edges in a network. We use this technique to normalize the gene matrix and the phenotype matrix before the construction of the heterogeneous network, and also use this idea to define the transition matrices of the heterogeneous network. Our method has remarkably better performance than the existing methods for recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than that of the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, the most probable gene-phenotype relationships ranked within top 3 or top 5 in our gene lists can be confirmed by the OMIM database for many cases. Our algorithms have shown remarkably superior performance over the state-of-the-art algorithms for recovering gene-phenotype relationships. All MATLAB code is available upon email request. Copyright © 2015 Elsevier Ltd. All rights reserved.
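
    The two ingredients named in this abstract can be illustrated on a single homogeneous network. The sketch below applies symmetric Laplacian normalization, W_ij / sqrt(d_i d_j), to a weighted adjacency matrix and then iterates a random walk with restart from a set of seed genes. It is a simplified sketch under stated assumptions, not the authors' heterogeneous gene-phenotype method; the function names, the `restart` value and the convergence tolerance are all illustrative choices.

```python
import math


def laplacian_normalize(w):
    """Return the matrix with entries w[i][j] / sqrt(deg[i] * deg[j])."""
    deg = [sum(row) for row in w]
    n = len(w)
    return [
        [w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(m, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * M p + restart * p0 until p stops changing."""
    n = len(m)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(m[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path network 0 - 1 - 2 - 3 with node 0 as the known disease gene.
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = random_walk_with_restart(laplacian_normalize(w), seeds={0})
```

    Candidate genes are then ranked by `scores`; on the toy path the score falls off with distance from the seed. Because the symmetric normalization is not column-stochastic, the scores are relative affinities rather than probabilities.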

  11. Pluripotency gene network dynamics: System views from parametric analysis.

    PubMed

    Akberdin, Ilya R; Omelyanchuk, Nadezda A; Fadeev, Stanislav I; Leskova, Natalya E; Oschepkova, Evgeniya A; Kazantsev, Fedor V; Matushkin, Yury G; Afonnikov, Dmitry A; Kolchanov, Nikolay A

    2018-01-01

    Multiple experimental data demonstrated that the core gene network orchestrating self-renewal and differentiation of mouse embryonic stem cells involves activity of Oct4, Sox2 and Nanog genes by means of a number of positive feedback loops among them. However, recent studies indicated that the architecture of the core gene network should also incorporate negative Nanog autoregulation and might not include positive feedbacks from Nanog to Oct4 and Sox2. Thorough parametric analysis of the mathematical model based on this revisited core regulatory circuit showed that the model dynamics change substantially depending on the strength of Oct4 and Sox2 activation and the molecular complexity of Nanog autorepression. The analysis showed the existence of four dynamical domains with different numbers of stable and unstable steady states. We hypothesize that these domains can constitute the checkpoints in a developmental progression from naïve to primed pluripotency and vice versa. During this transition, parametric conditions exist that generate an oscillatory behavior of the system, explaining heterogeneity in expression of pluripotent and differentiation factors in serum ESC cultures. Finally, simulations showed that addition of positive feedbacks from Nanog to Oct4 and Sox2 leads mainly to an increase in the parametric space for the naïve ESC state, in which pluripotency factors are strongly expressed while differentiation ones are repressed.

  12. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.
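
    The two-round bait expansion described above (121 to 548 to 694 genes) amounts to repeatedly adding the direct co-expression neighbors of the current gene set. A minimal sketch of that step follows, assuming the co-expression database can be represented as a dictionary from gene to neighbor set; the toy gene names and the function name `expand_by_neighbors` are hypothetical, and the real pipeline's database queries and thresholds are not specified in the abstract.

```python
def expand_by_neighbors(network, genes):
    """Return the gene set plus every direct co-expression neighbor of its members."""
    expanded = set(genes)
    for gene in genes:
        expanded |= network.get(gene, set())
    return expanded


# Toy co-expression network: a chain A - B - C - D.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

baits = {"A"}                                       # genes found by text mining
round_one = expand_by_neighbors(network, baits)     # first query of the database
round_two = expand_by_neighbors(network, round_one) # re-query with the larger set
```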

  13. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    PubMed

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.

  14. Identifying molecular features for prostate cancer with Gleason 7 based on microarray gene expression profiles.

    PubMed

    Bălăcescu, Loredana; Bălăcescu, O; Crişan, N; Fetica, B; Petruţ, B; Bungărdean, Cătălina; Rus, Meda; Tudoran, Oana; Meurice, G; Irimie, Al; Dragoş, N; Berindan-Neagoe, Ioana

    2011-01-01

    Prostate cancer is the leading cancer among the Western male population, with clinical behavior ranging from indolent to metastatic disease. Although many molecules and deregulated pathways are known, the molecular mechanisms involved in the development of prostate cancer are not fully understood. The aim of this study was to explore the molecular variation underlying prostate cancer, based on microarray analysis and bioinformatics approaches. Normal and prostate cancer tissues were collected by macrodissection from prostatectomy pieces. All prostate cancer specimens used in our study were Gleason score 7. Gene expression microarray (Agilent Technologies) was used for Whole Human Genome evaluation. The bioinformatics and functional analysis were based on Limma and Ingenuity software. The microarray analysis identified 1119 differentially expressed genes between prostate cancer and normal prostate, which were up- or down-regulated at least 2-fold. P-values were adjusted for multiple testing using the Benjamini-Hochberg method with a false discovery rate of 0.01. These genes were analyzed with Ingenuity Pathway Analysis software, and 23 genetic networks were established. Our microarray results provide new information regarding the molecular networks in prostate cancer stratified as Gleason 7. These data highlighted gene expression profiles for better understanding of prostate cancer progression.
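
    The selection rule reported here (at least 2-fold change, Benjamini-Hochberg adjusted p-value at a false discovery rate of 0.01) can be sketched as below. This shows only the adjustment and filtering step, not the Limma moderated statistics that would produce the raw p-values; the gene names, the input tuples and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(results, alpha=0.01, min_abs_log2_fc=1.0):
    """Keep genes with adjusted p <= alpha and at least a 2-fold change.

    results: list of (gene, log2_fold_change, raw_p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, log2_fc, _), q in zip(results, adjusted)
        if q <= alpha and abs(log2_fc) >= min_abs_log2_fc
    ]


# Hypothetical input: (gene, log2 fold change, raw p-value).
results = [("G1", 2.1, 0.0001), ("G2", -1.5, 0.002), ("G3", 0.4, 0.0005), ("G4", 1.2, 0.2)]
selected = select_genes(results)
```

    A threshold of `min_abs_log2_fc=1.0` corresponds to the 2-fold cutoff because log2(2) = 1.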

  15. Genome-wide association study and gene network analysis of fertility, retained placenta, and metritis in US Holstein cattle

    USDA-ARS's Scientific Manuscript database

    The objectives of this research were to identify genes, genomic regions, and gene networks associated with three measures of fertility (daughter pregnancy rate, DPR; heifer conception rate, HCR; and cow conception rate, CCR) and two measures of reproductive health (metritis, METR; and retained place...

  16. How reliable is the linear noise approximation of gene regulatory networks?

    PubMed Central

    2013-01-01

    Background: The linear noise approximation (LNA) is commonly used to predict how noise is regulated and exploited at the cellular level. These predictions are exact for reaction networks composed exclusively of first order reactions or for networks involving bimolecular reactions and large numbers of molecules. It is however well known that gene regulation involves bimolecular interactions with molecule numbers as small as a single copy of a particular gene. It is therefore questionable how reliable are the LNA predictions for these systems. Results: We implement in the software package intrinsic Noise Analyzer (iNA), a system size expansion based method which calculates the mean concentrations and the variances of the fluctuations to an order of accuracy higher than the LNA. We then use iNA to explore the parametric dependence of the Fano factors and of the coefficients of variation of the mRNA and protein fluctuations in models of genetic networks involving nonlinear protein degradation, post-transcriptional, post-translational and negative feedback regulation. We find that the LNA can significantly underestimate the amplitude and period of noise-induced oscillations in genetic oscillators. We also identify cases where the LNA predicts that noise levels can be optimized by tuning a bimolecular rate constant whereas our method shows that no such regulation is possible. All our results are confirmed by stochastic simulations. Conclusion: The software iNA allows the investigation of parameter regimes where the LNA fares well and where it does not. We have shown that the parametric dependence of the coefficients of variation and Fano factors for common gene regulatory networks is better described by including terms of higher order than LNA in the system size expansion. This analysis is considerably faster than stochastic simulations due to the extensive ensemble averaging needed to obtain statistically meaningful results. Hence iNA is well suited for performing

  17. Efficient experimental design for uncertainty reduction in gene regulatory networks

    PubMed Central

    2015-01-01

    Background: An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. Results: The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. Conclusions: Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/. PMID:26423515
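
    The MOCU criterion described in this abstract can be illustrated on a small discrete uncertainty class. The sketch below assumes a finite set of candidate networks with prior probabilities, a table `cost[model][intervention]`, and experiments modeled as partitions of the candidate set (the outcome reveals which block holds the true network). It covers only the basic definition and the "minimum expected remaining MOCU" rule; the network reduction scheme proposed in the paper is not reproduced, and all names are assumptions.

```python
def expected_cost(cost, prior, action):
    """Prior-weighted cost of applying one intervention to every candidate model."""
    return sum(p * row[action] for p, row in zip(prior, cost))


def mocu(cost, prior):
    """Expected extra cost of the robust intervention over each model's own optimum."""
    n_actions = len(cost[0])
    robust = min(range(n_actions), key=lambda a: expected_cost(cost, prior, a))
    return sum(p * (row[robust] - min(row)) for p, row in zip(prior, cost))


def expected_remaining_mocu(cost, prior, partition):
    """Average MOCU after an experiment that reveals which block holds the true model."""
    total = 0.0
    for block in partition:
        mass = sum(prior[t] for t in block)
        if mass == 0:
            continue
        sub_cost = [cost[t] for t in block]
        sub_prior = [prior[t] / mass for t in block]
        total += mass * mocu(sub_cost, sub_prior)
    return total


def best_experiment(cost, prior, experiments):
    """Pick the experiment with the smallest expected remaining MOCU."""
    return min(
        experiments, key=lambda e: expected_remaining_mocu(cost, prior, experiments[e])
    )


# Three candidate networks, two interventions; cost[model][intervention].
cost = [[0, 2], [2, 0], [0, 2]]
prior = [0.4, 0.4, 0.2]
experiments = {"A": [[0, 2], [1]], "B": [[0], [1, 2]]}
choice = best_experiment(cost, prior, experiments)
```

    In this toy case experiment "A" separates the one network that needs a different intervention, so it removes all objective-relevant uncertainty and is selected.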

  18. Identification of susceptible genes for complex chronic diseases based on disease risk functional SNPs and interaction networks.

    PubMed

    Li, Wan; Zhu, Lina; Huang, Hao; He, Yuehan; Lv, Junjie; Li, Weimin; Chen, Lina; He, Weiming

    2017-10-01

    Complex chronic diseases are caused by the effects of genetic and environmental factors. Single nucleotide polymorphisms (SNPs), one common type of genetic variations, played vital roles in diseases. We hypothesized that disease risk functional SNPs in coding regions and protein interaction network modules were more likely to contribute to the identification of disease susceptible genes for complex chronic diseases. This could help to further reveal the pathogenesis of complex chronic diseases. Disease risk SNPs were first recognized from public SNP data for coronary heart disease (CHD), hypertension (HT) and type 2 diabetes (T2D). SNPs in coding regions that were classified into nonsense and missense by integrating several SNP functional annotation databases were treated as functional SNPs. Then, regions significantly associated with each disease were screened using random permutations for disease risk functional SNPs. Corresponding to these regions, 155, 169 and 173 potential disease susceptible genes were identified for CHD, HT and T2D, respectively. A disease-related gene product interaction network in environmental context was constructed for interacting gene products of both disease genes and potential disease susceptible genes for these diseases. After functional enrichment analysis for disease associated modules, 5 CHD susceptible genes, 7 HT susceptible genes and 3 T2D susceptible genes were finally identified, some of which had pleiotropic effects. Most of these genes were verified to be related to these diseases in literature. This was similar for disease genes identified from another method proposed by Lee et al. from a different aspect. This research could provide novel perspectives for diagnosis and treatment of complex chronic diseases and susceptible genes identification for other diseases. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. A Systems Biology Framework Identifies Molecular Underpinnings of Coronary Heart Disease

    PubMed Central

    Huan, Tianxiao; Zhang, Bin; Wang, Zhi; Joehanes, Roby; Zhu, Jun; Johnson, Andrew D.; Ying, Saixia; Munson, Peter J.; Raghavachari, Nalini; Wang, Richard; Liu, Poching; Courchesne, Paul; Hwang, Shih-Jen; Assimes, Themistocles L.; McPherson, Ruth; Samani, Nilesh J.; Schunkert, Heribert; Meng, Qingying; Suver, Christine; O'Donnell, Christopher J.; Derry, Jonathan; Yang, Xia; Levy, Daniel

    2013-01-01

    Objective: Genetic approaches have identified numerous loci associated with coronary heart disease (CHD). The molecular mechanisms underlying CHD gene-disease associations, however, remain unclear. We hypothesized that genetic variants with both strong and subtle effects drive gene subnetworks that in turn affect CHD. Approach and Results: We surveyed CHD-associated molecular interactions by constructing coexpression networks using whole blood gene expression profiles from 188 CHD cases and 188 age- and sex-matched controls. 24 coexpression modules were identified including one case-specific and one control-specific differential module (DM). The DMs were enriched for genes involved in B-cell activation, immune response, and ion transport. By integrating the DMs with altered gene expression associated SNPs (eSNPs) and with results of GWAS of CHD and its risk factors, the control-specific DM was implicated as CHD-causal based on its significant enrichment for both CHD and lipid eSNPs. This causal DM was further integrated with tissue-specific Bayesian networks and protein-protein interaction networks to identify regulatory key driver (KD) genes. Multi-tissue KDs (SPIB and TNFRSF13C) and tissue-specific KDs (e.g. EBF1) were identified. Conclusions: Our network-driven integrative analysis not only identified CHD-related genes, but also defined network structure that sheds light on the molecular interactions of genes associated with CHD risk. PMID:23539213

  20. A Consensus Network of Gene Regulatory Factors in the Human Frontal Lobe

    PubMed Central

    Berto, Stefano; Perdomo-Sabogal, Alvaro; Gerighausen, Daniel; Qin, Jing; Nowick, Katja

    2016-01-01

    Cognitive abilities, such as memory, learning, language, problem solving, and planning, involve the frontal lobe and other brain areas. Not much is known yet about the molecular basis of cognitive abilities, but it seems clear that cognitive abilities are determined by the interplay of many genes. One approach for analyzing the genetic networks involved in cognitive functions is to study the coexpression networks of genes with known importance for proper cognitive functions, such as genes that have been associated with cognitive disorders like intellectual disability (ID) or autism spectrum disorders (ASD). Because many of these genes are gene regulatory factors (GRFs) we aimed to provide insights into the gene regulatory networks active in the human frontal lobe. Using genome wide human frontal lobe expression data from 10 independent data sets, we first derived 10 individual coexpression networks for all GRFs including their potential target genes. We observed a high level of variability among these 10 independently derived networks, pointing out that relying on results from a single study can only provide limited biological insights. To instead focus on the most confident information from these 10 networks we developed a method for integrating such independently derived networks into a consensus network. This consensus network revealed robust GRF interactions that are conserved across the frontal lobes of different healthy human individuals. Within this network, we detected a strong central module that is enriched for 166 GRFs known to be involved in brain development and/or cognitive disorders. Interestingly, several hubs of the consensus network encode for GRFs that have not yet been associated with brain functions. Their central role in the network suggests them as excellent new candidates for playing an essential role in the regulatory network of the human frontal lobe, which should be investigated in future studies. PMID:27014338

  1. Identifying protein complex by integrating characteristic of core-attachment into dynamic PPI network.

    PubMed

    Shen, Xianjun; Yi, Li; Jiang, Xingpeng; He, Tingting; Yang, Jincai; Xie, Wei; Hu, Po; Hu, Xiaohua

    2017-01-01

    Identifying protein complexes is an important and challenging task in proteomics. It would contribute greatly to our knowledge of molecular mechanisms in cell life activities. However, the inherent organization and dynamic characteristics of the cell system have rarely been incorporated into the existing algorithms for detecting protein complexes because of the limitations of protein-protein interaction (PPI) data produced by high-throughput techniques. The availability of time course gene expression profiles enables us to uncover the dynamics of molecular networks and improve the detection of protein complexes. In order to achieve this goal, this paper proposes a novel algorithm, DCA (Dynamic Core-Attachment). It detects a protein-complex core comprising continually expressed and highly connected proteins in the dynamic PPI network, and then the protein complex is formed by including the attachments with high adhesion into the core. The integration of the core-attachment feature into the dynamic PPI network is responsible for the superiority of our algorithm. DCA has been applied to two different yeast dynamic PPI networks, and the experimental results show that it performs significantly better than the state-of-the-art techniques in terms of prediction accuracy, hF-measure and statistical significance in biology. In addition, the identified complexes with strong biological significance provide potential candidate complexes for biologists to validate.

  2. Gene networks specific for innate immunity define post-traumatic stress disorder.

    PubMed

    Breen, M S; Maihofer, A X; Glatt, S J; Tylee, D S; Chandler, S D; Tsuang, M T; Risbrough, V B; Baker, D G; O'Connor, D T; Nievergelt, C M; Woelk, C H

    2015-12-01

    The molecular factors involved in the development of Post-Traumatic Stress Disorder (PTSD) remain poorly understood. Previous transcriptomic studies investigating the mechanisms of PTSD have applied targeted approaches to identify individual genes under a cross-sectional framework and lack a holistic view of the behaviours and properties of these genes at the system-level. Here we sought to apply an unsupervised gene-network based approach to a prospective experimental design using whole-transcriptome RNA-Seq gene expression from peripheral blood leukocytes of U.S. Marines (N=188), obtained both pre- and post-deployment to conflict zones. We identified discrete groups of co-regulated genes (i.e., co-expression modules) and tested them for association to PTSD. We identified one module at both pre- and post-deployment containing putative causal signatures for PTSD development displaying an over-expression of genes enriched for functions of innate-immune response and interferon signalling (Type-I and Type-II). Importantly, these results were replicated in a second non-overlapping independent dataset of U.S. Marines (N=96), further outlining the role of innate immune and interferon signalling genes within co-expression modules to explain at least part of the causal pathophysiology for PTSD development. A second module, consequential of trauma exposure, contained PTSD resiliency signatures and an over-expression of genes involved in hemostasis and wound responsiveness suggesting that chronic levels of stress impair proper wound healing during/after exposure to the battlefield while highlighting the role of the hemostatic system as a clinical indicator of chronic-based stress. These findings provide novel insights for early preventative measures and advanced PTSD detection, which may lead to interventions that delay or perhaps abrogate the development of PTSD.

  3. Novel Loci for Metabolic Networks and Multi-Tissue Expression Studies Reveal Genes for Atherosclerosis

    PubMed Central

    Inouye, Michael; Ripatti, Samuli; Kettunen, Johannes; Lyytikäinen, Leo-Pekka; Oksala, Niku; Laurila, Pirkka-Pekka; Kangas, Antti J.; Soininen, Pasi; Savolainen, Markku J.; Viikari, Jorma; Kähönen, Mika; Perola, Markus; Salomaa, Veikko; Raitakari, Olli; Lehtimäki, Terho; Taskinen, Marja-Riitta; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Palotie, Aarno; de Bakker, Paul I. W.

    2012-01-01

    Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis. PMID:22916037

  4. On the robustness of complex heterogeneous gene expression networks.

    PubMed

    Gómez-Gardeñes, Jesús; Moreno, Yamir; Floría, Luis M

    2005-04-01

    We analyze a continuous gene expression model on the underlying topology of a complex heterogeneous network. Numerical simulations aimed at studying the chaotic and periodic dynamics of the model are performed. The results clearly indicate that there is a region in which the dynamical and structural complexity of the system avoid chaotic attractors. However, contrary to what has been reported for Random Boolean Networks, the chaotic phase cannot be completely suppressed, which has important bearings on network robustness and gene expression modeling.

  5. From gene networks to drugs: systems pharmacology approaches for AUD.

    PubMed

    Ferguson, Laura B; Harris, R Adron; Mayfield, Roy Dayne

    2018-06-01

    The alcohol research field has amassed an impressive number of gene expression datasets spanning key brain areas for addiction, species (humans as well as multiple animal models), and stages in the addiction cycle (binge/intoxication, withdrawal/negative affect, and preoccupation/anticipation). These data have improved our understanding of the molecular adaptations that eventually lead to dysregulation of brain function and the chronic, relapsing disorder of addiction. Identification of new medications to treat alcohol use disorder (AUD) will likely benefit from the integration of genetic, genomic, and behavioral information included in these important datasets. Systems pharmacology considers drug effects as the outcome of the complex network of interactions a drug has rather than a single drug-molecule interaction. Computational strategies based on this principle that integrate gene expression signatures of pharmaceuticals and disease states have shown promise for identifying treatments that ameliorate disease symptoms (called in silico gene mapping or connectivity mapping). In this review, we suggest that gene expression profiling for in silico mapping is critical to improve drug repurposing and discovery for AUD and other psychiatric illnesses. We highlight studies that successfully apply gene mapping computational approaches to identify or repurpose pharmaceutical treatments for psychiatric illnesses. Furthermore, we address important challenges that must be overcome to maximize the potential of these strategies to translate to the clinic and improve healthcare outcomes.

  6. Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores

    PubMed Central

    Gonçalves, Joana P.; Francisco, Alexandre P.; Moreau, Yves; Madeira, Sara C.

    2012-01-01

    Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to

  7. Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy

    PubMed Central

    Liu, Wei; Zhu, Wen; Liao, Bo; Chen, Xiangtao

    2016-01-01

    Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method. PMID:27829000
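
    The information-theoretic idea in the abstract above can be illustrated with a small sketch. This is not the MRMSn algorithm itself: it only scores the relevance of each candidate regulator to a target gene by the mutual information between discretized expression profiles. The function names, the gene labels, and the binary (low/high) discretization are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    n = len(x)
    px = Counter(x)          # marginal counts for x
    py = Counter(y)          # marginal counts for y
    pxy = Counter(zip(x, y))  # joint counts
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def rank_regulators(target, candidates):
    """Return (name, MI) pairs sorted by decreasing relevance to the target."""
    scores = {name: mutual_information(profile, target)
              for name, profile in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discretized expression profiles (0 = low, 1 = high), 8 samples.
target = [0, 0, 1, 1, 0, 1, 0, 1]
candidates = {
    "geneA": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the target
    "geneB": [0, 1, 0, 1, 0, 1, 0, 1],  # partially informative
    "geneC": [1, 1, 1, 1, 1, 1, 1, 1],  # constant, carries no information
}
ranking = rank_regulators(target, candidates)
```

    A full method would also need a significance criterion and a search strategy over regulator sets, which the abstract mentions but does not specify in enough detail to reproduce here.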

  8. Integrative molecular network analysis identifies emergent enzalutamide resistance mechanisms in prostate cancer

    PubMed Central

    King, Carly J.; Woodward, Josha; Schwartzman, Jacob; Coleman, Daniel J.; Lisac, Robert; Wang, Nicholas J.; Van Hook, Kathryn; Gao, Lina; Urrutia, Joshua; Dane, Mark A.; Heiser, Laura M.; Alumkal, Joshi J.

    2017-01-01

    Recent work demonstrates that castration-resistant prostate cancer (CRPC) tumors harbor countless genomic aberrations that control many hallmarks of cancer. While some specific mutations in CRPC may be actionable, many others are not. We hypothesized that genomic aberrations in cancer may operate in concert to promote drug resistance and tumor progression, and that organization of these genomic aberrations into therapeutically targetable pathways may improve our ability to treat CRPC. To identify the molecular underpinnings of enzalutamide-resistant CRPC, we performed transcriptional and copy number profiling studies using paired enzalutamide-sensitive and resistant LNCaP prostate cancer cell lines. Gene networks associated with enzalutamide resistance were revealed by performing an integrative genomic analysis with the PAthway Representation and Analysis by Direct Reference on Graphical Models (PARADIGM) tool. Amongst the pathways enriched in the enzalutamide-resistant cells were those associated with MEK, EGFR, RAS, and NFKB. Functional validation studies of 64 genes identified 10 candidate genes whose suppression led to greater effects on cell viability in enzalutamide-resistant cells as compared to sensitive parental cells. Examination of a patient cohort demonstrated that several of our functionally-validated gene hits are deregulated in metastatic CRPC tumor samples, suggesting that they may be clinically relevant therapeutic targets for patients with enzalutamide-resistant CRPC. Altogether, our approach demonstrates the potential of integrative genomic analyses to clarify determinants of drug resistance and rational co-targeting strategies to overcome resistance. PMID:29340039

  9. Molecular characterization and analysis of the acrB gene of Aspergillus nidulans: a gene identified by genetic interaction as a component of the regulatory network that includes the CreB deubiquitination enzyme.

    PubMed Central

    Boase, Natasha A; Lockington, Robin A; Adams, Julian R J; Rodbourn, Louise; Kelly, Joan M

    2003-01-01

    Mutations in the acrB gene, which were originally selected through their resistance to acriflavine, also result in reduced growth on a range of sole carbon sources, including fructose, cellobiose, raffinose, and starch, and reduced utilization of omega-amino acids, including GABA and beta-alanine, as sole carbon and nitrogen sources. The acrB2 mutation suppresses the phenotypic effects of mutations in the creB gene that encodes a regulatory deubiquitinating enzyme, and in the creC gene that encodes a WD40-repeat-containing protein. Thus AcrB interacts with a regulatory network controlling carbon source utilization that involves ubiquitination and deubiquitination. The acrB gene was cloned and physically analyzed, and it encodes a novel protein that contains three putative transmembrane domains and a coiled-coil region. AcrB may play a role in the ubiquitination aspect of this regulatory network. PMID:12750323

  10. A novel gene network inference algorithm using predictive minimum description length approach.

    PubMed

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.

  11. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach

    PubMed Central

    2014-01-01

    Background: Accurate estimation of parameters of biochemical models is required to characterize the dynamics of molecular processes. This problem is intimately linked to identifying the most informative experiments for accomplishing such tasks. While significant progress has been made, effective experimental strategies for parameter identification and for distinguishing among alternative network topologies remain unclear. We approached these questions in an unbiased manner using a unique community-based approach in the context of the DREAM initiative (Dialogue for Reverse Engineering Assessment of Methods). We created an in silico test framework under which participants could probe a network with hidden parameters by requesting a range of experimental assays; results of these experiments were simulated according to a model of network dynamics only partially revealed to participants. Results: We proposed two challenges; in the first, participants were given the topology and underlying biochemical structure of a 9-gene regulatory network and were asked to determine its parameter values. In the second challenge, participants were given an incomplete topology with 11 genes and asked to find three missing links in the model. In both challenges, a budget was provided to buy experimental data generated in silico with the model and mimicking the features of different common experimental techniques, such as microarrays and fluorescence microscopy. Data could be bought at any stage, allowing participants to implement an iterative loop of experiments and computation. Conclusions: A total of 19 teams participated in this competition. The results suggest that the combination of state-of-the-art parameter estimation and a varied set of experimental methods using a few datasets, mostly fluorescence imaging data, can accurately determine parameters of biochemical models of gene regulation. However, the task is considerably more difficult if the gene network topology is not completely

  12. Genome co-amplification upregulates a mitotic gene network activity that predicts outcome and response to mitotic protein inhibitors in breast cancer

    DOE PAGES

    Hu, Zhi; Mao, Jian-Hua; Curtis, Christina; ...

    2016-07-01

    Background: High mitotic activity is associated with the genesis and progression of many cancers. Small molecule inhibitors of mitotic apparatus proteins are now being developed and evaluated clinically as anticancer agents. With clinical trials of several of these experimental compounds underway, it is important to understand the molecular mechanisms that determine high mitotic activity, identify tumor subtypes that carry molecular aberrations that confer high mitotic activity, and to develop molecular markers that distinguish which tumors will be most responsive to mitotic apparatus inhibitors. Methods: We identified a coordinately regulated mitotic apparatus network by analyzing gene expression profiles for 53 malignant and non-malignant human breast cancer cell lines and two separate primary breast tumor datasets. We defined the mitotic network activity index (MNAI) as the sum of the transcriptional levels of the 54 coordinately regulated mitotic apparatus genes. The effect of those genes on cell growth was evaluated by small interfering RNA (siRNA). Results: High MNAI was enriched in basal-like breast tumors and was associated with reduced survival duration and preferential sensitivity to inhibitors of the mitotic apparatus proteins, polo-like kinase, centromere associated protein E and aurora kinase designated GSK462364, GSK923295 and GSK1070916, respectively. Co-amplification of regions of chromosomes 8q24, 10p15-p12, 12p13, and 17q24-q25 was associated with the transcriptional upregulation of this network of 54 mitotic apparatus genes, and we identify transcription factors that localize to these regions and putatively regulate mitotic activity. Knockdown of the mitotic network by siRNA identified 22 genes that might be considered as additional therapeutic targets for this clinically relevant patient subgroup. Conclusions: We define a molecular signature which may guide therapeutic approaches for tumors with high mitotic network activity.
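
    The MNAI definition in the abstract above (the sum of the transcriptional levels of the network genes) is simple enough to sketch directly. The function name, the gene labels, and the expression values below are hypothetical placeholders; the abstract does not list the 54 genes or say how expression values were normalized before summing, so both are left to the caller.

```python
def mitotic_network_activity_index(expression, network_genes):
    """Sum the expression levels of the network genes for one sample.

    expression    -- mapping of gene label to expression level for the sample
    network_genes -- labels of the genes that make up the network
    """
    missing = [gene for gene in network_genes if gene not in expression]
    if missing:
        # Refuse to return a partial sum that would not be comparable
        # across samples.
        raise KeyError(f"no expression value for: {', '.join(missing)}")
    return sum(expression[gene] for gene in network_genes)


# Hypothetical three-gene network and one sample's expression values.
network = ["GENE_A", "GENE_B", "GENE_C"]
sample = {"GENE_A": 2.0, "GENE_B": 3.5, "GENE_C": 1.5, "OTHER": 9.0}
mnai = mitotic_network_activity_index(sample, network)
```

    Genes outside the network (here "OTHER") do not contribute to the index.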

  13. Genome co-amplification upregulates a mitotic gene network activity that predicts outcome and response to mitotic protein inhibitors in breast cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Zhi; Mao, Jian-Hua; Curtis, Christina

    Background: High mitotic activity is associated with the genesis and progression of many cancers. Small molecule inhibitors of mitotic apparatus proteins are now being developed and evaluated clinically as anticancer agents. With clinical trials of several of these experimental compounds underway, it is important to understand the molecular mechanisms that determine high mitotic activity, identify tumor subtypes that carry molecular aberrations that confer high mitotic activity, and to develop molecular markers that distinguish which tumors will be most responsive to mitotic apparatus inhibitors. Methods: We identified a coordinately regulated mitotic apparatus network by analyzing gene expression profiles for 53 malignant and non-malignant human breast cancer cell lines and two separate primary breast tumor datasets. We defined the mitotic network activity index (MNAI) as the sum of the transcriptional levels of the 54 coordinately regulated mitotic apparatus genes. The effect of those genes on cell growth was evaluated by small interfering RNA (siRNA). Results: High MNAI was enriched in basal-like breast tumors and was associated with reduced survival duration and preferential sensitivity to inhibitors of the mitotic apparatus proteins, polo-like kinase, centromere associated protein E and aurora kinase designated GSK462364, GSK923295 and GSK1070916, respectively. Co-amplification of regions of chromosomes 8q24, 10p15-p12, 12p13, and 17q24-q25 was associated with the transcriptional upregulation of this network of 54 mitotic apparatus genes, and we identify transcription factors that localize to these regions and putatively regulate mitotic activity. Knockdown of the mitotic network by siRNA identified 22 genes that might be considered as additional therapeutic targets for this clinically relevant patient subgroup. Conclusions: We define a molecular signature which may guide therapeutic approaches for tumors with high mitotic network activity.

  14. Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases

    PubMed Central

    Ritchie, Marylyn D; White, Bill C; Parker, Joel S; Hahn, Lance W; Moore, Jason H

    2003-01-01

    Background: Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. Results: Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. Conclusion: This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases. PMID:12846935

  15. Enhancer Variants Synergistically Drive Dysfunction of a Gene Regulatory Network In Hirschsprung Disease

    DOE PAGES

    Chatterjee, Sumantra; Kapoor, Ashish; Akiyama, Jennifer A.; ...

    2016-09-29

    Common sequence variants in cis-regulatory elements (CREs) are suspected etiological causes of complex disorders. We previously identified an intronic enhancer variant in the RET gene disrupting SOX10 binding and increasing Hirschsprung disease (HSCR) risk 4-fold. We now show that two other functionally independent CRE variants, one binding Gata2 and the other binding Rarb, also reduce Ret expression and increase risk 2- and 1.7-fold. By studying human and mouse fetal gut tissues and cell lines, we demonstrate that reduced RET expression propagates throughout its gene regulatory network, exerting effects on both its positive and negative feedback components. We also provide evidence that the presence of a combination of CRE variants synergistically reduces RET expression and its effects throughout the GRN. These studies show how the effects of functionally independent non-coding variants in a coordinated gene regulatory network amplify their individually small effects, providing a model for complex disorders.

  16. Enhancer Variants Synergistically Drive Dysfunction of a Gene Regulatory Network In Hirschsprung Disease

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chatterjee, Sumantra; Kapoor, Ashish; Akiyama, Jennifer A.

    Common sequence variants in cis-regulatory elements (CREs) are suspected etiological causes of complex disorders. We previously identified an intronic enhancer variant in the RET gene disrupting SOX10 binding and increasing Hirschsprung disease (HSCR) risk 4-fold. We now show that two other functionally independent CRE variants, one binding Gata2 and the other binding Rarb, also reduce Ret expression and increase risk 2- and 1.7-fold. By studying human and mouse fetal gut tissues and cell lines, we demonstrate that reduced RET expression propagates throughout its gene regulatory network, exerting effects on both its positive and negative feedback components. We also provide evidence that the presence of a combination of CRE variants synergistically reduces RET expression and its effects throughout the GRN. These studies show how the effects of functionally independent non-coding variants in a coordinated gene regulatory network amplify their individually small effects, providing a model for complex disorders.

  17. PINTA: a web server for network-based gene prioritization from expression data

    PubMed Central

    Nitsch, Daniela; Tranchevent, Léon-Charles; Gonçalves, Joana P.; Vogt, Josef Korbinian; Madeira, Sara C.; Moreau, Yves

    2011-01-01

    PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user. PMID:21602267

  18. Co-acting gene networks predict TRAIL responsiveness of tumour cells with high accuracy.

    PubMed

    O'Reilly, Paul; Ortutay, Csaba; Gernon, Grainne; O'Connell, Enda; Seoighe, Cathal; Boyce, Susan; Serrano, Luis; Szegezdi, Eva

    2014-12-19

    Identification of differentially expressed genes from transcriptomic studies is one of the most common mechanisms to identify tumor biomarkers. This approach however is not well suited to identify interaction between genes whose protein products potentially influence each other, which limits its power to identify molecular wiring of tumour cells dictating response to a drug. Due to the fact that signal transduction pathways are not linear and highly interlinked, the biological response they drive may be better described by the relative amount of their components and their functional relationships than by their individual, absolute expression. Gene expression microarray data for 109 tumor cell lines with known sensitivity to the death ligand cytokine tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) was used to identify genes with potential functional relationships determining responsiveness to TRAIL-induced apoptosis. The machine learning technique Random Forest in the statistical environment "R" with backward elimination was used to identify the key predictors of TRAIL sensitivity and differentially expressed genes were identified using the software GeneSpring. Gene co-regulation and statistical interaction was assessed with q-order partial correlation analysis and non-rejection rate. Biological (functional) interactions amongst the co-acting genes were studied with Ingenuity network analysis. Prediction accuracy was assessed by calculating the area under the receiver operator curve using an independent dataset. We show that the gene panel identified could predict TRAIL-sensitivity with a very high degree of sensitivity and specificity (AUC=0·84). The genes in the panel are co-regulated and at least 40% of them functionally interact in signal transduction pathways that regulate cell death and cell survival, cellular differentiation and morphogenesis. Importantly, only 12% of the TRAIL-predictor genes were differentially expressed highlighting the

  19. Local and global responses in complex gene regulation networks

    NASA Astrophysics Data System (ADS)

    Tsuchiya, Masa; Selvarajoo, Kumar; Piras, Vincent; Tomita, Masaru; Giuliani, Alessandro

    2009-04-01

    An exacerbated sensitivity to apparently minor stimuli and a general resilience of the entire system stay together side-by-side in biological systems. This apparent paradox can be explained by the consideration of biological systems as very strongly interconnected network systems. Some nodes of these networks, thanks to their peculiar location in the network architecture, are responsible for the sensitivity aspects, while the large degree of interconnection is at the basis of the resilience properties of the system. One relevant feature of the high degree of connectivity of gene regulation networks is the emergence of collective ordered phenomena influencing the entire genome and not only a specific portion of transcripts. The great majority of existing gene regulation models give the impression of purely local ‘hard-wired’ mechanisms disregarding the emergence of global ordered behavior encompassing thousands of genes while the general, genome wide, aspects are less known. Here we address, from a data analysis perspective, the discrimination between local and global scale regulations; this goal was achieved by means of the examination of two biological systems: innate immune response in macrophages and oscillating growth dynamics in yeast. Our aim was to reconcile the ‘hard-wired’ local view of gene regulation with a global continuous and scalable one borrowed from statistical physics. This reconciliation is based on the network paradigm in which the local ‘hard-wired’ activities correspond to the activation of specific crucial nodes in the regulation network, while the scalable continuous responses can be equated to the collective oscillations of the network after a perturbation.

  20. MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma

    PubMed Central

    Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S.; Theis, Fabian J.

    2015-01-01

    MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method “miRlastic”, which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. Finally, the local enrichment step identified two functional clusters of mi

  1. MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma.

    PubMed

    Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S; Theis, Fabian J

    2015-12-18

    MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method "miRlastic", which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. Finally, the local enrichment step identified two functional clusters of miRNAs that

  2. Gene regulatory network inference using fused LASSO on multiple data sets

    PubMed Central

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-01-01

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687
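
    For orientation, a generic fused-LASSO-style objective over K data sets is written out below. It is a textbook-style sketch, not the authors' exact formulation: the symbols (expression vector y^{(k)} of the target gene in data set k, matrix X^{(k)} of transcription factor expression levels, coefficient vector \beta^{(k)}, and tuning parameters \lambda_1, \lambda_2) are all assumptions introduced here.

```latex
\min_{\beta^{(1)},\dots,\beta^{(K)}}
  \sum_{k=1}^{K} \left\lVert y^{(k)} - X^{(k)} \beta^{(k)} \right\rVert_2^2
  \;+\; \lambda_1 \sum_{k=1}^{K} \left\lVert \beta^{(k)} \right\rVert_1
  \;+\; \lambda_2 \sum_{1 \le k < l \le K} \left\lVert \beta^{(k)} - \beta^{(l)} \right\rVert_1
```

    In this form the \lambda_1 term encourages each gene to be explained by few regulators (constraint 1 in the abstract) and the \lambda_2 term encourages networks inferred from different data sets to be similar (constraint 2). Constraint 3, which favors genes with similar differential behavior, is not represented in this sketch.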

  3. The Genome-Wide Interaction Network of Nutrient Stress Genes in Escherichia coli.

    PubMed

    Côté, Jean-Philippe; French, Shawn; Gehrke, Sebastian S; MacNair, Craig R; Mangat, Chand S; Bharat, Amrita; Brown, Eric D

    2016-11-22

    Conventional efforts to describe essential genes in bacteria have typically emphasized nutrient-rich growth conditions. Of note, however, is the set of genes that become essential when bacteria are grown under nutrient stress. For example, more than 100 genes become indispensable when the model bacterium Escherichia coli is grown on nutrient-limited media, and many of these nutrient stress genes have also been shown to be important for the growth of various bacterial pathogens in vivo. To better understand the genetic network that underpins nutrient stress in E. coli, we performed a genome-scale cross of strains harboring deletions in some 82 nutrient stress genes with the entire E. coli gene deletion collection (Keio) to create 315,400 double deletion mutants. An analysis of the growth of the resulting strains on rich microbiological media revealed an average of 23 synthetic sick or lethal genetic interactions for each nutrient stress gene, suggesting that the network defining nutrient stress is surprisingly complex. A vast majority of these interactions involved genes of unknown function or genes of unrelated pathways. The most profound synthetic lethal interactions were between nutrient acquisition and biosynthesis. Further, the interaction map reveals remarkable metabolic robustness in E. coli through pathway redundancies. In all, the genetic interaction network provides a powerful tool to mine and identify missing links in nutrient synthesis and to further characterize genes of unknown function in E. coli. Moreover, understanding of bacterial growth under nutrient stress could aid in the development of novel antibiotic discovery platforms. With the rise of antibiotic drug resistance, there is an urgent need for new antibacterial drugs. Here, we studied a group of genes that are essential for the growth of Escherichia coli under nutrient limitation, culture conditions that arguably better represent nutrient availability during an infection than rich

  4. Identifying driving gene clusters in complex diseases through critical transition theory

    NASA Astrophysics Data System (ADS)

    Wolanyk, Nathaniel; Wang, Xujing; Hessner, Martin; Gao, Shouguo; Chen, Ye; Jia, Shuang

    A novel approach of looking at the human body using critical transition theory has yielded positive results: clusters of genes that act in tandem to drive complex disease progression. Such a cluster can be thought of as the first part of a large genetic force that pushes the body from a sick but curable state to an incurable diseased state through a catastrophic bifurcation. The data analyzed are time-course microarray blood assay data from 7 individuals at high risk for Type 1 Diabetes who progressed to clinical onset, with an additional larger study requested to be presented at the conference. The normalized data comprise 25,000 genes, which were narrowed down based on statistical metrics; a machine learning algorithm using critical transition metrics then found the driving network. This approach was designed to be repeatable across multiple complex diseases, requiring only progression time-course data, so that it can be applied to identifying when an individual is at risk of developing a complex disease. Thus, preventative measures can be enacted, and in the longer term the approach offers a possible way to prevent Type 1 Diabetes altogether.

  5. Reverse engineering the gap gene network of Drosophila melanogaster.

    PubMed

    Perkins, Theodore J; Jaeger, Johannes; Reinitz, John; Glass, Leon

    2006-05-01

    A fundamental problem in functional genomics is to determine the structure and dynamics of genetic networks based on expression data. We describe a new strategy for solving this problem and apply it to recently published data on early Drosophila melanogaster development. Our method is orders of magnitude faster than current fitting methods and allows us to fit different types of rules for expressing regulatory relationships. Specifically, we use our approach to fit models using a smooth nonlinear formalism for modeling gene regulation (gene circuits) as well as models using logical rules based on activation and repression thresholds for transcription factors. Our technique also allows us to infer regulatory relationships de novo or to test network structures suggested by the literature. We fit a series of models to test several outstanding questions about gap gene regulation, including regulation of and by hunchback and the role of autoactivation. Based on our modeling results and validation against the experimental literature, we propose a revised network structure for the gap gene system. Interestingly, some relationships in standard textbook models of gap gene regulation appear to be unnecessary for or even inconsistent with the details of gap gene expression during wild-type development.

  6. Gap Gene Regulatory Dynamics Evolve along a Genotype Network

    PubMed Central

    Crombach, Anton; Wotton, Karl R.; Jiménez-Guri, Eva; Jaeger, Johannes

    2016-01-01

    Developmental gene networks implement the dynamic regulatory mechanisms that pattern and shape the organism. Over evolutionary time, the wiring of these networks changes, yet the patterning outcome is often preserved, a phenomenon known as “system drift.” System drift is illustrated by the gap gene network—involved in segmental patterning—in dipteran insects. In the classic model organism Drosophila melanogaster and the nonmodel scuttle fly Megaselia abdita, early activation and placement of gap gene expression domains show significant quantitative differences, yet the final patterning output of the system is essentially identical in both species. In this detailed modeling analysis of system drift, we use gene circuits which are fit to quantitative gap gene expression data in M. abdita and compare them with an equivalent set of models from D. melanogaster. The results of this comparative analysis show precisely how compensatory regulatory mechanisms achieve equivalent final patterns in both species. We discuss the larger implications of the work in terms of “genotype networks” and the ways in which the structure of regulatory networks can influence patterns of evolutionary change (evolvability). PMID:26796549

  7. Modularity and evolutionary constraints in a baculovirus gene regulatory network

    PubMed Central

    2013-01-01

    Background The structure of regulatory networks remains an open question in our understanding of complex biological systems. Interactions during complete viral life cycles present unique opportunities to understand how host-parasite networks take shape and behave. The Anticarsia gemmatalis multiple nucleopolyhedrovirus (AgMNPV) is a large double-stranded DNA virus, whose genome may encode 152 open reading frames (ORFs). Here we present the analysis of the ordered cascade of the AgMNPV gene expression. Results We observed an earlier onset of the expression than previously reported for other baculoviruses, especially for genes involved in DNA replication. Most ORFs were expressed at higher levels in a more permissive host cell line. Genes with more than one copy in the genome had distinct expression profiles, which could indicate the acquisition of new functionalities. The transcription gene regulatory network (GRN) for 149 ORFs had a modular topology comprising five communities of highly interconnected nodes that separated functionally related key genes into different communities, possibly maximizing redundancy and GRN robustness by compartmentalization of important functions. Core conserved functions showed expression synchronicity, distinct GRN features and significantly less genetic diversity, consistent with evolutionary constraints imposed on key elements of biological systems. This reduced genetic diversity also had a positive correlation with the importance of the gene in our estimated GRN, supporting a relationship between phylogenetic data of baculovirus genes and network features inferred from expression data. We also observed that gene arrangement in overlapping transcripts was conserved among related baculoviruses, suggesting a principle of genome organization. Conclusions Albeit with a reduced number of nodes (149), the AgMNPV GRN had a topology and key characteristics similar to those observed in complex cellular organisms, which indicates

  8. Genetic regulation of gene expression in the lung identifies CST3 and CD22 as potential causal genes for airflow obstruction.

    PubMed

    Lamontagne, Maxime; Timens, Wim; Hao, Ke; Bossé, Yohan; Laviolette, Michel; Steiling, Katrina; Campbell, Joshua D; Couture, Christian; Conti, Massimo; Sherwood, Karen; Hogg, James C; Brandsma, Corry-Anke; van den Berge, Maarten; Sandford, Andrew; Lam, Stephen; Lenburg, Marc E; Spira, Avrum; Paré, Peter D; Nickle, David; Sin, Don D; Postma, Dirkje S

    2014-11-01

    COPD is a complex chronic disease with poorly understood pathogenesis. Integrative genomic approaches have the potential to elucidate the biological networks underlying COPD and lung function. We recently combined genome-wide genotyping and gene expression in 1111 human lung specimens to map expression quantitative trait loci (eQTL). To determine causal associations between COPD and lung function-associated single nucleotide polymorphisms (SNPs) and lung tissue gene expression changes in our lung eQTL dataset. We evaluated causality between SNPs and gene expression for three COPD phenotypes: FEV(1)% predicted, FEV(1)/FVC and COPD as a categorical variable. Different models were assessed in the three cohorts independently and in a meta-analysis. SNPs associated with a COPD phenotype and gene expression were subjected to causal pathway modelling and manual curation. In silico analyses evaluated functional enrichment of biological pathways among newly identified causal genes. Biologically relevant causal genes were validated in two separate gene expression datasets of lung tissues and bronchial airway brushings. High reliability causal relations were found in SNP-mRNA-phenotype triplets for FEV(1)% predicted (n=169) and FEV(1)/FVC (n=80). Several genes of potential biological relevance for COPD were revealed. eQTL-SNPs upregulating cystatin C (CST3) and CD22 were associated with worse lung function. Signalling pathways enriched with causal genes included xenobiotic metabolism, apoptosis, protease-antiprotease and oxidant-antioxidant balance. By using integrative genomics and analysing the relationships of COPD phenotypes with SNPs and gene expression in lung tissue, we identified CST3 and CD22 as potential causal genes for airflow obstruction. This study also augmented the understanding of previously described COPD pathways. Published by the BMJ Publishing Group Limited. 

  9. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    PubMed

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
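    The rankwise averaging step described above can be sketched in a few lines. This is a minimal illustration, not the published NIMEFI implementation: the function names and the representation of each method's output as a dictionary from (regulator, target) edges to importance scores are assumptions made here, and every method is assumed to score the same set of edges.

```python
from collections import defaultdict


def rank_edges(scores):
    """Map each edge to its rank (1 = highest score) within one method's output."""
    ordered = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def rankwise_average(methods):
    """Combine several edge scorings by averaging each edge's rank.

    `methods` is a list of dicts mapping (regulator, target) -> importance.
    Returns edges sorted from strongest to weakest consensus support;
    ties in average rank are broken by edge name to keep the order stable.
    """
    totals = defaultdict(float)
    for scores in methods:
        for edge, rank in rank_edges(scores).items():
            totals[edge] += rank
    averaged = {edge: total / len(methods) for edge, total in totals.items()}
    return sorted(averaged, key=lambda edge: (averaged[edge], edge))
```

    Averaging ranks rather than raw scores sidesteps the fact that importance values from different algorithms are not on a common scale.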

  10. NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

    PubMed Central

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available

  11. fabp4 is central to eight obesity associated genes: a functional gene network-based polymorphic study.

    PubMed

    Bag, Susmita; Ramaiah, Sudha; Anbarasu, Anand

    2015-01-07

    Network studies of genes and proteins offer a functional basis for understanding the complexity of genes, proteins and their interacting partners. The gene fatty acid-binding protein 4 (fabp4) is found to be highly expressed in adipose tissue, and is one of the most abundant proteins in mature adipocytes. Our investigations on functional modules of fabp4 provide useful information on the functional genes interacting with fabp4, their biochemical properties and their regulatory functions. The present study shows that there is a set of eight candidate genes, acp1, ext2, insr, lipe, ostf1, sncg, usp15, and vim, that are strongly and functionally linked with fabp4. Gene ontological analysis of network modules of fabp4 gives a clear picture of the functional aspects of fabp4 and its interacting nodes. The hierarchical mapping on gene ontology indicates gene-specific processes and functions as well as their compartmentalization in tissues. fabp4 and its interacting genes are involved in lipid metabolic activity and are integrated in multi-cellular processes of tissues and organs. They also have important protein/enzyme binding activity. Our study elucidated disease-associated nsSNP prediction for fabp4, and it is interesting to note that there are four rsIDs (rs1051231, rs3204631, rs140925685 and rs141169989) with disease allelic variation (T104P, T126P, G27D and G90V, respectively). On the whole, our gene network analysis presents a clear insight into the interactions and functions associated with the fabp4 gene network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Informed walks: whispering hints to gene hunters inside networks' jungle.

    PubMed

    Bourdakou, Marilena M; Spyrou, George M

    2017-10-11

    Systemic approaches offer a different point of view on the analysis of several types of molecular associations as well as on the identification of specific gene communities in several cancer types. However, due to the lack of sufficient data needed to construct networks based on experimental evidence, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to capitalize comprehensively on the prior knowledge encoded in molecular pathway associations and to improve their efficiency in discovering both exclusive subnetworks as candidate biomarkers and conserved subnetworks that may uncover common origins of several cancer types. In this study we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted subnetworks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights into the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types.

  13. MINER: exploratory analysis of gene interaction networks by machine learning from expression data.

    PubMed

    Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A

    2009-12-03

    The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.

  14. Stability and structural properties of gene regulation networks with coregulation rules.

    PubMed

    Warrell, Jonathan; Mhlanga, Musa

    2017-05-07

    Coregulation of the expression of groups of genes has been extensively demonstrated empirically in bacterial and eukaryotic systems. Such coregulation can arise through the use of shared regulatory motifs, which allow the coordinated expression of modules (and module groups) of functionally related genes across the genome. Coregulation can also arise through the physical association of multi-gene complexes through chromosomal looping, which are then transcribed together. We present a general formalism for modeling coregulation rules in the framework of Random Boolean Networks (RBN), and develop specific models for transcription factor networks with modular structure (including module groups, and multi-input modules (MIM) with autoregulation) and multi-gene complexes (including hierarchical differentiation between multi-gene complex members). We develop a mean-field approach to analyse the dynamical stability of large networks incorporating coregulation, and show that autoregulated MIM and hierarchical gene-complex models can achieve greater stability than networks without coregulation whose rules have matching activation frequency. We provide further analysis of the stability of small networks of both kinds through simulations. We also characterize several general properties of the transients and attractors in the hierarchical coregulation model, and show using simulations that the steady-state distribution factorizes hierarchically as a Bayesian network in a Markov Jump Process analogue of the RBN model. Copyright © 2017. Published by Elsevier Ltd.
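    For readers unfamiliar with the underlying framework, the sketch below simulates a plain Random Boolean Network and measures the transient length and attractor period reached from one starting state. It illustrates only the standard RBN model; the coregulation rules, module structure and mean-field analysis of the paper are not implemented, and all function names and parameters are assumptions introduced for this example.

```python
import random


def make_rbn(n_genes, k_inputs, seed=0):
    """Build a random network: each gene gets k input genes and a random truth table."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n_genes), k_inputs) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_genes)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every gene from the current states of its inputs."""
    new_state = []
    for gene_inputs, table in zip(inputs, tables):
        index = 0
        for source in gene_inputs:
            index = (index << 1) | state[source]  # pack input bits into a table index
        new_state.append(table[index])
    return tuple(new_state)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, attractor period)."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return first_seen[state], t - first_seen[state]
```

    Because the state space is finite (2^n states for n genes) and the update is deterministic, every trajectory must eventually revisit a state, so the loop always terminates.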

  15. Statistical mechanics of scale-free gene expression networks

    NASA Astrophysics Data System (ADS)

    Gross, Eitan

    2012-12-01

    The gene co-expression networks of many organisms including bacteria, mice and man exhibit scale-free distribution. This heterogeneous distribution of connections decreases the vulnerability of the network to random attacks and thus may confer on the genetic replication machinery an intrinsic resilience to such attacks, triggered by changing environmental conditions that the organism may be subject to during evolution. This resilience to random attacks comes at an energetic cost, however, reflected by the lower entropy of the scale-free distribution compared to the more homogeneous, random network. In this study we found that the cell cycle-regulated gene expression pattern of the yeast Saccharomyces cerevisiae obeys a power-law distribution with an exponent α = 2.1 and an entropy of 1.58. The latter is very close to the maximal value of 1.65 obtained from linear optimization of the entropy function under the constraint of a constant cost function, determined by the average degree connectivity. We further show that the yeast's gene expression network can achieve scale-free distribution in a process that does not involve growth but rather via re-wiring of the connections between nodes of an ordered network. Our results support the idea of an evolutionary selection, which acts at the level of the protein sequence, and is compatible with the notion of greater biological importance of highly connected nodes in the protein interaction network. Our constrained re-wiring model provides a theoretical framework for a putative thermodynamically driven evolutionary selection process.
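    The link between a power-law degree distribution and its entropy can be illustrated with a short calculation. The sketch below is an assumption-laden example, not a reproduction of the study: the abstract does not state the logarithm base, the degree range or the normalization used, so the code is not expected to return the reported value of 1.58. The function names and the cutoff `k_max` are hypothetical.

```python
import math


def power_law_pmf(alpha, k_max):
    """Discrete power law p(k) proportional to k**(-alpha) on k = 1..k_max."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(pmf, base=math.e):
    """Shannon entropy H = -sum p * log(p), skipping zero-probability entries."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)
```

    Comparing `shannon_entropy(power_law_pmf(2.1, 100))` with the entropy of a uniform distribution over the same 100 degrees shows the qualitative point made in the abstract: the heterogeneous, scale-free distribution has lower entropy than a homogeneous one on the same support.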

  16. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
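    The two alternating steps named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: profiles are mean-centered and scaled to unit length, similarity to a prototype is taken as the squared dot product (so correlated and anti-correlated genes score alike), the dominant singular vector is approximated by power iteration, and prototypes are naively initialized from the first k profiles. All function names are hypothetical.

```python
import math


def normalize(profile):
    """Mean-center a profile and scale it to unit length."""
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered] if norm > 0 else centered


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dominant_direction(rows, iterations=100):
    """Approximate the dominant right singular vector by power iteration on X^T X."""
    v = list(rows[0])
    for _ in range(iterations):
        projections = [dot(row, v) for row in rows]          # X v
        w = [sum(p * row[d] for p, row in zip(projections, rows))
             for d in range(len(v))]                         # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def diametrical_clustering(profiles, k, iterations=20):
    """Alternate (i) re-partitioning genes and (ii) recomputing cluster prototypes."""
    data = [normalize(p) for p in profiles]
    prototypes = [list(data[i]) for i in range(k)]  # naive init: first k profiles
    labels = [0] * len(data)
    for _ in range(iterations):
        # (i) assign each gene to the prototype with the largest squared projection
        labels = [max(range(k), key=lambda j: dot(x, prototypes[j]) ** 2)
                  for x in data]
        # (ii) update each prototype from its current members
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                prototypes[j] = dominant_direction(members)
    return labels
```

    Squaring the projection is what lets a rising profile and its mirror-image falling profile land in the same cluster; a random or repeated initialization would be a sensible refinement for real data.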

  17. Annotation of gene function in citrus using gene expression information and co-expression networks

    PubMed Central

    2014-01-01

    Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks

  18. Variable neighborhood search for reverse engineering of gene regulatory networks.

    PubMed

    Nicholson, Charles; Goodwin, Leslie; Clark, Corey

    2017-01-01

    A new search heuristic, Divided Neighborhood Exploration Search, designed to be used with inference algorithms such as Bayesian networks to improve on the reverse engineering of gene regulatory networks is presented. The approach systematically moves through the search space to find topologies representative of gene regulatory networks that are more likely to explain microarray data. In empirical testing it is demonstrated that the novel method is superior to the widely employed greedy search techniques in both the quality of the inferred networks and computational time. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Identifying emerging research collaborations and networks: method development.

    PubMed

    Dozier, Ann M; Martina, Camille A; O'Dell, Nicole L; Fogg, Thomas T; Lurie, Stephen J; Rubinstein, Eric P; Pearson, Thomas A

    2014-03-01

    Clinical and translational research is a multidisciplinary, collaborative team process. To evaluate this process, we developed a method to document emerging research networks and collaborations in our medical center to describe their productivity and viability over time. Using an e-mail survey, sent to 1,620 clinical and basic science full- and part-time faculty members, respondents identified their research collaborators. Initial analyses, using Pajek software, assessed the feasibility of using social network analysis (SNA) methods with these data. Nearly 400 respondents identified 1,594 collaborators across 28 medical center departments resulting in 309 networks with 5 or more collaborators. This low-burden approach yielded a rich data set useful for evaluation using SNA to: (a) assess networks at several levels of the organization, including intrapersonal (individuals), interpersonal (social), organizational/institutional leadership (tenure and promotion), and physical/environmental (spatial proximity) and (b) link with other data to assess the evolution of these networks.

  20. Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers

    PubMed Central

    2013-01-01

    Background Many large-scale studies have analyzed high-throughput genomic data to identify altered pathways essential to the development and progression of specific types of cancer. However, no previous study has been extended to provide a comprehensive analysis of pathways disrupted by copy number alterations across different human cancers. Towards this goal, we propose a network-based method to integrate copy number alteration data with human protein-protein interaction networks and pathway databases to identify pathways that are commonly disrupted in many different types of cancer. Results We applied our approach to a data set of 2,172 cancer patients across 16 different types of cancers, and discovered a set of commonly disrupted pathways, which are likely essential for tumor formation in the majority of the cancers. We also identified pathways that are only disrupted in specific cancer types, providing molecular markers for different human cancers. Analysis with independent microarray gene expression datasets confirms that the commonly disrupted pathways can be used to identify patient subgroups with significantly different survival outcomes. We also provide a network view of disrupted pathways to explain how copy number alterations affect pathways that regulate cell growth, cycle, and differentiation for tumorigenesis. Conclusions In this work, we demonstrated that network-based integrative analysis can help to identify pathways disrupted by copy number alterations across 16 types of human cancers, which are not readily identifiable by conventional overrepresentation-based and other pathway-based methods. All the results and source code are available at http://compbio.cs.umn.edu/NetPathID/. PMID:23822816

  1. Initial deployment of the cardiogenic gene regulatory network in the basal chordate, Ciona intestinalis.

    PubMed

    Woznica, Arielle; Haeussler, Maximilian; Starobinska, Ella; Jemmett, Jessica; Li, Younan; Mount, David; Davidson, Brad

    2012-08-01

    The complex, partially redundant gene regulatory architecture underlying vertebrate heart formation has been difficult to characterize. Here, we dissect the primary cardiac gene regulatory network in the invertebrate chordate, Ciona intestinalis. The Ciona heart progenitor lineage is first specified by Fibroblast Growth Factor/Map Kinase (FGF/MapK) activation of the transcription factor Ets1/2 (Ets). Through microarray analysis of sorted heart progenitor cells, we identified the complete set of primary genes upregulated by FGF/Ets shortly after heart progenitor emergence. Combinatorial sequence analysis of these co-regulated genes generated a hypothetical regulatory code consisting of Ets binding sites associated with a specific co-motif, ATTA. Through extensive reporter analysis, we confirmed the functional importance of the ATTA co-motif in primary heart progenitor gene regulation. We then used the Ets/ATTA combination motif to successfully predict a number of additional heart progenitor gene regulatory elements, including an intronic element driving expression of the core conserved cardiac transcription factor, GATAa. This work significantly advances our understanding of the Ciona heart gene network. Furthermore, this work has begun to elucidate the precise regulatory architecture underlying the conserved, primary role of FGF/Ets in chordate heart lineage specification. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

    PubMed

    Chen, Shuonan; Mar, Jessica C

    2018-06-19

    A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data, or simulated single cell data, which demonstrates their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell methods. Also, single cell methods, which usually depend on more elaborative algorithms, in general have less
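
    The ROC and Precision-Recall evaluation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each candidate edge has a confidence score from an inference method and a 0/1 label from a reference set; the scores, labels and function names are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the fraction of (true edge, false edge) pairs ranked correctly, ties = 0.5."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, a common summary of the Precision-Recall curve:
    the mean of precision@k taken at the rank k of every true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one true edge")
    return total / hits


# Hypothetical edge scores and reference labels (1 = edge is in the reference set).
edge_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
edge_labels = [1, 0, 1, 0, 0, 1]

roc_area = auroc(edge_scores, edge_labels)
pr_summary = average_precision(edge_scores, edge_labels)
```

    An AUROC near 0.5 corresponds to random ranking of edges, which is the kind of poor performance the abstract reports for most methods.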

  3. Mutated Genes in Schizophrenia Map to Brain Networks

    MedlinePlus

    ... Research Matters August 12, 2013 Mutated Genes in Schizophrenia Map to Brain Networks Schizophrenia networks in the prefrontal cortex area of the ... University of Washington Researchers found that people with schizophrenia have a high number of spontaneous mutations in ...

  4. Arabidopsis ensemble reverse-engineered gene regulatory network discloses interconnected transcription factors in oxidative stress.

    PubMed

    Vermeirssen, Vanessa; De Clercq, Inge; Van Parys, Thomas; Van Breusegem, Frank; Van de Peer, Yves

    2014-12-01

    The abiotic stress response in plants is complex and tightly controlled by gene regulation. We present an abiotic stress gene regulatory network of 200,014 interactions for 11,938 target genes by integrating four complementary reverse-engineering solutions through average rank aggregation on an Arabidopsis thaliana microarray expression compendium. This ensemble performed the most robustly in benchmarking and greatly expands upon the availability of interactions currently reported. Besides recovering 1182 known regulatory interactions, cis-regulatory motifs and coherent functionalities of target genes corresponded with the predicted transcription factors. We provide a valuable resource of 572 abiotic stress modules of coregulated genes with functional and regulatory information, from which we deduced functional relationships for 1966 uncharacterized genes and many regulators. Using gain- and loss-of-function mutants of seven transcription factors grown under control and salt stress conditions, we experimentally validated 141 out of 271 predictions (52% precision) for 102 selected genes and mapped 148 additional transcription factor-gene regulatory interactions (49% recall). We identified an intricate core oxidative stress regulatory network where NAC13, NAC053, ERF6, WRKY6, and NAC032 transcription factors interconnect and function in detoxification. Our work shows that ensemble reverse-engineering can generate robust biological hypotheses of gene regulation in a multicellular eukaryote that can be tested by medium-throughput experimental validation. © 2014 American Society of Plant Biologists. All rights reserved.
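
    The average rank aggregation and the validation figures in the abstract above can be illustrated with a minimal sketch. The method names, edges and scores are hypothetical; only the counts 141, 271 and 148 come from the abstract. Reading recall as 141 / (141 + 148) is an assumption that happens to reproduce the reported 49%.

```python
from statistics import mean


def rank_edges(scores):
    """Map each edge to its rank (1 = strongest) within one method's scores."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def average_rank_aggregation(method_scores):
    """Order edges by their mean rank across all methods (best first)."""
    edges = set().union(*(s.keys() for s in method_scores.values()))
    ranks = {name: rank_edges(s) for name, s in method_scores.items()}
    # Assumption: an edge a method did not score receives the worst rank.
    avg = {e: mean(r.get(e, len(edges)) for r in ranks.values()) for e in edges}
    return sorted(edges, key=lambda e: (avg[e], e))


# Four hypothetical inference methods scoring the same three candidate edges.
method_scores = {
    "method_a": {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.2, ("TF2", "g1"): 0.5},
    "method_b": {("TF1", "g1"): 0.7, ("TF1", "g2"): 0.1, ("TF2", "g1"): 0.8},
    "method_c": {("TF1", "g1"): 0.6, ("TF1", "g2"): 0.4, ("TF2", "g1"): 0.3},
    "method_d": {("TF1", "g1"): 0.8, ("TF1", "g2"): 0.3, ("TF2", "g1"): 0.5},
}
consensus = average_rank_aggregation(method_scores)

# Counts reported in the abstract.
validated, predicted, additional = 141, 271, 148
precision = validated / predicted                 # about 0.52
recall = validated / (validated + additional)     # about 0.49 under the stated assumption
```

    Averaging ranks rather than raw scores avoids having to put the four methods' scores on a common scale.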

  5. Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

    PubMed Central

    Li, Xia; Rao, Shaoqi; Jiang, Wei; Li, Chuanxing; Xiao, Yun; Guo, Zheng; Zhang, Qingpu; Wang, Lihong; Du, Lei; Li, Jing; Li, Li; Zhang, Tianwen; Wang, Qing K

    2006-01-01

    Background It is one of the ultimate goals for modern biological research to fully elucidate the intricate interplays and the regulations of the molecular determinants that propel and characterize the progression of versatile life phenomena, to name a few, cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. The vast amount of large-scale and genome-wide time-resolved data is becoming increasingly available, which provides the golden opportunity to unravel the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulations of genes that can span any unit(s) of time intervals. This bioinformatics toolbox has provided a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex gene regulations related to
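
    The idea of a delayed correlation, that is, correlating one gene's expression with another gene's expression shifted in time, can be sketched as follows. This is an illustration of the general concept only, not the TdGRN toolbox, whose decision analysis is not described here; the function names and the expression series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def delayed_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] over the overlapping time points."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return the lag in 0..max_lag with the largest absolute delayed correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: abs(delayed_correlation(regulator, target, lag)))


# Hypothetical series: the target repeats the regulator two time points later.
regulator = [0, 1, 4, 2, 0, 3, 5, 1]
target = [1, 0, 0, 1, 4, 2, 0, 3]

lag = best_lag(regulator, target, max_lag=3)
strength = delayed_correlation(regulator, target, lag)
```

    Each extra unit of lag removes one overlapping time point, so short series limit how many lags can be compared reliably.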

  6. Construction of Gene Regulatory Networks Using Recurrent Neural Networks and Swarm Intelligence.

    PubMed

    Khan, Abhinandan; Mandal, Sudip; Pal, Rajat Kumar; Saha, Goutam

    2016-01-01

    We have proposed a methodology for the reverse engineering of biologically plausible gene regulatory networks from temporal genetic expression data. We have used established information and the fundamental mathematical theory for this purpose. We have employed the Recurrent Neural Network formalism to extract the underlying dynamics present in the time series expression data accurately. We have introduced a new hybrid swarm intelligence framework for the accurate training of the model parameters. The proposed methodology has been first applied to a small artificial network, and the results obtained suggest that it can produce the best results available in the contemporary literature, to the best of our knowledge. Subsequently, we have implemented our proposed framework on experimental (in vivo) datasets. Finally, we have investigated two medium sized genetic networks (in silico) extracted from GeneNetWeaver, to understand how the proposed algorithm scales up with network size. Additionally, we have implemented our proposed algorithm with half the number of time points. The results indicate that a reduction of 50% in the number of time points does not significantly affect the accuracy of the proposed methodology, with a maximum of just over 15% deterioration in the worst case.

  7. Mouse Social Network Dynamics and Community Structure are Associated with Plasticity-Related Brain Gene Expression

    PubMed Central

    Williamson, Cait M.; Franks, Becca; Curley, James P.

    2016-01-01

    Laboratory studies of social behavior have typically focused on dyadic interactions occurring within a limited spatiotemporal context. However, this strategy prevents analyses of the dynamics of group social behavior and constrains identification of the biological pathways mediating individual differences in behavior. In the current study, we aimed to identify the spatiotemporal dynamics and hierarchical organization of a large social network of male mice. We also sought to determine if standard assays of social and exploratory behavior are predictive of social behavior in this social network and whether individual network position was associated with the mRNA expression of two plasticity-related genes, DNA methyltransferase 1 and 3a. Mice were observed to form a hierarchically organized social network and self-organized into two separate social network communities. Members of both communities exhibited distinct patterns of socio-spatial organization within the vivaria that were not limited to only agonistic interactions. We further established that exploratory and social behaviors in standard behavioral assays conducted prior to placing the mice into the large group were predictive of initial network position and behavior but were not associated with final social network position. Finally, we determined that social network position is associated with variation in mRNA levels of two neural plasticity genes, DNMT1 and DNMT3a, in the hippocampus but not the mPOA. This work demonstrates the importance of understanding the role of social context and complex social dynamics in determining the relationship between individual differences in social behavior and brain gene expression. PMID:27540359

  8. Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity.

    PubMed

    Zhang, J D; Berntenis, N; Roth, A; Ebeling, M

    2014-06-01

    Gene signatures of drug-induced toxicity are of broad interest, but they are often identified from small-scale, single-time point experiments, and are therefore of limited applicability. To address this issue, we performed multivariate analysis of gene expression, cell-based assays, and histopathological data in the TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system) database. Data mining highlights four genes (EGR1, ATF3, GDF15 and FGF21) that are induced 2 h after drug administration in human and rat primary hepatocytes poised to eventually undergo cytotoxicity-induced cell death. Modelling and simulation reveals that these early stress-response genes form a functional network with evolutionarily conserved structure and intrinsic dynamics. This is underlined by the fact that early induction of this network in vivo predicts drug-induced liver and kidney pathology with high accuracy. Our findings demonstrate the value of early gene-expression signatures in predicting and understanding compound-induced toxicity. The identified network can empower first-line tests that reduce animal use and costs of safety evaluation.

  9. The Orphan Disease Networks

    PubMed Central

    Zhang, Minlu; Zhu, Cheng; Jacomy, Alexis; Lu, Long J.; Jegga, Anil G.

    2011-01-01

    The low prevalence rate of orphan diseases (OD) requires special combined efforts to improve diagnosis, prevention, and discovery of novel therapeutic strategies. To identify and investigate relationships based on shared genes or shared functional features, we have conducted a bioinformatic-based global analysis of all orphan diseases with known disease-causing mutant genes. Starting with a bipartite network of known OD and OD-causing mutant genes and using the human protein interactome, we first construct and topologically analyze three networks: the orphan disease network, the orphan disease-causing mutant gene network, and the orphan disease-causing mutant gene interactome. Our results demonstrate that in contrast to the common disease-causing mutant genes that are predominantly nonessential, a majority of orphan disease-causing mutant genes are essential. In confirmation of this finding, we found that OD-causing mutant genes are topologically important in the protein interactome and are ubiquitously expressed. Additionally, functional enrichment analysis of those genes in which mutations cause ODs shows that a majority result in premature death or are lethal in the orthologous mouse gene knockout models. To address the limitations of traditional gene-based disease networks, we also construct and analyze OD networks on the basis of shared enriched features (biological processes, cellular components, pathways, phenotypes, and literature citations). Analyzing these functionally-linked OD networks, we identified several additional OD-OD relations that are both phenotypically similar and phenotypically diverse. Surprisingly, we observed that the wiring of the gene-based and other feature-based OD networks is largely different; this suggests that the relationship between ODs cannot be fully captured by the gene-based network alone. PMID:21664998
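
    The step from a bipartite disease-gene network to a disease network and a gene network, as described in the abstract above, can be sketched as a one-mode projection. The disease and gene labels below are hypothetical, and weighting each link by the number of shared partners is an assumption made for illustration.

```python
from itertools import combinations


def project(memberships):
    """One-mode projection: link two keys when their partner sets overlap.

    memberships maps each node on one side of the bipartite network to the
    set of nodes it is attached to on the other side. The returned dict maps
    a sorted pair of keys to the number of partners they share.
    """
    edges = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = memberships[a] & memberships[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def invert(memberships):
    """Swap the two sides of the bipartite network (disease->genes to gene->diseases)."""
    inverted = {}
    for key, partners in memberships.items():
        for partner in partners:
            inverted.setdefault(partner, set()).add(key)
    return inverted


# Hypothetical bipartite network: disease -> set of causal mutant genes.
disease_genes = {
    "OD_A": {"G1", "G2"},
    "OD_B": {"G2", "G3"},
    "OD_C": {"G4"},
}

disease_network = project(disease_genes)       # diseases linked by shared genes
gene_network = project(invert(disease_genes))  # genes linked by shared diseases
```

    A disease with no shared gene, such as the hypothetical OD_C, stays isolated in the gene-based projection, which is one reason feature-based networks can reveal additional relations.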

  10. Network analysis of genomic alteration profiles reveals co-altered functional modules and driver genes for glioblastoma.

    PubMed

    Gu, Yunyan; Wang, Hongwei; Qin, Yao; Zhang, Yujing; Zhao, Wenyuan; Qi, Lishuang; Zhang, Yuannv; Wang, Chenguang; Guo, Zheng

    2013-03-01

    The heterogeneity of genetic alterations in human cancer genomes presents a major challenge to advancing our understanding of cancer mechanisms and identifying cancer driver genes. To tackle this heterogeneity problem, many approaches have been proposed to investigate genetic alterations and predict driver genes at the individual pathway level. However, most of these approaches ignore the correlation of alteration events between pathways and miss many genes with rare alterations collectively contributing to carcinogenesis. Here, we devise a network-based approach to capture the cooperative functional modules hidden in genome-wide somatic mutation and copy number alteration profiles of glioblastoma (GBM) from The Cancer Genome Atlas (TCGA), where a module is a set of altered genes with dense interactions in the protein interaction network. We identify 7 pairs of significantly co-altered modules that involve the main pathways known to be altered in GBM (TP53, RB and RTK signaling pathways) and highlight the striking co-occurring alterations among these GBM pathways. By taking into account the non-random correlation of gene alterations, the property of co-alteration could distinguish oncogenic modules that contain driver genes involved in the progression of GBM. The collaboration among cancer pathways suggests that the redundant models and aggravating models could shed new light on the potential mechanisms during carcinogenesis and provide new indications for the design of cancer therapeutic strategies.

  11. Gene Coexpression Network Alignment and Conservation of Gene Modules between Two Grass Species: Maize and Rice

    PubMed Central

    Ficklin, Stephen P.; Feltus, F. Alex

    2011-01-01

    One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species. PMID:21606319

  12. Listening to the noise: random fluctuations reveal gene network parameters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Munsky, Brian; Khammash, Mustafa

    2009-01-01

    The cellular environment is abuzz with noise. The origin of this noise is attributed to the inherent random motion of reacting molecules that take part in gene expression and post expression interactions. In this noisy environment, clonal populations of cells exhibit cell-to-cell variability that frequently manifests as significant phenotypic differences within the cellular population. The stochastic fluctuations in cellular constituents induced by noise can be measured and their statistics quantified. We show that these random fluctuations carry within them valuable information about the underlying genetic network. Far from being a nuisance, the ever-present cellular noise acts as a rich source of excitation that, when processed through a gene network, carries its distinctive fingerprint that encodes a wealth of information about that network. We demonstrate that in some cases the analysis of these random fluctuations enables the full identification of network parameters, including those that may otherwise be difficult to measure. This establishes a potentially powerful approach for the identification of gene networks and offers a new window into the workings of these networks.

  13. Inferring transcriptional gene regulation network of starch metabolism in Arabidopsis thaliana leaves using graphical Gaussian model

    PubMed Central

    2012-01-01

    Background Starch serves as a temporal storage of carbohydrates in plant leaves during day/night cycles. To study transcriptional regulatory modules of this dynamic metabolic process, we conducted gene regulation network analysis based on small-sample inference of graphical Gaussian model (GGM). Results Time-series significant analysis was applied for Arabidopsis leaf transcriptome data to obtain a set of genes that are highly regulated under a diurnal cycle. A total of 1,480 diurnally regulated genes included 21 starch metabolic enzymes, 6 clock-associated genes, and 106 transcription factors (TF). A starch-clock-TF gene regulation network comprising 117 nodes and 266 edges was constructed by GGM from these 133 significant genes that are potentially related to the diurnal control of starch metabolism. From this network, we found that β-amylase 3 (b-amy3: At4g17090), which participates in starch degradation in chloroplast, is the most frequently connected gene (a hub gene). The robustness of gene-to-gene regulatory network was further analyzed by TF binding site prediction and by evaluating global co-expression of TFs and target starch metabolic enzymes. As a result, two TFs, indeterminate domain 5 (AtIDD5: At2g02070) and constans-like (COL: At2g21320), were identified as positive regulators of starch synthase 4 (SS4: At4g18240). The inference model of AtIDD5-dependent positive regulation of SS4 gene expression was experimentally supported by decreased SS4 mRNA accumulation in Atidd5 mutant plants during the light period of both short and long day conditions. COL was also shown to positively control SS4 mRNA accumulation. Furthermore, the knockout of AtIDD5 and COL led to deformation of chloroplast and its contained starch granules. This deformity also affected the number of starch granules per chloroplast, which increased significantly in both knockout mutant lines. Conclusions In this study, we utilized a systematic approach of microarray analysis to discover

  14. ENU Mutagenesis in Mice Identifies Candidate Genes For Hypogonadism

    PubMed Central

    Weiss, Jeffrey; Hurley, Lisa A.; Harris, Rebecca M.; Finlayson, Courtney; Tong, Minghan; Fisher, Lisa A.; Moran, Jennifer L.; Beier, David R.; Mason, Christopher; Jameson, J. Larry

    2012-01-01

    Genome-wide mutagenesis was performed in mice to identify candidate genes for male infertility, for which the predominant causes remain idiopathic. Mice were mutagenized using N-ethyl-N-nitrosourea (ENU), bred, and screened for phenotypes associated with the male urogenital system. Fifteen heritable lines were isolated and chromosomal loci were assigned using low density genome-wide SNP arrays. Ten of the fifteen lines were pursued further using higher resolution SNP analysis to narrow the candidate gene regions. Exon sequencing of candidate genes identified mutations in mice with cystic kidneys (Bicc1), cryptorchidism (Rxfp2), restricted germ cell deficiency (Plk4), and severe germ cell deficiency (Prdm9). In two other lines with severe hypogonadism candidate sequencing failed to identify mutations, suggesting defects in genes with previously undocumented roles in gonadal function. These genomic intervals were sequenced in their entirety and a candidate mutation was identified in SnrpE in one of the two lines. The line harboring the SnrpE variant retains substantial spermatogenesis despite small testis size, an unusual phenotype. In addition to the reproductive defects, heritable phenotypes were observed in mice with ataxia (Myo5a), tremors (Pmp22), growth retardation (unknown gene), and hydrocephalus (unknown gene). These results demonstrate that the ENU screen is an effective tool for identifying potential causes of male infertility. PMID:22258617

  15. Insights into the etiology-associated gene regulatory networks in hepatocellular carcinoma from The Cancer Genome Atlas.

    PubMed

    Seshachalam, Veerabrahma Pratap; Sekar, Karthik; Hui, Kam M

    2018-04-19

    Hepatitis B virus, hepatitis C virus, alcohol consumption and non-alcoholic fatty liver are the major known risk factors for hepatocellular carcinoma (HCC). There have been very few studies comparing the underlying biological mechanisms associated with the different etiologies of HCC. In this study, we hypothesized the existence of different regulatory networks associated with different liver disease etiologies involved in hepatocarcinogenesis. Using the upstream regulatory analysis tool in Ingenuity Pathway Analysis software, upstream regulators (URs) were predicted from differentially expressed genes for HCC to facilitate the interrogation of global gene regulation. Analysis of regulatory networks for HBV HCC revealed E2F1 as an activated UR, regulating genes involved in cell cycle and DNA replication, and HNF4A and HNF1A as inhibited URs. In HCV HCC, IFNG, involved in cellular movement and signaling, was activated, while IL1RN and MAPK1, involved in IL-22 signaling and immune response, were inhibited. In alcohol-consumption HCC, ERBB2, involved in inflammatory response and cellular movement, was activated, whereas HNF4A and NUPR1 were inhibited. For HCC derived from non-alcoholic fatty liver disease, miR-1249-5p was activated and NUPR1, involved in cell cycle and apoptosis, was inhibited. The prognostic value of representative genes identified in the regulatory networks for HBV HCC can be further validated by an independent HBV HCC dataset established in our laboratory with survival data. Our study identified functionally distinct candidate URs for HCC developed from different etiologic risk factors. Further functional validation studies of these regulatory networks could facilitate the management of HCC towards personalized medicine. This article is protected by copyright. All rights reserved.

  16. A Genome-wide Regulatory Network Identifies Key Transcription Factors for Memory CD8+ T Cell Development

    PubMed Central

    Hu, Guangan; Chen, Jianzhu

    2014-01-01

    Memory CD8+ T cell development is defined by the expression of a specific set of memory signature genes (MSGs). Despite recent progress, many components of the transcriptional control of memory CD8+ T cell development are still unknown. To identify transcription factors (TFs) and their interactions in memory CD8+ T cell development, we construct a genome-wide regulatory network and apply it to identify key TFs that regulate MSGs. Most of the known TFs in memory CD8+ T cell development are rediscovered and about a dozen new TFs are also identified. Sox4, Bhlhe40, Bach2 and Runx2 are experimentally verified and Bach2 is further shown to promote both development and recall proliferation of memory CD8+ T cells through Prdm1 and Id3. Gene perturbation study identifies the mode of interactions among the TFs with Sox4 as a hub. The identified TFs and insights into their interactions should facilitate further dissection of molecular mechanisms underlying memory CD8+ T cell development. PMID:24335726

  17. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks.

    PubMed

    Díaz-Montaña, Juan J; Díaz-Díaz, Norberto; Gómez-Vela, Francisco

    2017-04-01

    Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section "GFD-Net applied to the study of human diseases" where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results (http://apps.cytoscape.org/apps/gfdnet). Copyright © 2017 Elsevier Inc. All rights reserved.

  18. DNA methylome profiling identifies novel methylated genes in African American patients with colorectal neoplasia.

    PubMed

    Ashktorab, Hassan; Daremipouran, M; Goel, Ajay; Varma, Sudhir; Leavitt, R; Sun, Xueguang; Brim, Hassan

    2014-04-01

    The identification of genes that are differentially methylated in colorectal cancer (CRC) has potential value for both diagnostic and therapeutic interventions specifically in high-risk populations such as African Americans (AAs). However, DNA methylation patterns in CRC, especially in AAs, have not been systematically explored and remain poorly understood. Here, we performed DNA methylome profiling to identify the methylation status of CpG islands within candidate genes involved in critical pathways important in the initiation and development of CRC. We used reduced representation bisulfite sequencing (RRBS) in colorectal cancer and adenoma tissues that were compared with DNA methylome from a healthy AA subject's colon tissue and peripheral blood DNA. The identified methylation markers were validated in fresh frozen CRC tissues and corresponding normal tissues from AA patients diagnosed with CRC at Howard University Hospital. We identified and validated the methylation status of 355 CpG sites located within 16 gene promoter regions associated with CpG islands. Fifty CpG sites located within CpG islands, in genes ATXN7L1 (2), BMP3 (7), EID3 (15), GAS7 (1), GPR75 (24), and TNFAIP2 (1), were significantly hypermethylated in tumor vs. normal tissues (P<0.05). The methylation status of BMP3, EID3, GAS7, and GPR75 was confirmed in an independent, validation cohort. Ingenuity pathway analysis mapped three of these markers (GAS7, BMP3 and GPR) in the insulin and TGF-β1 network, the two key pathways in CRC. In addition to hypermethylated genes, our analysis also revealed that LINE-1 repeat elements were progressively hypomethylated in the normal-adenoma-cancer sequence. We conclude that DNA methylome profiling based on RRBS is an effective method for screening aberrantly methylated genes in CRC. While previous studies focused on the limited identification of hypermethylated genes, ours is the first study to systematically and comprehensively identify novel hypermethylated

  19. Network-based prediction and knowledge mining of disease genes

    PubMed Central

    2015-01-01

    Background In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. Methods We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Results Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second

  20. Network-based prediction and knowledge mining of disease genes.

    PubMed

    Carson, Matthew B; Lu, Hui

    2015-01-01

    In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics, including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity, helps to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network

  1. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency.

    PubMed

    Yeh, Hsiang-Yuan; Cheng, Shih-Wu; Lin, Yu-Chun; Yeh, Cheng-Yu; Lin, Shih-Fang; Soo, Von-Wun

    2009-12-21

    Prostate cancer is a leading cancer worldwide and is characterized by aggressive metastasis. Owing to its clinical heterogeneity, prostate cancer displays different stages and grades related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease process, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing to explore the potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap the bioinformatics toolkits and databases, to automatically extract the promoter regions of DNA sequences, and to predict the transcription factors that regulate gene expression. We adopted a microarray dataset consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results identified possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and may promote cell death in the cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent, while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. The gene SLC22A3 may explain, clinically, the differentiation associated with high-grade compared with low-grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage. We provide a computational framework to reconstruct

  2. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

    PubMed Central

    2009-01-01

    Background Prostate cancer is a leading cancer worldwide and is characterized by aggressive metastasis. Owing to its clinical heterogeneity, prostate cancer displays different stages and grades related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease process, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing to explore the potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap the bioinformatics toolkits and databases, to automatically extract the promoter regions of DNA sequences, and to predict the transcription factors that regulate gene expression. We adopted a microarray dataset consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results identified possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and may promote cell death in the cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent, while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. The gene SLC22A3 may explain, clinically, the differentiation associated with high-grade compared with low-grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage. Conclusions We provide a

  3. Featured Article: Transcriptional landscape analysis identifies differently expressed genes involved in follicle-stimulating hormone induced postmenopausal osteoporosis.

    PubMed

    Maasalu, Katre; Laius, Ott; Zhytnik, Lidiia; Kõks, Sulev; Prans, Ele; Reimann, Ene; Märtson, Aare

    2017-01-01

    Osteoporosis is a disorder associated with bone tissue reorganization, bone mass, and mineral density. Osteoporosis can severely affect postmenopausal women, causing bone fragility and osteoporotic fractures. The aim of the current study was to compare blood mRNA profiles of postmenopausal women with and without osteoporosis in order to find differences in gene expression and thus targets for future osteoporosis biomarker studies. Our study consisted of transcriptome analysis of whole blood serum from 12 elderly female osteoporotic patients and 12 non-osteoporotic elderly female controls. The transcriptome analysis was performed with RNA sequencing technology. For data analysis, the edgeR package of R Bioconductor was used. Two hundred and fourteen genes were expressed differently in osteoporotic compared with non-osteoporotic patients. Statistical analysis revealed 20 differently expressed genes with a false discovery rate of less than 1.47 × 10^-4 among osteoporotic patients. Ten genes were up-regulated and 10 were down-regulated. Further statistical analysis identified a potential osteoporosis mRNA biomarker pattern consisting of six genes: CACNA1G, ALG13, SBK1, GGT7, MBNL3, and RIOK3. Functional ingenuity pathway analysis identified the strongest candidate genes with regard to potential involvement in a follicle-stimulating hormone activated network of increased osteoclast activity and hypogonadal bone loss. The differentially expressed genes identified in this study may contribute to future research of postmenopausal osteoporosis blood biomarkers.

  4. In vivo genome-wide analysis of multiple tissues identifies gene regulatory networks, novel functions and downstream regulatory genes for Bapx1 and its co-regulation with Sox9 in the mammalian vertebral column.

    PubMed

    Chatterjee, Sumantra; Sivakamasundari, V; Yap, Sook Peng; Kraus, Petra; Kumar, Vibhor; Xing, Xing; Lim, Siew Lan; Sng, Joel; Prabhakar, Shyam; Lufkin, Thomas

    2014-12-05

    Vertebrate organogenesis is a highly complex process involving sequential cascades of transcription factor activation or repression. Interestingly a single developmental control gene can occasionally be essential for the morphogenesis and differentiation of tissues and organs arising from vastly disparate embryological lineages. Here we elucidated the role of the mammalian homeobox gene Bapx1 during the embryogenesis of five distinct organs at E12.5 - vertebral column, spleen, gut, forelimb and hindlimb - using expression profiling of sorted wildtype and mutant cells combined with genome wide binding site analysis. Furthermore we analyzed the development of the vertebral column at the molecular level by combining transcriptional profiling and genome wide binding data for Bapx1 with similarly generated data sets for Sox9 to assemble a detailed gene regulatory network revealing genes previously not reported to be controlled by either of these two transcription factors. The gene regulatory network appears to control cell fate decisions and morphogenesis in the vertebral column along with the prevention of premature chondrocyte differentiation thus providing a detailed molecular view of vertebral column development.

  5. A novel method to identify hub pathways of rheumatoid arthritis based on differential pathway networks.

    PubMed

    Wei, Shi-Tong; Sun, Yong-Hua; Zong, Shi-Hua

    2017-09-01

    The aim of the current study was to identify hub pathways of rheumatoid arthritis (RA) using a novel method based on differential pathway network (DPN) analysis. The present study proposed a DPN in which a protein‑protein interaction (PPI) network was integrated with pathway‑pathway interactions. Pathway data were obtained from the background PPI network and the Reactome pathway database. Subsequently, pathway interactions were extracted from the pathway data by building randomized gene‑gene interactions, and a weight value was assigned to each pathway interaction using the Spearman correlation coefficient (SCC) to identify differential pathway interactions. Differential pathway interactions were visualized using Cytoscape to construct a DPN. Topological analysis was conducted to identify hub pathways, defined as those in the top 5% of the degree distribution of the DPN. Modules of the DPN were mined using ClusterONE. A total of 855 pathways were selected to build pathway interactions. By filtering for pathway interactions with weight values >0.7, a DPN with 312 nodes and 791 edges was obtained. Topological degree analysis revealed 15 hub pathways for RA based on the DPN, such as heparan sulfate/heparin‑glycosaminoglycan (HS‑GAG) degradation, HS‑GAG metabolism and keratan sulfate degradation. Furthermore, hub pathways were also important in modules, which validated the significance of hub pathways. In conclusion, the proposed method is a computationally efficient way to identify hub pathways of RA; it identified 15 hub pathways that may be potential biomarkers and provide insight for future investigation and treatment of RA.
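
    The hub-selection step described in this record (filter edges by weight, then keep the top 5% of nodes by degree) can be illustrated with a minimal Python sketch. The function name, the tie-breaking rule, the rounding rule, and the toy pathway names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def hub_nodes(weighted_edges, weight_cutoff=0.7, top_fraction=0.05):
    """Return the nodes in the top `top_fraction` by degree after edge filtering."""
    degree = defaultdict(int)
    for a, b, weight in weighted_edges:
        # Keep only interactions whose weight exceeds the cutoff.
        if weight > weight_cutoff:
            degree[a] += 1
            degree[b] += 1
    if not degree:
        return []
    # Assumption: round down, but always report at least one hub.
    n_hubs = max(1, math.floor(len(degree) * top_fraction))
    # Sort by descending degree; break ties by name so the result is deterministic.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:n_hubs]


# Toy network (hypothetical pathway names): P0 interacts strongly with 20 others.
toy_edges = [("P0", f"P{i}", 0.9) for i in range(1, 21)]
toy_edges.append(("P1", "P2", 0.3))  # weak interaction, removed by the cutoff
hubs = hub_nodes(toy_edges)
```

    Under this rounding assumption, a network of 312 nodes yields 15 hubs, which is consistent with the counts reported in the abstract.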

  6. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    PubMed

    Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

    2010-04-21

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes, as is the case in biological networks, due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm achieves very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.

  7. Integrated in silico analyses of regulatory and metabolic networks of Synechococcus sp. PCC 7002 reveal relationships between gene centrality and essentiality

    DOE PAGES

    Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.; ...

    2015-03-27

    Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termed ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.

  8. Integrated in silico analyses of regulatory and metabolic networks of Synechococcus sp. PCC 7002 reveal relationships between gene centrality and essentiality

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.

    Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termed ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.

  9. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measure only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (in certain cases critical) limitation, which almost all existing studies ignore, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely together owing to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
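
    The idea of diffusion distance mentioned in this record can be illustrated with a short Python sketch. It uses the standard random-walk definition (compare the t-step transition probabilities of two nodes, weighted by the inverse stationary distribution); the abstract does not give the authors' exact formulation, so the formula, the function names, the step count, and the toy network are assumptions. The sketch also assumes a symmetric adjacency matrix with no isolated nodes.

```python
import math


def transition_matrix(adj):
    """Row-normalise an adjacency matrix into random-walk probabilities."""
    return [[value / sum(row) for value in row] for row in adj]


def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_distance(adj, i, j, steps=3):
    """Distance between nodes i and j after `steps` random-walk steps."""
    p = transition_matrix(adj)
    p_t = p
    for _ in range(steps - 1):
        p_t = mat_mul(p_t, p)
    # Stationary distribution of the walk on an undirected graph: degree / total degree.
    total = sum(sum(row) for row in adj)
    stationary = [sum(row) / total for row in adj]
    return math.sqrt(sum((p_t[i][k] - p_t[j][k]) ** 2 / stationary[k]
                         for k in range(len(adj))))


# Toy network (hypothetical genes A..E, indices 0..4).
# A and B are not directly linked but share both neighbours C and D; E hangs off D.
toy_adj = [
    [0, 0, 1, 1, 0],  # A
    [0, 0, 1, 1, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [1, 1, 0, 0, 1],  # D
    [0, 0, 0, 1, 0],  # E
]
d_ab = diffusion_distance(toy_adj, 0, 1)
d_ae = diffusion_distance(toy_adj, 0, 4)
```

    In the toy network, A and B have no direct edge but are joined by two paths, so their distance is zero, while A and E are farther apart. This mirrors the point in the abstract that indirectly connected genes can still be close collaborators.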

  10. De novo Transcriptome Analysis of Miscanthus lutarioriparius Identifies Candidate Genes in Rhizome Development

    PubMed Central

    Hu, Ruibo; Yu, Changjiang; Wang, Xiaoyu; Jia, Chunlin; Pei, Shengqiang; He, Kang; He, Guo; Kong, Yingzhen; Zhou, Gongke

    2017-01-01

    HIGHLIGHT De novo transcriptome profiling of five tissues reveals candidate genes putatively involved in rhizome development in M. lutarioriparius. Miscanthus lutarioriparius is a promising lignocellulosic feedstock for second-generation bioethanol production. However, the genomic resources for this species are relatively limited, which hampers our understanding of the molecular mechanisms underlying many important biological processes. In this study, we performed the first de novo transcriptome analysis of five tissues (leaf, stem, root, lateral bud and rhizome bud) of M. lutarioriparius, with an emphasis on identifying putative genes involved in rhizome development. Approximately 66 gigabases (Gb) of paired-end clean reads were obtained and assembled into 169,064 unigenes with an average length of 759 bp. Among these unigenes, 103,899 (61.5%) were annotated in seven public protein databases. Differential gene expression profiling analysis revealed that 4,609, 3,188, 1,679, 1,218, and 1,077 genes were predominantly expressed in root, leaf, stem, lateral bud, and rhizome bud, respectively. Their expression patterns were further classified into 12 distinct clusters. Pathway enrichment analysis revealed that genes predominantly expressed in rhizome bud were mainly involved in primary metabolism and hormone signaling and transduction pathways. Notably, 19 transcription factors (TFs) and 16 hormone signaling pathway-related genes were identified as predominantly expressed in rhizome bud compared with the other tissues, suggesting putative roles in rhizome formation and development. In addition, a predictive regulatory network was constructed between four TFs and six auxin- and abscisic acid (ABA)-related genes. Furthermore, the expression of 24 rhizome-specific genes was validated by quantitative real-time RT-PCR (qRT-PCR) analysis. Taken together, this study provides a global portrait of gene expression across five different tissues and reveals preliminary insights

  11. Integron associated mobile genes: Just a collection of plug in apps or essential components of cell network hardware?

    PubMed

    Labbate, Maurizio; Boucher, Yan; Luu, Ivan; Chowdhury, Piklu Roy; Stokes, H W

    2012-01-01

    Lateral gene transfer (LGT) impacts on the evolution of prokaryotes in both the short and long-term. The short-term impacts of mobilized genes are a concern to humans since LGT explains the global rise of multi drug resistant pathogens seen in the past 70 years. However, LGT has been a feature of prokaryotes from the earliest days of their existence and the concept of a bifurcating tree of life is not entirely applicable to prokaryotes since most genes in extant prokaryotic genomes have probably been acquired from other lineages. Successful transfer and maintenance of a gene in a new host is understandable if it acts independently of cell networks and confers an advantage. Antibiotic resistance provides an example of this whereby a gene can be advantageous in virtually any cell across broad species backgrounds. In a longer evolutionary context however laterally transferred genes can be assimilated into even essential cell networks. How this happens is not well understood and we discuss recent work that identifies a mobile gene, unique to a cell lineage, which is detrimental to the cell when lost. We also present some additional data and believe our emerging model will be helpful in understanding how mobile genes integrate into cell networks.

  12. How to train your microbe: methods for dynamically characterizing gene networks

    PubMed Central

    Castillo-Hair, Sebastian M.; Igoshin, Oleg A.; Tabor, Jeffrey J.

    2015-01-01

    Gene networks regulate biological processes dynamically. However, researchers have largely relied upon static perturbations, such as growth media variations and gene knockouts, to elucidate gene network structure and function. Thus, much of the regulation on the path from DNA to phenotype remains poorly understood. Recent studies have utilized improved genetic tools, hardware, and computational control strategies to generate precise temporal perturbations outside and inside of live cells. These experiments have, in turn, provided new insights into the organizing principles of biology. Here, we introduce the major classes of dynamical perturbations that can be used to study gene networks, and discuss technologies available for creating them in a wide range of microbial pathways. PMID:25677419

  13. Cross-species microarray hybridization to identify developmentally regulated genes in the filamentous fungus Sordaria macrospora.

    PubMed

    Nowrousian, Minou; Ringelberg, Carol; Dunlap, Jay C; Loros, Jennifer J; Kück, Ulrich

    2005-04-01

    The filamentous fungus Sordaria macrospora forms complex three-dimensional fruiting bodies that protect the developing ascospores and ensure their proper discharge. Several regulatory genes essential for fruiting body development were previously isolated by complementation of the sterile mutants pro1, pro11 and pro22. To establish the genetic relationships between these genes and to identify downstream targets, we have conducted cross-species microarray hybridizations using cDNA arrays derived from the closely related fungus Neurospora crassa and RNA probes prepared from wild-type S. macrospora and the three developmental mutants. Of the 1,420 genes which gave a signal with the probes from all the strains used, 172 (12%) were regulated differently in at least one of the three mutants compared to the wild type, and 17 (1.2%) were regulated differently in all three mutant strains. Microarray data were verified by Northern analysis or quantitative real time PCR. Among the genes that are up- or down-regulated in the mutant strains are genes encoding the pheromone precursors, enzymes involved in melanin biosynthesis and a lectin-like protein. Analysis of gene expression in double mutants revealed a complex network of interaction between the pro gene products.

  14. Altered Pathway Analyzer: A gene expression dataset analysis tool for identification and prioritization of differentially regulated and network rewired pathways

    PubMed Central

    Kaushik, Abhinav; Ali, Shakir; Gupta, Dinesh

    2017-01-01

    Gene connection rewiring is an essential feature of gene network dynamics. Apart from its normal functional role, it may also lead to dysregulated functional states by disturbing pathway homeostasis. Very few computational tools measure rewiring within gene co-expression and its corresponding regulatory networks in order to identify and prioritize altered pathways which may or may not be differentially regulated. We have developed Altered Pathway Analyzer (APA), a microarray dataset analysis tool for identification and prioritization of altered pathways, including those which are differentially regulated by TFs, by quantifying rewired sub-network topology. Moreover, APA also helps in re-prioritization of APA shortlisted altered pathways enriched with context-specific genes. We performed APA analysis of simulated datasets and p53 status NCI-60 cell line microarray data to demonstrate potential of APA for identification of several case-specific altered pathways. APA analysis reveals several altered pathways not detected by other tools evaluated by us. APA analysis of unrelated prostate cancer datasets identifies sample-specific as well as conserved altered biological processes, mainly associated with lipid metabolism, cellular differentiation and proliferation. APA is designed as a cross platform tool which may be transparently customized to perform pathway analysis in different gene expression datasets. APA is freely available at http://bioinfo.icgeb.res.in/APA. PMID:28084397

  15. Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes

    PubMed Central

    Roy, Janine; Aust, Daniela; Knösel, Thomas; Rümmele, Petra; Jahnke, Beatrix; Hentrich, Vera; Rückert, Felix; Niedergethmann, Marco; Weichert, Wilko; Bahra, Marcus; Schlitt, Hans J.; Settmacher, Utz; Friess, Helmut; Büchler, Markus; Saeger, Hans-Detlev; Schroeder, Michael; Pilarsky, Christian; Grützmann, Robert

    2012-01-01

    Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice. PMID:22615549
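
    The PageRank-style ranking described in this record can be sketched in Python as a personalised PageRank, in which each gene's own relevance score acts as the restart probability and the network spreads rank between neighbours. The abstract does not state the update rule, so the formula, the function name, the damping value, and the toy genes and scores are all assumptions for illustration, not the authors' method.

```python
def network_rank(neighbors, relevance, damping=0.3, iterations=100):
    """Rank genes by combining their own relevance with that of their neighbours.

    neighbors: dict mapping each gene to a list of neighbouring genes (undirected).
    relevance: dict mapping each gene to a non-negative score, for example the
               absolute correlation of its expression with survival time.
    """
    total = sum(relevance[g] for g in neighbors)
    prior = {g: relevance[g] / total for g in neighbors}
    rank = dict(prior)
    for _ in range(iterations):
        updated = {}
        for gene in neighbors:
            # Each neighbour shares its current rank equally among its own links.
            spread = sum(rank[n] / len(neighbors[n]) for n in neighbors[gene])
            updated[gene] = (1 - damping) * prior[gene] + damping * spread
        rank = updated
    return sorted(rank, key=rank.get, reverse=True)


# Toy network (hypothetical genes): H has a weak score of its own but links
# three strongly scoring genes; X has the same weak score and no links.
toy_neighbors = {"A": ["H"], "B": ["H"], "C": ["H"], "H": ["A", "B", "C"], "X": []}
toy_relevance = {"A": 0.3, "B": 0.3, "C": 0.3, "H": 0.05, "X": 0.05}
ranking = network_rank(toy_neighbors, toy_relevance)
```

    In the toy example, H and X start with identical scores, but H ends up ranked above X because of its network context, which is the behaviour the abstract attributes to combining expression and network information.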

  16. Gene regulatory and signaling networks exhibit distinct topological distributions of motifs

    NASA Astrophysics Data System (ADS)

    Ferreira, Gustavo Rodrigues; Nakaya, Helder Imoto; Costa, Luciano da Fontoura

    2018-04-01

    The biological processes of cellular decision making and differentiation involve a plethora of signaling pathways and gene regulatory circuits. These networks in turn exhibit a multitude of motifs playing crucial parts in regulating network activity. Here we compare the topological placement of motifs in gene regulatory and signaling networks and observe that it suggests different evolutionary strategies in motif distribution for distinct cellular subnetworks.

  17. SiGN: large-scale gene network estimation environment for high performance computing.

    PubMed

    Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .

  18. Wisdom of crowds for robust gene network inference

    PubMed Central

    Marbach, Daniel; Costello, James C.; Küffner, Robert; Vega, Nicci; Prill, Robert J.; Camacho, Diogo M.; Allison, Kyle R.; Kellis, Manolis; Collins, James J.; Stolovitzky, Gustavo

    2012-01-01

    Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks. PMID:22796662
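
    One simple way to integrate predictions from several inference methods, as this record describes, is to average the rank each method gives to every candidate interaction. The Python sketch below shows that idea under stated assumptions: the function name, the handling of interactions a method did not score, and the toy scores are hypothetical and not taken from the abstract.

```python
def average_rank_consensus(score_tables):
    """Order candidate edges by their mean rank across methods.

    score_tables: list of dicts, one per method, mapping an edge such as
                  ("tf", "target") to a confidence score (higher is better).
    """
    edges = set().union(*score_tables)
    totals = {edge: 0.0 for edge in edges}
    for scores in score_tables:
        ordered = sorted(scores, key=scores.get, reverse=True)
        position = {edge: r for r, edge in enumerate(ordered, start=1)}
        for edge in edges:
            # Assumption: an edge a method did not score ranks just below its last edge.
            totals[edge] += position.get(edge, len(ordered) + 1)
    n_methods = len(score_tables)
    # Lower mean rank means higher consensus confidence; ties break by edge name.
    return sorted(edges, key=lambda edge: (totals[edge] / n_methods, edge))


# Toy predictions from three hypothetical methods for three candidate edges.
e1, e2, e3 = ("tfA", "g1"), ("tfA", "g2"), ("tfB", "g1")
method_scores = [
    {e1: 0.9, e2: 0.8, e3: 0.1},
    {e2: 0.9, e1: 0.5, e3: 0.4},
    {e1: 0.7, e3: 0.6, e2: 0.5},
]
consensus = average_rank_consensus(method_scores)
```

    Here no single method's ordering is taken as final: e1 is ranked 1, 2 and 1 by the three methods, so it leads the consensus even though the second method preferred e2.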

  19. A genomic approach to identify hybrid incompatibility genes.

    PubMed

    Cooper, Jacob C; Phadnis, Nitin

    2016-07-02

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids.

  20. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814