Computation and application of tissue-specific gene set weights.
Frost, H Robert
2018-04-06
Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses are publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/~hrfrost/TissueSpecificGeneSets. Contact: rob.frost@dartmouth.edu.
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression with an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profiles of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant across diabetes studies, while the IRS1 and MPST genes were significant across insulin response studies, and joint analysis showed that the HADH and MPST genes were significant across all combined data sets. The pathway analysis identified six significant gene sets across all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.
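The abstract above does not say which meta-analysis methods were applied. As one common illustration only, the sketch below combines per-study p-values for a single gene with Fisher's method; the choice of Fisher's method and the function name `fisher_combined_p` are assumptions, not details taken from the study.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    Illustrative assumption: the abstract does not name its methods.
    Returns (chi-square statistic, combined p-value). The statistic has
    2k degrees of freedom for k studies; for even degrees of freedom the
    chi-square upper tail has the closed form used below.
    """
    k = len(p_values)
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    # Upper tail: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total


# Hypothetical p-values for one gene in three studies.
stat, combined = fisher_combined_p([0.04, 0.10, 0.20])
print(f"chi-square = {stat:.3f}, combined p = {combined:.4f}")
```

Fisher's method assumes the studies are independent; when tissues come from the same subjects, that assumption may not hold and a different combination rule would be more appropriate.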
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
Ficklin, Stephen P; Luo, Feng; Feltus, F Alex
2010-09-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A
2013-01-29
Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
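The abstract above calls a gene set significantly differentially expressed when its false discovery rate-corrected p-value is below 0.05, without naming the correction procedure. The sketch below uses the Benjamini-Hochberg step-up adjustment as an assumed stand-in; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "false discovery rate-corrected";
    Benjamini-Hochberg is one widely used procedure, shown for illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

With these made-up inputs only the first gene set stays below the 0.05 threshold after adjustment, which shows how a raw p-value of 0.03 or 0.04 can lose significance once multiple testing is accounted for.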
2013-01-01
Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878
Premzl, Marko
2015-01-01
Using the eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and D-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks for future experiments on the two eutherian gene data sets were proposed. PMID:25941635
Learning contextual gene set interaction networks of cancer with condition specificity
2013-01-01
Background: Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results: In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches that build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, where a contextual gene set is a gene set whose members share coherent transcriptional patterns across a subset of samples. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of the interactions observed. Two cancer data sets were analyzed to support the validity of the condition-specificity of the identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions, even in heterogeneous data sets. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further investigation. Conclusions: The analysis of the contextual gene sets and the characterization of the interaction networks composed of these sets revealed distinct functional differences underlying various types of cancer. The results show that our method successfully reveals many subtype-specific regions in the identified maps of biological contexts, which represent well the biological functions that can be connected to specific subtypes. PMID:23418942
Discovery of cancer common and specific driver gene sets
2017-01-01
Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step in understanding the molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, although such an investigation would benefit the deciphering of cancers and would be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to discover, de novo, common driver gene sets among multiple cancer types (ComMDP) and driver gene sets specific to one or multiple cancer types relative to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295
Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J
2009-01-01
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.
Functional Abstraction as a Method to Discover Knowledge in Gene Ontologies
Ultsch, Alfred; Lötsch, Jörn
2014-01-01
Computational analyses of functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology knowledge bases by means of over-representation analysis (ORA). Its result represents the specific knowledge of the functionality of the gene set. However, the specific ontology typically consists of many terms and relationships, hindering the understanding of the ‘main story’. We developed a methodology to identify a comprehensibly small number of GO terms as “headlines” of the specific ontology, allowing one to understand all central aspects of the roles of the involved genes. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and is abstract enough for human comprehension. This method exceeds the classical approaches to ORA abstraction, and by focusing on information rather than decorrelation of GO terms, it directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. This is the necessary means to interpret complex Gene Ontology results, thus strengthening the role of functional genomics in biomarker and drug discovery. PMID:24587272
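Over-representation analysis, as mentioned in the abstract above, is commonly computed as an upper-tail hypergeometric test for each GO term. The following is a minimal sketch of that standard test, not the Functional Abstraction method itself; the function name, parameter names, and the example counts are assumptions for illustration.

```python
from math import comb


def ora_p_value(universe_size, term_size, gene_set_size, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe_size: number of annotated background genes
    term_size:     background genes annotated to the GO term
    gene_set_size: genes in the set being analyzed
    overlap:       genes in the set that carry the GO term
    """
    max_overlap = min(term_size, gene_set_size)
    total = comb(universe_size, gene_set_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, gene_set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 150 annotated to the term,
# a 200-gene set of which 8 carry the term (about 1.5 expected by chance).
p = ora_p_value(20000, 150, 200, 8)
print(f"ORA p-value = {p:.3g}")
```

In practice this test is repeated for every GO term, so the resulting p-values would normally be adjusted for multiple testing before terms are reported.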
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling; Wang, Xianhui; Kang, Le
2017-06-01
The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain-containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. © The Authors 2017. Published by Oxford University Press.
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling
2017-01-01
The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351
Engert, Christoph G; Droste, Rita; van Oudenaarden, Alexander; Horvitz, H Robert
2018-04-01
To better understand the tissue-specific regulation of chromatin state in cell-fate determination and animal development, we defined the tissue-specific expression of all 36 C. elegans presumptive lysine methyltransferase (KMT) genes using single-molecule fluorescence in situ hybridization (smFISH). Most KMTs were expressed in only one or two tissues. The germline was the tissue with the broadest KMT expression. We found that the germline-expressed C. elegans protein SET-17, which has a SET domain similar to that of the PRDM9 and PRDM7 SET-domain proteins, promotes fertility by regulating gene expression in primary spermatocytes. SET-17 drives the transcription of spermatocyte-specific genes from four genomic clusters to promote spermatid development. SET-17 is concentrated in stable chromatin-associated nuclear foci at actively transcribed msp (major sperm protein) gene clusters, which we term msp locus bodies. Our results reveal the function of a PRDM9/7-family SET-domain protein in spermatocyte transcription. We propose that the spatial intranuclear organization of chromatin factors might be a conserved mechanism in tissue-specific control of transcription.
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis.
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Verheijen, Mark H G; Posthuma, Danielle
2015-11-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis.
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Scharf, J M; Pauls, D L; Yu, D; Illmann, C; Osiecki, L; Neale, B M; Mathews, C A; Reus, V I; Lowe, T L; Freimer, N B; Cox, N J; Davis, L K; Rouleau, G A; Chouinard, S; Dion, Y; Girard, S; Cath, D C; Posthuma, D; Smit, J H; Heutink, P; King, R A; Fernandez, T; Leckman, J F; Sandor, P; Barr, C L; McMahon, W; Lyon, G; Leppert, M; Morgan, J; Weiss, R; Grados, M A; Singer, H; Jankovic, J; Tischfield, J A; Heiman, G A; Verheijen, Mark H G; Posthuma, Danielle
2015-01-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis. PMID:25735483
Design and verification of a pangenome microarray oligonucleotide probe set for Dehalococcoides spp.
Hug, Laura A; Salehi, Maryam; Nuin, Paulo; Tillier, Elisabeth R; Edwards, Elizabeth A
2011-08-01
Dehalococcoides spp. are an industrially relevant group of Chloroflexi bacteria capable of reductively dechlorinating contaminants in groundwater environments. Existing Dehalococcoides genomes revealed a high level of sequence identity within this group, including 98 to 100% 16S rRNA sequence identity between strains with diverse substrate specificities. Common molecular techniques for identification of microbial populations are often not applicable for distinguishing Dehalococcoides strains. Here we describe an oligonucleotide microarray probe set designed based on clustered Dehalococcoides genes from five different sources (strain DET195, CBDB1, BAV1, and VS genomes and the KB-1 metagenome). This "pangenome" probe set provides coverage of core Dehalococcoides genes as well as strain-specific genes while optimizing the potential for hybridization to closely related, previously unknown Dehalococcoides strains. The pangenome probe set was compared to probe sets designed independently for each of the five Dehalococcoides strains. The pangenome probe set demonstrated better predictability and higher detection of Dehalococcoides genes than strain-specific probe sets on nontarget strains with <99% average nucleotide identity. An in silico analysis of the expected probe hybridization against the recently released Dehalococcoides strain GT genome and additional KB-1 metagenome sequence data indicated that the pangenome probe set performs more robustly than the combined strain-specific probe sets in the detection of genes not included in the original design. The pangenome probe set represents a highly specific, universal tool for the detection and characterization of Dehalococcoides from contaminated sites. It has the potential to become a common platform for Dehalococcoides-focused research, allowing meaningful comparisons between microarray experiments regardless of the strain examined.
Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees
2018-06-07
The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, namely gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), and our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.
Xu, Yuantao; Wu, Guizhi; Hao, Baohai; Chen, Lingling; Deng, Xiuxin; Xu, Qiang
2015-11-23
With the availability of a rapidly increasing number of genome and transcriptome sequences, lineage-specific genes (LSGs) can be identified and characterized. Like other conserved functional genes, LSGs play important roles in biological evolution and functions. Two sets of citrus LSGs, 296 citrus-specific genes (CSGs) and 1039 orphan genes specific to sweet orange, were identified by comparative analysis between the sweet orange genome sequences and 41 genomes and 273 transcriptomes. With the two sets of genes, gene structure and gene expression pattern were investigated. On average, both the CSGs and orphan genes have fewer exons, shorter gene length and higher GC content when compared with evolutionarily conserved genes (ECs). Expression profiling indicated that most of the LSGs were expressed in various tissues of sweet orange and some of them exhibited distinct temporal and spatial expression patterns. In particular, the orphan genes were preferentially expressed in callus, which is an important pluripotent tissue of citrus. In addition, some of the CSGs and orphan genes were expressed in response to abiotic stress, indicating their potential functions during interaction with the environment. This study identified and characterized two sets of LSGs in citrus, dissected their sequence features and expression patterns, and provided valuable clues for future functional analysis of the LSGs in sweet orange.
Epigenetic regulation of depot-specific gene expression in adipose tissue.
Gehrke, Sandra; Brueckner, Bodo; Schepky, Andreas; Klein, Johannes; Iwen, Alexander; Bosch, Thomas C G; Wenck, Horst; Winnefeld, Marc; Hagemann, Sabine
2013-01-01
In humans, adipose tissue is distributed in subcutaneous abdominal and subcutaneous gluteal depots that comprise a variety of functional differences. Whereas energy storage in gluteal adipose tissue has been shown to mediate a protective effect, an increase of abdominal adipose tissue is associated with metabolic disorders. However, the molecular basis of depot-specific characteristics is not completely understood yet. Using array-based analyses of transcription profiles, we identified a specific set of genes that was differentially expressed between subcutaneous abdominal and gluteal adipose tissue. To investigate the role of epigenetic regulation in depot-specific gene expression, we additionally analyzed genome-wide DNA methylation patterns in abdominal and gluteal depots. By combining both data sets, we identified a highly significant set of depot-specifically expressed genes that appear to be epigenetically regulated. Interestingly, the majority of these genes form part of the homeobox gene family. Moreover, genes involved in fatty acid metabolism were also differentially expressed. Therefore we suppose that changes in gene expression profiles might account for depot-specific differences in lipid composition. Indeed, triglycerides and fatty acids of abdominal adipose tissue were more saturated compared to triglycerides and fatty acids in gluteal adipose tissue. Taken together, our results uncover clear differences between abdominal and gluteal adipose tissue on the gene expression and DNA methylation level as well as in fatty acid composition. Therefore, a detailed molecular characterization of adipose tissue depots will be essential to develop new treatment strategies for metabolic syndrome associated complications.
2011-01-01
Background Stem cells and their niches are studied in many systems, but mammalian germ stem cells (GSC) and their niches are still poorly understood. In rat testis, spermatogonia and undifferentiated Sertoli cells proliferate before puberty, but at puberty most spermatogonia enter spermatogenesis, and Sertoli cells differentiate to support this program. Thus, pre-pubertal spermatogonia might possess GSC potential and pre-pubertal Sertoli cells niche functions. We hypothesized that the different stem cell pools at pre-puberty and maturity provide a model for the identification of stem cell and niche-specific genes. We compared the transcript profiles of spermatogonia and Sertoli cells from pre-pubertal and pubertal rats and examined how these related to genes expressed in testicular cancers, which might originate from inappropriate communication between GSCs and Sertoli cells. Results The pre-pubertal spermatogonia-specific gene set comprised known stem cell and spermatogonial stem cell (SSC) markers. Similarly, the pre-pubertal Sertoli cell-specific gene set comprised known niche gene transcripts. A large fraction of these specifically enriched transcripts encoded trans-membrane, extra-cellular, and secreted proteins highlighting stem cell to niche communication. Comparing selective gene sets established in this study with published gene expression data of testicular cancers and their stroma, we identified sets of expressed genes shared between testicular tumors and pre-pubertal spermatogonia, and between tumor stroma and pre-pubertal Sertoli cells, with statistical significance. Conclusions Our data suggest that SSC and their niche specifically express complementary factors for cell communication and that the same factors might be implicated in the communication between tumor cells and their micro-environment in testicular cancer. PMID:21232125
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo
2015-12-23
The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
Down-weighting overlapping genes improves gene set analysis
2012-01-01
Background The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods, which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org. PMID:22713124
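The scoring idea in this abstract (a gene set score computed as the mean of absolute weighted t-scores, with larger weights for genes that appear in few gene sets) can be sketched in Python as follows. This is an illustration only, not the PADOG R package: the weight formula (falling from 2 for the rarest genes to 1 for the most frequent), the function names, and the toy data are all assumptions, and plain numbers stand in for moderated t-scores.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Assumed weighting: 2.0 for the rarest genes, 1.0 for the most frequent ones."""
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:  # every gene is equally frequent, so no down-weighting applies
        return {g: 1.0 for g in counts}
    return {g: 1.0 + sqrt((hi - c) / (hi - lo)) for g, c in counts.items()}


def gene_set_score(genes, t_scores, weights):
    """Mean of |weight * t| over the genes of the set that have a t-score."""
    values = [abs(weights[g] * t_scores[g]) for g in genes if g in t_scores]
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy input: g1 is shared by all three sets, g3 and g4 are unique to one.
gene_sets = {"set_a": ["g1", "g2", "g3"], "set_b": ["g1", "g4"], "set_c": ["g1", "g2"]}
t_scores = {"g1": 2.0, "g2": -1.0, "g3": 3.0, "g4": 0.5}
weights = gene_weights(gene_sets)
scores = {name: gene_set_score(genes, t_scores, weights) for name, genes in gene_sets.items()}
print(scores)
```

In this toy input the shared gene g1 keeps weight 1, while the set-specific genes g3 and g4 are doubled, so set-specific signal contributes more to each score.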
GeneTopics - interpretation of gene sets via literature-driven topic models
2013-01-01
Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875
Lee, Chang Soo; Lee, Jiyoung
2010-09-01
A rapid and specific gyrB-based real-time PCR system has been developed for detecting Bacteroides fragilis as a human-specific marker of fecal contamination. Its specificity and sensitivity were evaluated by comparison with other 16S rRNA gene-based primers using closely related Bacteroides and Prevotella. Many studies have used 16S rRNA gene-based methods targeting Bacteroides because this genus is relatively abundant in human feces and is useful for microbial source tracking. However, 16S rRNA gene-based primers are evolutionarily too conserved among taxa to discriminate between human-specific species of Bacteroides and other closely related genera, such as Prevotella. Recently, one of the housekeeping genes, gyrB, has been used as an alternative target in multilocus sequence analysis (MLSA) to provide greater phylogenetic resolution. In this study, a new B. fragilis-specific primer set (Bf904F/Bf958R) was designed by alignments of 322 gyrB genes and was compared with the performance of the 16S rRNA gene-based primers in the presence of B. fragilis, Bacteroides ovatus and Prevotella melaninogenica. Amplicons were sequenced and a phylogenetic tree was constructed to confirm the specificity of the primers to B. fragilis. The gyrB-based primers successfully discriminated B. fragilis from B. ovatus and P. melaninogenica. Real-time PCR results showed that the gyrB primer set had a comparable sensitivity in the detection of B. fragilis when compared with the 16S rRNA primer set. The host-specificity of our gyrB-based primer set was validated with human, pig, cow, and dog fecal samples. The gyrB primer system had superior human-specificity. The gyrB-based system can rapidly detect human-specific fecal sources and can be used for improved source tracking of human contamination. (c) 2010 Elsevier B.V. All rights reserved.
Chau, John H; Rahfeldt, Wolfgang A; Olmstead, Richard G
2018-03-01
Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single-copy nuclear loci. Most published studies in plants have used taxon-specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single-copy and have orthologs in large clades of plants. We identify and compare a taxon-specific locus set and three general locus sets (conserved ortholog set [COSII], shared single-copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. The taxon-specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon-specific and PPR loci had the highest average variability. The taxon-specific data set produced the best-supported tree, but all data sets showed improved resolution over previous non-sequence capture data sets. General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.
Dimensionality of Data Matrices with Applications to Gene Expression Profiles
ERIC Educational Resources Information Center
Feng, Xingdong
2009-01-01
Probe-level microarray data are usually stored in matrices. For a given probe set (gene), for example, each row of the matrix corresponds to an array, and each column corresponds to a probe. Often, people summarize each array by the gene expression level. Is one number sufficient to summarize a whole probe set for a specific gene in an array?…
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
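The pair-wise kappa statistic mentioned in this abstract can be illustrated with a short Python sketch. It assumes Cohen's kappa computed on binary gene membership over a shared gene universe; the abstract does not spell out the exact formulation used by GARNET, and the function name and toy gene identifiers are hypothetical.

```python
def membership_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement of two gene sets on membership across a universe."""
    universe = set(universe)
    if not universe:
        raise ValueError("universe must contain at least one gene")
    a, b = set(set_a) & universe, set(set_b) & universe
    n = len(universe)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance from the marginal membership frequencies.
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / n**2
    if expected == 1.0:  # both sets are empty or both cover the whole universe
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy universe of ten genes and two partly overlapping annotation sets.
universe = [f"g{i}" for i in range(1, 11)]
annotation_a = {"g1", "g2", "g3", "g4"}
annotation_b = {"g3", "g4", "g5", "g6"}
print(membership_kappa(annotation_a, annotation_b, universe))
```

Values near 1 indicate annotation terms that label nearly the same genes, while values near 0 indicate overlap no greater than expected by chance.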
Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su
2015-01-01
Background It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when typical methods depending on differentially expressed genes were used. Results In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779
Jaiswal, Deepika; Jezek, Meagan; Quijote, Jeremiah; Lum, Joanna; Choi, Grace; Kulkarni, Rushmie; Park, DoHwan; Green, Erin M.
2017-01-01
The conserved yeast histone methyltransferase Set1 targets H3 lysine 4 (H3K4) for mono, di, and trimethylation and is linked to active transcription due to the euchromatic distribution of these methyl marks and the recruitment of Set1 during transcription. However, loss of Set1 results in increased expression of multiple classes of genes, including genes adjacent to telomeres and middle sporulation genes, which are repressed under normal growth conditions because they function in meiotic progression and spore formation. The mechanisms underlying Set1-mediated gene repression are varied, and still unclear in some cases, although repression has been linked to both direct and indirect action of Set1, associated with noncoding transcription, and is often dependent on the H3K4me2 mark. We show that Set1, and particularly the H3K4me2 mark, are implicated in repression of a subset of middle sporulation genes during vegetative growth. In the absence of Set1, there is loss of the DNA-binding transcriptional regulator Sum1 and the associated histone deacetylase Hst1 from chromatin in a locus-specific manner. This is linked to increased H4K5ac at these loci and aberrant middle gene expression. These data indicate that, in addition to DNA sequence, histone modification status also contributes to proper localization of Sum1. Our results also show that the role for Set1 in middle gene expression control diverges as cells receive signals to undergo meiosis. Overall, this work dissects an unexplored role for Set1 in gene-specific repression, and provides important insights into a new mechanism associated with the control of gene expression linked to meiotic differentiation. PMID:29066473
2012-01-01
Background Early liver development and the transcriptional transitions during hepatogenesis are well characterized. However, gene expression changes during the late postnatal/pre-pubertal to young adulthood period are less well understood, especially with regards to sex-specific gene expression. Methods Microarray analysis of male and female mouse liver was carried out at 3, 4, and 8 wk of age to elucidate developmental changes in gene expression from the late postnatal/pre-pubertal period to young adulthood. Results A large number of sex-biased and sex-independent genes showed significant changes during this developmental period. Notably, sex-independent genes involved in cell cycle, chromosome condensation, and DNA replication were down regulated from 3 wk to 8 wk, while genes associated with metal ion binding, ion transport and kinase activity were up regulated. A majority of genes showing sex differential expression in adult liver did not display sex differences prior to puberty, at which time extensive changes in sex-specific gene expression were seen, primarily in males. Thus, in male liver, 76% of male-specific genes were up regulated and 47% of female-specific genes were down regulated from 3 to 8 wk of age, whereas in female liver 67% of sex-specific genes showed no significant change in expression. In both sexes, genes up regulated from 3 to 8 wk were significantly enriched (p < E-76) in the set of genes positively regulated by the liver transcription factor HNF4α, as determined in a liver-specific HNF4α knockout mouse model, while genes down regulated during this developmental period showed significant enrichment (p < E-65) for negative regulation by HNF4α. Significant enrichment of the developmentally regulated genes in the set of genes subject to positive and negative regulation by pituitary hormone was also observed. 
Five sex-specific transcriptional regulators showed sex-specific expression at 4 wk (male-specific Ihh; female-specific Cdx4, Cux2, Tox, and Trim24) and may contribute to the developmental changes that lead to global acquisition of liver sex-specificity by 8 wk of age. Conclusions Overall, the observed changes in gene expression during postnatal liver development reflect the deceleration of liver growth and the induction of specialized liver functions, with widespread changes in sex-specific gene expression primarily occurring in male liver. PMID:22475005
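Enrichment statements such as those in this abstract are commonly assessed with an over-representation test. The abstract does not name the test that was used, so the following Python sketch assumes a one-sided hypergeometric test; the function name, parameters, and toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn at random from the universe."""
    if not (0 <= set_size <= universe_size and 0 <= list_size <= universe_size):
        raise ValueError("set_size and list_size must lie within the universe")
    max_overlap = min(set_size, list_size)
    # Sum exact integer counts first, then divide once to limit rounding error.
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, list_size - i)
        for i in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / comb(universe_size, list_size)


# Hypothetical toy numbers: 20 genes in total, a reference set of 5,
# a regulated gene list of 5, and 3 genes shared between them.
p_value = enrichment_p_value(universe_size=20, set_size=5, list_size=5, overlap=3)
print(p_value)
```

Very small p-values of the kind quoted in the abstract would correspond to far larger gene counts and overlaps than this toy case.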
Van Vlierberghe, Pieter; van Grotel, Martine; Tchinda, Joëlle; Lee, Charles; Beverloo, H. Berna; van der Spek, Peter J.; Stubbs, Andrew; Cools, Jan; Nagata, Kyosuke; Fornerod, Maarten; Buijs-Gladdines, Jessica; Horstmann, Martin; van Wering, Elisabeth R.; Soulier, Jean; Pieters, Rob
2008-01-01
T-cell acute lymphoblastic leukemia (T-ALL) is mostly characterized by specific chromosomal abnormalities, some occurring in a mutually exclusive manner that possibly delineate specific T-ALL subgroups. One subgroup, including MLL-rearranged, CALM-AF10 or inv(7)(p15q34) patients, is characterized by elevated expression of HOXA genes. Using a gene expression–based clustering analysis of 67 T-ALL cases with recurrent molecular genetic abnormalities and 25 samples lacking apparent aberrations, we identified 5 new patients with elevated HOXA levels. Using microarray-based comparative genomic hybridization (array-CGH), a cryptic and recurrent deletion, del(9)(q34.11q34.13), was exclusively identified in 3 of these 5 patients. This deletion results in a conserved SET-NUP214 fusion product, which was also identified in the T-ALL cell line LOUCY. SET-NUP214 binds in the promoter regions of specific HOXA genes, where it interacts with CRM1 and DOT1L, which may transcriptionally activate specific members of the HOXA cluster. Targeted inhibition of SET-NUP214 by siRNA abolished expression of HOXA genes, inhibited proliferation, and induced differentiation in LOUCY but not in other T-ALL lines. We conclude that SET-NUP214 may contribute to the pathogenesis of T-ALL by enforcing T-cell differentiation arrest. PMID:18299449
Ujino-Ihara, Tokuko; Kanamori, Hiroyuki; Yamane, Hiroko; Taguchi, Yuriko; Namiki, Nobukazu; Mukai, Yuzuru; Yoshimura, Kensuke; Tsumura, Yoshihiko
2005-12-01
To identify and characterize lineage-specific genes of conifers, two sets of ESTs (with 12791 and 5902 ESTs, representing 5373 and 3018 gene transcripts, respectively) were generated from the Cupressaceae species Cryptomeria japonica and Chamaecyparis obtusa. These transcripts were compared with non-redundant sets of genes generated from Pinaceae species, other gymnosperms and angiosperms. About 6% of tentative unique genes (Unigenes) of C. japonica and C. obtusa had homologs in other conifers but not angiosperms, and about 70% had apparent homologs in angiosperms. The calculated GC contents of orthologous genes showed that GC contents of coniferous genes are likely to be lower than those of angiosperms. Comparisons of the numbers of homologous genes in each species suggest that copy numbers of genes may be correlated between diverse seed plants. This correlation suggests that the multiplicity of such genes may have arisen before the divergence of gymnosperms and angiosperms.
Prediction of gene expression in embryonic structures of Drosophila melanogaster.
Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis
2007-07-01
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945
Pace, Tomasino; Olivieri, Anna; Sanchez, Massimo; Albanesi, Veronica; Picci, Leonardo; Siden Kiamos, Inga; Janse, Chris J; Waters, Andrew P; Pizzi, Elisabetta; Ponzi, Marta
2006-05-01
Transmission of the malaria parasite depends on specialized gamete precursors (gametocytes) that develop in the bloodstream of a vertebrate host. Gametocyte/gamete differentiation requires controlled patterns of gene expression and regulation not only of stage and gender-specific genes but also of genes associated with DNA replication and mitosis. Once taken up by the mosquito, male gametocytes undergo three mitotic cycles within a few minutes to produce eight motile gametes. Here we analysed, in two Plasmodium species, the expression of SET, a conserved nuclear protein involved in chromatin dynamics. SET is expressed in both asexual and sexual blood stages but strongly accumulates in male gametocytes. We demonstrated functionally the presence of two distinct promoters upstream of the set open reading frame, one active in all blood stage parasites and the other active only in gametocytes and in a fraction of schizonts possibly committed to sexual differentiation. In ookinetes both promoters exhibit a basal activity, while in the oocysts the gametocyte-specific promoter is silent and the reporter gene is only transcribed from the constitutive promoter. This transcriptional control, described for the first time in Plasmodium, provides a mechanism by which single-copy genes can be differently modulated during parasite development. In male gametocytes an overexpression of SET might contribute to a prompt entry and execution of S/M phases within the mosquito vector.
Chen, Meng-Yun; Liang, Dan; Zhang, Peng
2015-11-01
Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin
2011-04-14
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian
2017-01-01
Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes. PMID:28729848
Van Loo, Peter; Aerts, Stein; Thienpont, Bernard; De Moor, Bart; Moreau, Yves; Marynen, Peter
2008-01-01
We present ModuleMiner, a novel algorithm for computationally detecting cis-regulatory modules (CRMs) in a set of co-expressed genes. ModuleMiner outperforms other methods for CRM detection on benchmark data, and successfully detects CRMs in tissue-specific microarray clusters and in embryonic development gene sets. Interestingly, CRM predictions for differentiated tissues exhibit strong enrichment close to the transcription start site, whereas CRM predictions for embryonic development gene sets are depleted in this region. PMID:18394174
Microgravity and Immunity: Changes in Lymphocyte Gene Expression
NASA Technical Reports Server (NTRS)
Risin, D.; Pellis, N. R.; Ward, N. E.; Risin, S. A.
2006-01-01
Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, affects signal transduction mechanisms, as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with antiCD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity. In the second set - activated T-cells after culturing for 24 hours in 1g and MMG were exposed three hours before harvesting to a secondary activation stimulus (PHA) thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. In the first set of experiments MMG exposure resulted in altered expression of 89 genes, 10 of them were up-regulated and 79 down-regulated. In the second set, changes in expression were revealed in 85 genes, 20 were up-regulated and 65 were down-regulated. The analysis revealed that significant numbers of MGS genes are associated with signal transduction and apoptotic pathways. Interestingly, the majority of genes that responded by up- or down-regulation in the alternative sets of experiments were not the same, possibly reflecting different functional states of the examined T-lymphocyte populations. 
The responder genes (MGSG) might play an essential role in adaptation to MG and/or be responsible for pathologic changes encountered in Space and thus represent potential targets for molecular-based countermeasures.
shRNA-Induced Gene Knockdown In Vivo to Investigate Neutrophil Function.
Basit, Abdul; Tang, Wenwen; Wu, Dianqing
2016-01-01
To silence genes in neutrophils efficiently, we exploited RNA interference and developed an shRNA-based gene knockdown technique. This method involves transfection of mouse bone marrow-derived hematopoietic stem cells with a retroviral vector carrying shRNA directed at a specific gene. Transfected stem cells are then transplanted into irradiated wild-type mice. After engraftment of stem cells, the transplanted mice have two sets of circulating neutrophils. One set has a gene of interest knocked down while the other set has the full complement of expressed genes. This efficient technique provides a unique way to directly compare the response of neutrophils with a knocked-down gene to that of neutrophils with the full complement of expressed genes in the same environment.
Gu, Y R; Li, M Z; Zhang, K; Chen, L; Jiang, A A; Wang, J Y; Li, X W
2011-08-01
To normalize a set of quantitative real-time PCR (q-PCR) data, it is essential to determine an optimal number/set of housekeeping genes, as the abundance of housekeeping genes can vary across tissues or cells during different developmental stages, or even under certain environmental conditions. In this study, of the 20 commonly used endogenous control genes, 13, 18 and 17 genes exhibited credible stability in 56 different tissues, 10 types of adipose tissue and five types of muscle tissue, respectively. Our analysis clearly showed that three optimal housekeeping genes are adequate for an accurate normalization, which correlated well with the theoretical optimal number (r ≥ 0.94). In terms of economical and experimental feasibility, we recommend the use of the three most stable housekeeping genes for calculating the normalization factor. Based on our results, the three most stable housekeeping genes in all analysed samples (TOP2B, HSPCB and YWHAZ) are recommended for accurate normalization of q-PCR data. We also suggest that two different sets of housekeeping genes are appropriate for 10 types of adipose tissue (the HSPCB, ALDOA and GAPDH genes) and five types of muscle tissue (the TOP2B, HSPCB and YWHAZ genes), respectively. Our report will serve as a valuable reference for other studies aimed at measuring tissue-specific mRNA abundance in porcine samples. © 2011 Blackwell Verlag GmbH.
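The normalization factor recommended above can be sketched as the geometric mean of the relative quantities of the three reference genes. The geometric mean is an assumption here (it is the convention used by geNorm-style normalization); the abstract does not state the formula. The gene names come from the abstract, while the numeric quantities are hypothetical.

```python
from math import prod


def normalization_factor(reference_quantities):
    """Geometric mean of the reference genes' relative quantities (assumed formula)."""
    values = list(reference_quantities)
    if not values or any(v <= 0 for v in values):
        raise ValueError("relative quantities must be positive and non-empty")
    return prod(values) ** (1.0 / len(values))


def normalize(target_quantity, reference_quantities):
    """Divide a target gene's relative quantity by the normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical relative quantities for one sample.
references = {"TOP2B": 1.0, "HSPCB": 2.0, "YWHAZ": 4.0}
normalized = normalize(6.0, references.values())
print(round(normalized, 3))
```

Using a geometric rather than arithmetic mean keeps one highly abundant reference gene from dominating the factor.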
Srivastava, Mousami; Khurana, Pankaj; Sugadev, Ragumani
2012-11-02
The tissue-specific Unigene Sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we have used 'Gene Ontology semantic similarity score' to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. Hierarchical clustering of this semantic similarity score matrix is represented in the form of a dendrogram. The dendrogram cluster stability was assessed by multiple bootstrapping. Multiple bootstrapping also computes a p-value for each cluster and corrects the bias of the bootstrap probability. Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panel 1-4). Further gene enrichment analysis of the four panels revealed that each panel represents a set of lung cancer linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia regulated biomarkers (panel 3) and lung extracellular matrix biomarkers (panel 4). Expression analysis reveals that hypoxia induced lung cancer related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1/SAG, AIB1 and AZIN1) are significantly down regulated.
All down regulated genes in this panel were highly up regulated in most other types of cancers. These panels of proteins may represent signature biomarkers for lung cancer and will aid in lung cancer diagnosis and disease monitoring as well as in the prediction of responses to therapeutics.
The Gene Set Builder: collation, curation, and distribution of sets of genes
Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W
2005-01-01
Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. 
Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of genes in a user-friendly environment. The application can be accessed via . PMID:16371163
Ienasescu, Hans; Li, Kang; Andersson, Robin; Vitezic, Morana; Rennie, Sarah; Chen, Yun; Vitting-Seerup, Kristoffer; Lagoni, Emil; Boyd, Mette; Bornholdt, Jette; de Hoon, Michiel J. L.; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Forrest, Alistair R. R.; Carninci, Piero; Sandelin, Albin
2016-01-01
Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.dk PMID:28025337
Raychaudhuri, Soumya; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David; Sklar, Pamela; Purcell, Shaun; Daly, Mark J.
2010-01-01
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package. 
PMID:20838587
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
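Two of the meta-analysis options listed above, Fisher's method and the z-score method, can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not ExAtlas code: it assumes independent, one-sided p-values strictly between 0 and 1, and the z-score variant shown is the unweighted (Stouffer) form, which may differ from the weighting ExAtlas applies.

```python
from math import exp, log
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def stouffer_combined(p_values):
    """Unweighted z-score method for one-sided p-values."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / len(p_values) ** 0.5
    return 1.0 - nd.cdf(z)


# Hypothetical per-study p-values for one gene.
study_p_values = [0.04, 0.10, 0.03]
print(fisher_combined(study_p_values), stouffer_combined(study_p_values))
```

Fisher's method is sensitive to a single very small p-value, whereas the z-score method rewards consistent moderate evidence across studies, which is one reason a tool might offer both.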
Transcriptional regulation by the Set7 lysine methyltransferase
Keating, Samuel; El-Osta, Assam
2013-01-01
Posttranslational histone modifications define chromatin structure and function. In recent years, a number of studies have characterized many of the enzymatic activities and diverse regulatory components required for monomethylation of histone H3 lysine 4 (H3K4me1) and the expression of specific genes. The challenge now is to understand how this specific chemical modification is written and the Set7 methyltransferase has emerged as a key regulatory enzyme mediating methylation of lysine residues of histone and non-histone proteins. In this review, we comprehensively explore the regulatory proteins modified by Set7 and highlight mechanisms of specific co-recruitment of the enzyme to activating promoters. With a focus on signaling and transcriptional control in disease we discuss recent experimental data emphasizing specific components of diverse regulatory complexes that mediate chromatin modification and reinterpretation of Set7-mediated gene expression. PMID:23478572
Orgeur, Mickael; Martens, Marvin; Leonte, Georgeta; Nassari, Sonya; Bonnin, Marie-Ange; Börno, Stefan T; Timmermann, Bernd; Hecht, Jochen; Duprez, Delphine; Stricker, Sigmar
2018-03-29
Connective tissues support organs and play crucial roles in development, homeostasis and fibrosis, yet our understanding of their formation is still limited. To gain insight into the molecular mechanisms of connective tissue specification, we selected five zinc-finger transcription factors - OSR1, OSR2, EGR1, KLF2 and KLF4 - based on their expression patterns and/or known involvement in connective tissue subtype differentiation. RNA-seq and ChIP-seq profiling of chick limb micromass cultures revealed a set of common genes regulated by all five transcription factors, which we describe as a connective tissue core expression set. This common core was enriched with genes associated with axon guidance and myofibroblast signature, including fibrosis-related genes. In addition, each transcription factor regulated a specific set of signalling molecules and extracellular matrix components. This suggests a concept whereby local molecular niches can be created by the expression of specific transcription factors impinging on the specification of local microenvironments. The regulatory network established here identifies common and distinct molecular signatures of limb connective tissue subtypes, provides novel insight into the signalling pathways governing connective tissue specification, and serves as a resource for connective tissue development. © 2018. Published by The Company of Biologists Ltd.
We have previously developed a statistical method to identify gene sets enriched with condition-specific genetic dependencies. The method constructs gene dependency networks from bootstrapped samples in one condition and computes the divergence between distributions of network likelihood scores from different conditions. It was shown to be capable of sensitive and specific identification of pathways with phenotype-specific dysregulation, i.e., rewiring of dependencies between genes in different conditions.
Genes involved in convergent evolution of eusociality in bees
Woodard, S. Hollis; Fischman, Brielle J.; Venkat, Aarti; Hudson, Matt E.; Varala, Kranthi; Cameron, Sydney A.; Clark, Andrew G.; Robinson, Gene E.
2011-01-01
Eusociality has arisen independently at least 11 times in insects. Despite this convergence, there are striking differences among eusocial lifestyles, ranging from species living in small colonies with overt conflict over reproduction to species in which colonies contain hundreds of thousands of highly specialized sterile workers produced by one or a few queens. Although the evolution of eusociality has been intensively studied, the genetic changes involved in the evolution of eusociality are relatively unknown. We examined patterns of molecular evolution across three independent origins of eusociality by sequencing transcriptomes of nine socially diverse bee species and combining these data with genome sequence from the honey bee Apis mellifera to generate orthologous sequence alignments for 3,647 genes. We found a shared set of 212 genes with a molecular signature of accelerated evolution across all eusocial lineages studied, as well as unique sets of 173 and 218 genes with a signature of accelerated evolution specific to either highly or primitively eusocial lineages, respectively. These results demonstrate that convergent evolution can involve a mosaic pattern of molecular changes in both shared and lineage-specific sets of genes. Genes involved in signal transduction, gland development, and carbohydrate metabolism are among the most prominent rapidly evolving genes in eusocial lineages. These findings provide a starting point for linking specific genetic changes to the evolution of eusociality. PMID:21482769
Radiation Quality Effects on Transcriptome Profiles in 3-d Cultures After Particle Irradiation
NASA Technical Reports Server (NTRS)
Patel, Z. S.; Kidane, Y. H.; Huff, J. L.
2014-01-01
In this work, we evaluate the differential effects of low- and high-LET radiation on 3-D organotypic cultures in order to investigate radiation quality impacts on gene expression and cellular responses. Reducing uncertainties in current risk models requires new knowledge on the fundamental differences in biological responses (the so-called radiation quality effects) triggered by heavy ion particle radiation versus low-LET radiation associated with Earth-based exposures. We are utilizing novel 3-D organotypic human tissue models that provide a format for study of human cells within a realistic tissue framework, thereby bridging the gap between 2-D monolayer culture and animal models for risk extrapolation to humans. To identify biological pathway signatures unique to heavy ion particle exposure, functional gene set enrichment analysis (GSEA) was used with whole transcriptome profiling. GSEA has been used extensively as a method to garner biological information in a variety of model systems but has not been commonly used to analyze radiation effects. It is a powerful approach for assessing the functional significance of radiation quality-dependent changes from datasets where the changes are subtle but broad, and where single gene based analysis using rankings of fold-change may not reveal important biological information. We identified 45 statistically significant gene sets at 0.05 q-value cutoff, including 14 gene sets common to gamma and titanium irradiation, 19 gene sets specific to gamma irradiation, and 12 titanium-specific gene sets. Common gene sets largely align with DNA damage, cell cycle, early immune response, and inflammatory cytokine pathway activation. The top gene set enriched for the gamma- and titanium-irradiated samples involved KRAS pathway activation and genes activated in TNF-treated cells, respectively. Another difference noted for the high-LET samples was an apparent enrichment in gene sets involved in cell cycle/mitotic control.
It is plausible that the enrichment in these particular pathways results from the complex DNA damage resulting from high-LET exposure where repair processes are not completed during the same time scale as the less complex damage resulting from low-LET radiation.
Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J
2016-02-01
Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
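The performance figures quoted above are sensitivity and specificity of a voting classifier. The sketch below shows how such figures are computed from labels and predictions, together with a simple majority vote over per-rule outputs. The majority rule is an assumption made for illustration (the abstract does not describe the voting scheme), and all labels and votes are hypothetical rather than study data.

```python
def majority_vote(rule_outputs):
    """Predict recurrence when more than half of the rules vote for it (assumed scheme)."""
    votes = list(rule_outputs)
    return 2 * sum(bool(v) for v in votes) > len(votes)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, True meaning recurrence.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical votes from five rules for each of five patients.
rule_votes = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
]
observed = [True, True, True, False, False]
predicted = [majority_vote(votes) for votes in rule_votes]
print(sensitivity_specificity(observed, predicted))
```

Reporting both numbers separately for training and test sets, as the abstract does, makes any drop in performance on unseen samples visible.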
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Ravens, Sarina; Fournier, Marjorie; Ye, Tao; Stierle, Matthieu; Dembele, Doulaye; Chavant, Virginie; Tora, Làszlò
2014-01-01
The histone acetyltransferase (HAT) Mof is essential for mouse embryonic stem cell (mESC) pluripotency and early development. Mof is the enzymatic subunit of two different HAT complexes, MSL and NSL. The individual contribution of MSL and NSL to transcription regulation in mESCs is not well understood. Our genome-wide analysis shows that i) MSL and NSL bind to specific and common sets of expressed genes, ii) NSL binds exclusively at promoters, and iii) MSL binds in gene bodies. Nsl1 regulates proliferation and cellular homeostasis of mESCs. MSL is the main HAT acetylating H4K16 in mESCs and is enriched at many mESC-specific and bivalent genes. MSL is important to keep a subset of bivalent genes silent in mESCs, while developmental genes require MSL for expression during differentiation. Thus, NSL and MSL HAT complexes differentially regulate specific sets of expressed genes in mESCs and during differentiation. DOI: http://dx.doi.org/10.7554/eLife.02104.001 PMID:24898753
Comparative analyses of Xanthomonas and Xylella complete genomes.
Moreira, Leandro M; De Souza, Robson F; Digiampietri, Luciano A; Da Silva, Ana C R; Setubal, João C
2005-01-01
Computational analyses of four bacterial genomes of the Xanthomonadaceae family reveal new unique genes that may be involved in adaptation, pathogenicity, and host specificity. The Xanthomonas genus presents 3636 unique genes distributed in 1470 families, while Xylella genus presents 1026 unique genes distributed in 375 families. Among Xanthomonas-specific genes, we highlight a large number of cell wall degrading enzymes, proteases, and iron receptors, a set of energy metabolism genes, second copy of the type II secretion system, type III secretion system, flagella and chemotactic machinery, and the xanthomonadin synthesis gene cluster. Important genes unique to the Xylella genus are an additional copy of a type IV pili gene cluster and the complete machinery of colicin V synthesis and secretion. Intersections of gene sets from both genera reveal a cluster of genes homologous to Salmonella's SPI-7 island in Xanthomonas axonopodis pv citri and Xylella fastidiosa 9a5c, which might be involved in host specificity. Each genome also presents important unique genes, such as an HMS cluster, the kdgT gene, and O-antigen in Xanthomonas axonopodis pv citri; a number of avrBS genes and a distinct O-antigen in Xanthomonas campestris pv campestris, a type I restriction-modification system and a nickase gene in Xylella fastidiosa 9a5c, and a type II restriction-modification system and two genes related to peptidoglycan biosynthesis in Xylella fastidiosa temecula 1. All these differences imply a considerable number of gene gains and losses during the divergence of the four lineages, and are associated with structural genome modifications that may have a direct relation with the mode of transmission, adaptation to specific environments and pathogenicity of each organism.
Computational dissection of human episodic memory reveals mental process-specific genetic profiles
Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G.; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J.-F.
2015-01-01
Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory. PMID:26261317
Computational dissection of human episodic memory reveals mental process-specific genetic profiles.
Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J-F
2015-09-01
Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory.
Joshi, Anagha
2014-12-30
Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in the same species remains to be explored and can bring new biological insights. We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type into three groups according to binding occupancy: singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid occupancy irrespective of cell type. We then characterized hotspots for diverse structural and functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes, suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.
Identification of a set of genes showing regionally enriched expression in the mouse brain
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM
2008-01-01
Background: The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results: We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion: Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066
Identification of a set of genes showing regionally enriched expression in the mouse brain.
D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L C; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J M
2008-07-14
The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.
Microgravity and immunity: Changes in lymphocyte gene expression.
NASA Astrophysics Data System (ADS)
Risin, D.; Ward, N. E.; Risin, S. A.; Pellis, N. R.
Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, and affects signal transduction mechanisms as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity. In the second set, activated T-cells after culturing for 24 hours in 1g and MMG were exposed three hours before harvesting to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. The experiments were performed in triplicates with T-cells obtained from different blood donors to minimize the possible input of biological variation in gene expression and discriminate changes that are associated with the
Yue, Zongliang; Zheng, Qi; Neylon, Michael T; Yoo, Minjae; Shin, Jimin; Zhao, Zhiying; Tan, Aik Choon
2018-01-01
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84,282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug–gene, miRNA–gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/. PMID:29126216
Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona
2013-06-01
To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values≤false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra.
Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona
2014-01-01
Objective: To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. Methods: We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values ≤ false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Results: Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. Conclusions: We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra. PMID:23223423
NASA Astrophysics Data System (ADS)
Jung, Jae-Ho; Choi, Jung Min; Kim, Young-Ok
2018-03-01
We designed a genus-specific primer pair targeting the intracellular parasite Euduboscquella. To increase target specificity and inhibit untargeted PCR, two nucleotides were added at the 3' end of the reverse primer, one being a complementary nucleotide to the Euduboscquella-specific SNP (single-nucleotide polymorphism) and the other a deliberately mismatched nucleotide. Target specificity of the primer set was verified experimentally using PCR of two Euduboscquella species (positive controls) and 15 related species (negative controls composed of ciliates, diatoms and dinoflagellates), and analytical comparison with SILVA SSU rRNA gene database (release 119) in silico. In addition, we applied the Euduboscquella-specific primer set to four environmental samples previously determined by cytological staining to be either positive or negative for Euduboscquella. As expected, only positive controls and environmental samples known to contain Euduboscquella were successfully amplified by the primer set. An inferred SSU rRNA gene phylogeny placed environmental samples containing aloricate ciliates infected by Euduboscquella in a cluster discrete from Euduboscquella groups a-d previously reported from loricate, tintinnid ciliates.
ADGO: analysis of differentially expressed gene sets using composite GO annotation.
Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun
2006-09-15
Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.
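The composite-annotation idea in the ADGO abstract can be illustrated with a minimal Python sketch. It assumes a simple Z-statistic on per-gene expression-change scores as the set-level score; the abstract does not state the scoring statistic, so `z_score`, `composite_sets`, the gene names and the score values are all hypothetical stand-ins rather than the ADGO implementation.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def z_score(gene_set, scores):
    """Z-statistic of a set's mean score against the mean over all genes (assumed scoring rule)."""
    values = list(scores.values())
    members = [scores[g] for g in gene_set if g in scores]
    sigma = pstdev(values)
    if not members or sigma == 0:
        return 0.0
    return (mean(members) - mean(values)) * sqrt(len(members)) / sigma


def composite_sets(category_a, category_b, min_size=2):
    """Intersect every gene set of one GO category with every set of another."""
    composites = {}
    for (name_a, set_a), (name_b, set_b) in product(category_a.items(), category_b.items()):
        shared = set_a & set_b
        if len(shared) >= min_size:  # skip intersections too small to score
            composites[(name_a, name_b)] = shared
    return composites


# Toy per-gene expression-change scores (hypothetical values).
scores = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0, "g5": -2.0, "g6": -2.0}
molecular_function = {"F": {"g1", "g2", "g3", "g5"}}
cellular_component = {"C": {"g1", "g2", "g4", "g6"}}

composites = composite_sets(molecular_function, cellular_component)
z_f = z_score(molecular_function["F"], scores)
z_c = z_score(cellular_component["C"], scores)
z_fc = z_score(composites[("F", "C")], scores)
```

In this toy input the intersected set F ∩ C scores higher than either unary set, which mirrors the situation the abstract describes: a composite term can show a clear expression change while each parent term alone does not.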
oPOSSUM: integrated tools for analysis of regulatory motif over-representation
Ho Sui, Shannan J.; Fulton, Debra L.; Arenillas, David J.; Kwon, Andrew T.; Wasserman, Wyeth W.
2007-01-01
The identification of over-represented transcription factor binding sites from sets of co-expressed genes provides insights into the mechanisms of regulation for diverse biological contexts. oPOSSUM, an internet-based system for such studies of regulation, has been improved and expanded in this new release. New features include a worm-specific version for investigating binding sites conserved between Caenorhabditis elegans and C. briggsae, as well as a yeast-specific version for the analysis of co-expressed sets of Saccharomyces cerevisiae genes. The human and mouse applications feature improvements in ortholog mapping, sequence alignments and the delineation of multiple alternative promoters. oPOSSUM2, introduced for the analysis of over-represented combinations of motifs in human and mouse genes, has been integrated with the original oPOSSUM system. Analysis using user-defined background gene sets is now supported. The transcription factor binding site models have been updated to include new profiles from the JASPAR database. oPOSSUM is available at http://www.cisreg.ca/oPOSSUM/ PMID:17576675
Exploring Genetic Attributions Underlying Radiotherapy-Induced Fatigue in Prostate Cancer Patients.
Hashemi, Sepehr; Fernandez Martinez, Juan Luis; Saligan, Leorey; Sonis, Stephen
2017-09-01
Despite numerous proposed mechanisms, no definitive pathophysiology underlying radiotherapy-induced fatigue (RIF) has been established. However, the dysregulation of a set of 35 genes was recently validated to predict development of fatigue in prostate cancer patients receiving radiotherapy. The objective was to hypothesize novel pathways and to provide genetic targets for currently proposed pathways implicated in RIF development through analysis of the previously validated gene set. The gene set was analyzed for all phenotypic attributions implicated in the phenotype of fatigue. Initially, a "directed" approach was used by querying specific fatigue-related sub-phenotypes against all known phenotypic attributions of the gene set. Then, an "undirected" approach, reviewing the entirety of the literature referencing the 35 genes, was used to increase analysis sensitivity. The dysregulated genes attribute to neural, immunological, mitochondrial, muscular, and metabolic pathways. In addition, certain genes suggest phenotypes not previously emphasized in the context of RIF, such as ionizing radiation sensitivity, DNA damage, and altered DNA repair frequency. Several genes also associated with prostate cancer depression, possibly emphasizing variable radiosensitivity by RIF-prone patients, which may have palliative care implications. Despite the relevant findings, many of the 35 RIF-predictive genes are poorly characterized, warranting their investigation. The implications of herein presented RIF pathways are purely theoretical until specific end-point driven experiments are conducted in more congruent contexts. Nevertheless, the presented attributions are informative, directing future investigation to definitively elucidate RIF's pathoetiology. This study demonstrates an arguably comprehensive method of approaching known differential expression underlying a complex phenotype, to correlate feasible pathophysiology. Copyright © 2017 American Academy of Hospice and Palliative Medicine. All rights reserved.
HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology
Hodzic, Ermin; Sauerwald, Thomas; Dao, Phuong; Wang, Kendric; Yeung, Jake; Anderson, Shawn; Vandin, Fabio; Haffari, Gholamreza; Collins, Colin C.; Sahinalp, S. Cenk
2017-01-01
Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the “random walk facility location” (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: “multihitting time,” the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients’ survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology. PMID:28768687
Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources
Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel
2013-01-01
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity. PMID:23950964
Blatti, Charles; Sinha, Saurabh
2016-07-15
Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
blatti@illinois.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
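The random walk with restarts named in the abstract above can be sketched in plain Python as a power iteration. This is a generic single-stage illustration, not the DRaWR R package: the function names, the restart probability of 0.3, and the toy gene/property network are assumptions made for the example.

```python
def undirected(edges):
    """Build a weighted adjacency dict {node: {neighbor: weight}} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=1000):
    """Stationary visit probabilities of a walk that jumps back to the seed set
    with probability `restart` at every step (restart value is an assumption)."""
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Node without outgoing edges: hand its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * start[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical heterogeneous network: genes linked through shared properties.
network = undirected([
    ("geneA", "prop1"), ("prop1", "geneB"),
    ("geneB", "prop2"), ("prop2", "geneC"),
])
ranks = random_walk_with_restart(network, seeds={"geneA"})
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

A two-stage procedure of the kind the abstract describes could reuse the same function: rank property nodes in a first pass, keep only the top-scoring ones, and run the walk again on the reduced network to re-rank genes.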
Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C
2017-12-06
Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. 
The comparison between region-based and gene-based results indicated that the expressed genes detected by gene-based and region-based CDFs show high consistency, and that region-based results allow detection of changes in transcript formation.
Inference of Evolutionary Forces Acting on Human Biological Pathways
Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent
2015-01-01
Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280
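For orientation, the classical McDonald-Kreitman contrast that the 2DNS test builds on can be computed for a whole gene set by pooling counts across genes. The sketch below shows only that classical pooled version; it is not the 2DNS decomposition, whose details are not given in the abstract. The function name `pooled_mk` and the per-gene counts are hypothetical.

```python
def pooled_mk(counts):
    """Pool per-gene (Dn, Ds, Pn, Ps) counts over a gene set and return the
    neutrality index NI = (Pn/Ps) / (Dn/Ds) and alpha = 1 - NI."""
    dn = sum(c[0] for c in counts)  # nonsynonymous fixed differences
    ds = sum(c[1] for c in counts)  # synonymous fixed differences
    pn = sum(c[2] for c in counts)  # nonsynonymous polymorphisms
    ps = sum(c[3] for c in counts)  # synonymous polymorphisms
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("Dn, Ds and Ps must be non-zero to form the ratios")
    ni = (pn / ps) / (dn / ds)
    return {"Dn": dn, "Ds": ds, "Pn": pn, "Ps": ps, "NI": ni, "alpha": 1.0 - ni}


# Hypothetical counts for two genes belonging to one pathway.
pathway_counts = [(4, 10, 1, 20), (3, 7, 1, 22)]
result = pooled_mk(pathway_counts)
```

Under the usual reading, NI below 1 (alpha above 0) indicates an excess of nonsynonymous divergence, consistent with positive selection, while NI above 1 indicates an excess of nonsynonymous polymorphism.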
Meyers, Robin M; Bryan, Jordan G; McFarland, James M; Weir, Barbara A; Sizemore, Ann E; Xu, Han; Dharia, Neekesh V; Montgomery, Phillip G; Cowley, Glenn S; Pantel, Sasha; Goodale, Amy; Lee, Yenarae; Ali, Levi D; Jiang, Guozhi; Lubonja, Rakela; Harrington, William F; Strickland, Matthew; Wu, Ting; Hawes, Derek C; Zhivich, Victor A; Wyatt, Meghan R; Kalani, Zohra; Chang, Jaime J; Okamoto, Michael; Stegmaier, Kimberly; Golub, Todd R; Boehm, Jesse S; Vazquez, Francisca; Root, David E; Hahn, William C; Tsherniak, Aviad
2017-12-01
The CRISPR-Cas9 system has revolutionized gene editing both at single genes and in multiplexed loss-of-function screens, thus enabling precise genome-scale identification of genes essential for proliferation and survival of cancer cells. However, previous studies have reported that a gene-independent antiproliferative effect of Cas9-mediated DNA cleavage confounds such measurement of genetic dependency, thereby leading to false-positive results in copy number-amplified regions. We developed CERES, a computational method to estimate gene-dependency levels from CRISPR-Cas9 essentiality screens while accounting for the copy number-specific effect. In our efforts to define a cancer dependency map, we performed genome-scale CRISPR-Cas9 essentiality screens across 342 cancer cell lines and applied CERES to this data set. We found that CERES decreased false-positive results and estimated sgRNA activity for both this data set and previously published screens performed with different sgRNA libraries. We further demonstrate the utility of this collection of screens, after CERES correction, for identifying cancer-type-specific vulnerabilities.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes
Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.
2005-01-01
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
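A generic way to ask whether a binding site is over-represented in a co-expressed gene set is a one-sided hypergeometric (Fisher-type) test. The sketch below illustrates that idea only; it is not claimed to reproduce the specific statistics used by oPOSSUM, and the function name and parameter names are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes carrying the site,
    n: genes in the co-expressed set, k: of those, genes carrying the site.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (hypothetical): 4 background genes, 2 with the site,
# a co-expressed set of 2 genes, both carrying the site.
print(hypergeom_upper_tail(k=2, n=2, K=2, N=4))
```

A small tail probability suggests the site occurs in the set more often than expected by chance; in practice such p-values would still need multiple-testing correction across all sites examined.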
Gene integrated set profile analysis: a context-based approach for inferring biological endpoints
Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.
2016-01-01
The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710
A genome-wide resource of cell cycle and cell shape genes of fission yeast
Hayles, Jacqueline; Wood, Valerie; Jeffery, Linda; Hoe, Kwang-Lae; Kim, Dong-Uk; Park, Han-Oh; Salas-Pino, Silvia; Heichinger, Christian; Nurse, Paul
2013-01-01
To identify near complete sets of genes required for the cell cycle and cell shape, we have visually screened a genome-wide gene deletion library of 4843 fission yeast deletion mutants (95.7% of total protein encoding genes) for their effects on these processes. A total of 513 genes have been identified as being required for cell cycle progression, 276 of which have not been previously described as cell cycle genes. Deletions of a further 333 genes lead to specific alterations in cell shape and another 524 genes result in generally misshapen cells. Here, we provide the first eukaryotic resource of gene deletions, which describes a near genome-wide set of genes required for the cell cycle and cell shape. PMID:23697806
An improved method for functional similarity analysis of genes based on Gene Ontology.
Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia
2016-12-23
Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by means of considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures on the experiments of functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .
Walter, Ronald B; Boswell, Mikki; Chang, Jordan; Boswell, William T; Lu, Yuan; Navarro, Kaela; Walter, Sean M; Walter, Dylan J; Salinas, Raquel; Savage, Markita
2018-05-10
Evolution occurred exclusively under the full spectrum of sunlight. Conscription of narrow regions of the solar spectrum by specific photoreceptors suggests a common strategy for regulation of genetic pathways. Fluorescent light (FL) does not possess the complexity of the solar spectrum and has only been in service for about 60 years. If vertebrates evolved specific genetic responses regulated by light wavelengths representing the entire solar spectrum, there may be genetic consequences to reducing the spectral complexity of light. We utilized RNA-Seq to assess changes in the transcriptional profiles of Xiphophorus maculatus skin after exposure to FL ("cool white"), or narrow wavelength regions of light between 350 and 600 nm (i.e., 50 nm or 10 nm regions, herein termed "wavebands"). Exposure to each 50 nm waveband identified sets of genes representing discrete pathways that showed waveband specific transcriptional modulation. For example, 350-400 or 450-500 nm waveband exposures resulted in opposite regulation of gene sets marking necrosis and apoptosis (i.e., 350-400 nm; necrosis suppression, apoptosis activation, while 450-500 nm; apoptosis suppression, necrosis activation). Further investigation of specific transcriptional modulation employing successive 10 nm waveband exposures between 500 and 550 nm showed; (a) greater numbers of genes may be transcriptionally modulated after 10 nm exposures, than observed for 50 nm or FL exposures, (b) the 10 nm wavebands induced gene sets showing greater functional specificity than 50 nm or FL exposures, and (c) the genetic effects of FL are primarily due to 30 nm between 500 and 530 nm. Interestingly, many genetic pathways exhibited completely opposite transcriptional effects after different waveband exposures. 
For example, the epidermal growth factor (EGF) pathway exhibits transcriptional suppression after FL exposure, becomes highly active after 450-500 nm waveband exposure, and again, exhibits strong transcriptional suppression after exposure to the 520-530 nm waveband. Collectively, these results suggest one may manipulate transcription of specific genetic pathways in skin by exposure of the intact animal to specific wavebands of light. In addition, we identify genes transcriptionally modulated in a predictable manner by specific waveband exposures. Such genes, and their regulatory elements, may represent valuable tools for genetic engineering and gene therapy protocols.
Bockamp, Ernesto; Sprengel, Rolf; Eshkind, Leonid; Lehmann, Thomas; Braun, Jan M; Emmrich, Frank; Hengstler, Jan G
2008-03-01
Many mouse models are currently available, providing avenues to elucidate gene function and to recapitulate specific pathological conditions. To a large extent, successful translation of clinical evidence or analytical data into appropriate mouse models is possible through progress in transgenic or gene-targeting technology. Beginning with a review of standard mouse transgenics and conventional gene targeting, this article will move on to discussing the basics of conditional gene expression: the tetracycline (tet)-off and tet-on systems based on the transactivators tet-controlled transactivator (tTA) and reverse tet-on transactivator (rtTA) that allow downregulation or induction of gene expression; Cre or Flp recombinase-mediated modifications, including excision, inversion, insertion and interchromosomal translocation; combination of the tet and Cre systems, permitting inducible knockout, reporter gene activation or activation of point mutations; the avian retroviral system based on delivery of rtTA specifically into cells expressing the avian retroviral receptor, which enables cell type-specific, inducible gene expression; the tamoxifen system, one of the most frequently applied steroid receptor-based systems, allows rapid activation of a fusion protein between the gene of interest and a mutant domain of the estrogen receptor, whereby activation does not depend on transcription; and techniques for cell type-specific ablation. The diphtheria toxin receptor system offers the advantage that it can be combined with the 'zoo' of Cre recombinase driver mice. Having described the basics we move on to the cutting edge: generation of genome-wide sets of conditional knockout mice. To this end, large ongoing projects apply two strategies: gene trapping based on random integration of trapping vectors into introns leading to truncation of the transcript, and gene targeting, representing the directed approach using homologous recombination. 
It can be expected that in the near future genome-wide sets of such mice will be available. Finally, the possibilities of conditional expression systems for investigating gene function in tissue regeneration will be illustrated by examples for neurodegenerative disease, liver regeneration and wound healing of the skin.
Genetic identification of brain cell types underlying schizophrenia.
Skene, Nathan G; Bryois, Julien; Bakken, Trygve E; Breen, Gerome; Crowley, James J; Gaspar, Héléna A; Giusti-Rodriguez, Paola; Hodge, Rebecca D; Miller, Jeremy A; Muñoz-Manchado, Ana B; O'Donovan, Michael C; Owen, Michael J; Pardiñas, Antonio F; Ryge, Jesper; Walters, James T R; Linnarsson, Sten; Lein, Ed S; Sullivan, Patrick F; Hjerling-Leffler, Jens
2018-06-01
With few exceptions, the marked advances in knowledge about the genetic basis of schizophrenia have not converged on findings that can be confidently used for precise experimental modeling. By applying knowledge of the cellular taxonomy of the brain from single-cell RNA sequencing, we evaluated whether the genomic loci implicated in schizophrenia map onto specific brain cell types. We found that the common-variant genomic results consistently mapped to pyramidal cells, medium spiny neurons (MSNs) and certain interneurons, but far less consistently to embryonic, progenitor or glial cells. These enrichments were due to sets of genes that were specifically expressed in each of these cell types. We also found that many of the diverse gene sets previously associated with schizophrenia (genes involved in synaptic function, those encoding mRNAs that interact with FMRP, antipsychotic targets, etc.) generally implicated the same brain cell types. Our results suggest a parsimonious explanation: the common-variant genetic results for schizophrenia point at a limited set of neurons, and the gene sets point to the same cells. The genetic risk associated with MSNs did not overlap with that of glutamatergic pyramidal cells and interneurons, suggesting that different cell types have biologically distinct roles in schizophrenia.
Wang, Hao; Sun, Xuming; Chou, Jeff; Lin, Marina; Ferrario, Carlos M; Zapata-Sudo, Gisele; Groban, Leanne
2017-02-01
We previously showed that cardiomyocyte-specific G protein-coupled estrogen receptor (GPER) gene deletion leads to sex-specific adverse effects on cardiac structure and function; alterations which may be due to distinct differences in mitochondrial and inflammatory processes between sexes. Here, we provide the results of Gene Set Enrichment Analysis (GSEA) based on the DNA microarray data from GPER-knockout versus GPER-intact (intact) cardiomyocytes. This article contains complete data on the mitochondrial and inflammatory response-related gene expression changes that were significant in GPER knockout versus intact cardiomyocytes from adult male and female mice. The data are supplemental to our original research article "Cardiomyocyte-specific deletion of the G protein-coupled estrogen receptor (GPER) leads to left ventricular dysfunction and adverse remodeling: a sex-specific gene profiling" (Wang et al., 2016) [1]. Data have been deposited to the Gene Expression Omnibus (GEO) database repository with the dataset identifier GSE86843.
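The running-sum statistic at the core of Gene Set Enrichment Analysis can be sketched as follows. This is the unweighted (Kolmogorov-Smirnov-like) form, shown as an illustration of the idea rather than the exact configuration used for the GPER data; the function name and the toy gene labels are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score.

    Walk down the ranked list, stepping up on genes in the set and down
    on genes outside it; return the signed maximum deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain genes both inside and outside the set")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step sizes are chosen so the walk returns to zero at the end of the list.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical gene labels): set members sit at the top of the list.
print(enrichment_score(["a", "b", "c", "d"], {"a", "b"}))
```

Significance is normally assessed by comparing the observed score with scores obtained after permuting phenotype labels or gene labels, which this sketch omits.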
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi
Highlights: • Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. • In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer. • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.
Babben, Steve; Perovic, Dragan; Koch, Michael; Ordon, Frank
2015-01-01
Recent declines in costs accelerated sequencing of many species with large genomes, including hexaploid wheat (Triticum aestivum L.). Although the draft sequence of bread wheat is known, it is still one of the major challenges to develop locus-specific primers suitable to be used in marker assisted selection procedures, due to the high homology of the three genomes. In this study we describe an efficient approach for the development of locus-specific primers comprising four steps, i.e. (i) identification of genomic and coding sequences (CDS) of candidate genes, (ii) intron- and exon-structure reconstruction, (iii) identification of wheat A, B and D sub-genome sequences and primer development based on sequence differences between the three sub-genomes, and (iv) testing of primers for functionality, correct size and localisation. This approach was applied to single, low and high copy genes involved in frost tolerance in wheat. In summary for 27 of these genes for which sequences were derived from Triticum aestivum, Triticum monococcum and Hordeum vulgare, a set of 119 primer pairs was developed and after testing on Nulli-tetrasomic (NT) lines, a set of 65 primer pairs (54.6%), corresponding to 19 candidate genes, turned out to be specific. Out of these a set of 35 fragments was selected for validation via Sanger's amplicon re-sequencing. All fragments, with the exception of one, could be assigned to the original reference sequence. The approach presented here showed a much higher specificity in primer development in comparison to techniques used so far in bread wheat and can be applied to other polyploid species with a known draft sequence. PMID:26565976
Maeso, Ignacio; Dunwell, Thomas L; Wyatt, Chris D R; Marlétaz, Ferdinand; Vető, Borbála; Bernal, Juan A; Quah, Shan; Irimia, Manuel; Holland, Peter W H
2016-06-13
A central goal of evolutionary biology is to link genomic change to phenotypic evolution. The origin of new transcription factors is a special case of genomic evolution since it brings opportunities for novel regulatory interactions and potentially the emergence of new biological properties. We demonstrate that a group of four homeobox gene families (Argfx, Leutx, Dprx, Tprx), plus a gene newly described here (Pargfx), arose by tandem gene duplication from the retinal-expressed Crx gene, followed by asymmetric sequence evolution. We show these genes arose as part of repeated gene gain and loss events on a dynamic chromosomal region in the stem lineage of placental mammals, on the forerunner of human chromosome 19. The human orthologues of these genes are expressed specifically in early embryo totipotent cells, peaking from 8-cell to morula, prior to cell fate restrictions; cow orthologues have similar expression. To examine biological roles, we used ectopic gene expression in cultured human cells followed by high-throughput RNA-seq and uncovered extensive transcriptional remodelling driven by three of the genes. Comparison to transcriptional profiles of early human embryos suggests roles in activating and repressing a set of developmentally important genes that spike at 8-cell to morula, rather than a general role in genome activation. We conclude that a dynamic chromosome region spawned a set of evolutionarily new homeobox genes, the ETCHbox genes, specifically in eutherian mammals. After these genes diverged from the parental Crx gene, we argue they were recruited for roles in the preimplantation embryo including activation of genes at the 8-cell stage and repression after morula. We propose these new homeobox gene roles permitted fine-tuning of cell fate decisions necessary for specification and function of embryonic and extra-embryonic tissues utilised in mammalian development and pregnancy.
Suh, Yeunsu; Davis, Michael E.; Lee, Kichoon
2013-01-01
Understanding the tissue-specific pattern of gene expression is critical in elucidating the molecular mechanisms of tissue development, gene function, and transcriptional regulations of biological processes. Although tissue-specific gene expression information is available in several databases, follow-up strategies to integrate and use these data are limited. The objective of the current study was to identify and evaluate novel tissue-specific genes in human and mouse tissues by performing comparative microarray database analysis and semi-quantitative PCR analysis. We developed a powerful approach to predict tissue-specific genes by analyzing existing microarray data from the NCBI's Gene Expression Omnibus (GEO) public repository. We investigated and confirmed tissue-specific gene expression in the human and mouse kidney, liver, lung, heart, muscle, and adipose tissue. Applying our novel comparative microarray approach, we confirmed 10 kidney, 11 liver, 11 lung, 11 heart, 8 muscle, and 8 adipose specific genes. The accuracy of this approach was further verified by employing semi-quantitative PCR reaction and by searching for gene function information in existing publications. Three novel tissue-specific genes were discovered by this approach including AMDHD1 (amidohydrolase domain containing 1) in the liver, PRUNE2 (prune homolog 2) in the heart, and ACVR1C (activin A receptor, type IC) in adipose tissue. We further confirmed the tissue-specific expression of these 3 novel genes by real-time PCR. Among them, ACVR1C is adipose tissue-specific and adipocyte-specific in adipose tissue, and can be used as an adipocyte developmental marker. From GEO profiles, we predicted the processes in which AMDHD1 and PRUNE2 may participate. Our approach provides a novel way to identify new sets of tissue-specific genes and to predict functions in which they may be involved. PMID:23741331
Xu, Xiaodan; Li, Yingcong; Zhao, Heng; Wen, Si-yuan; Wang, Sheng-qi; Huang, Jian; Huang, Kun-lun; Luo, Yun-bo
2005-05-18
To devise a rapid and reliable method for the detection and identification of genetically modified (GM) events, we developed a multiplex polymerase chain reaction (PCR) coupled with a DNA microarray system simultaneously aiming at many targets in a single reaction. The system included probes for screening gene, species reference gene, specific gene, construct-specific gene, event-specific gene, and internal and negative control genes. 18S rRNA was combined with species reference genes as internal controls to assess the efficiency of all reactions and to eliminate false negatives. Two sets of the multiplex PCR system were used to amplify four and five targets, respectively. Eight different structure genes could be detected and identified simultaneously for Roundup Ready soybean in a single microarray. The microarray specificity was validated by its ability to discriminate two GM maizes Bt176 and Bt11. The advantages of this method are its high specificity and greatly reduced false-positives and -negatives. The multiplex PCR coupled with microarray technology presented here is a rapid and reliable tool for the simultaneous detection of GM organism ingredients.
Gene Editing and Gene-Based Therapeutics for Cardiomyopathies.
Ohiri, Joyce C; McNally, Elizabeth M
2018-04-01
With an increasing understanding of genetic defects leading to cardiomyopathy, focus is shifting to correcting these underlying genetic defects. One approach involves treating mutant RNA through antisense oligonucleotides; the first drug has received regulatory approval to treat specific mutations associated with Duchenne muscular dystrophy. Gene editing is being evaluated in the preclinical setting. For inherited cardiomyopathies, genetic correction strategies require tight specificity for the mutant allele. Gene-editing methods are being tested to create deletions that bypass mutations and may thereby restore protein production. Site-specific gene editing, which is required to correct many point mutations, is a less efficient process than inducing deletions.
Determining Semantically Related Significant Genes.
Taha, Kamal
2014-01-01
GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
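The notion of a lowest common ancestor in a GO-like directed acyclic graph can be illustrated with a short sketch. It shows only how LCAs might be located in a toy hierarchy; it does not reproduce RSGSearch's numeric encoding or significance scoring, and the term labels, the `parents` mapping and the function names are all hypothetical.

```python
def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def lowest_common_ancestors(a, b, parents):
    """Common ancestors of a and b that are not proper ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    not_lowest = set()
    for c in common:
        # Every proper ancestor of a common ancestor is, by definition, not lowest.
        not_lowest |= ancestors(c, parents) - {c}
    return common - not_lowest


# Toy hierarchy (hypothetical labels, not real GO identifiers).
parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "x": ["A"],
    "y": ["A", "B"],
}
print(lowest_common_ancestors("x", "y", parents))
```

Because a DAG allows multiple parents, the result is a set: two terms can share more than one lowest common ancestor.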
Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis
2012-01-01
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J
2017-04-25
We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues.
Welker, Noah C; Habig, Jeffrey W; Bass, Brenda L
2007-07-01
We describe the first microarray analysis of a whole animal containing a mutation in the Dicer gene. We used adult Caenorhabditis elegans and, to distinguish among different roles of Dicer, we also performed microarray analyses of animals with mutations in rde-4 and rde-1, which are involved in silencing by siRNA, but not miRNA. Surprisingly, we find that the X chromosome is greatly enriched for genes regulated by Dicer. Comparison of all three microarray data sets indicates the majority of Dicer-regulated genes are not dependent on RDE-4 or RDE-1, including the X-linked genes. However, all three data sets are enriched in genes important for innate immunity and, specifically, show increased expression of innate immunity genes. PMID:17526642
A new fast method for inferring multiple consensus trees using k-medoids.
Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir
2018-04-05
Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
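The clustering step described above can be illustrated with a minimal sketch. It assumes that pairwise distances between the trees (for instance Robinson-Foulds distances) have already been computed into a symmetric matrix; that choice of distance, the deterministic initialisation, and the function names `k_medoids` and `silhouette` are illustrative assumptions. The silhouette shown is the standard index, not the tree-adapted variant the authors propose.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed symmetric distance matrix.

    Returns (medoids, labels). Initial medoids are simply the first k items,
    an assumption made here to keep the sketch deterministic.
    """
    n = len(dist)
    medoids = list(range(k))

    def assign(current):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # The new medoid minimises the summed distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def silhouette(dist, labels):
    """Mean standard silhouette width; requires at least two clusters."""
    n = len(dist)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Toy distance matrix for six hypothetical gene trees forming two groups:
# distance 2 within a group, 6 between groups.
toy = [[0 if i == j else (2 if (i < 3) == (j < 3) else 6) for j in range(6)]
       for i in range(6)]
medoids, labels = k_medoids(toy, k=2)
score = silhouette(toy, labels)
```

In practice one would run this for several values of k and keep the partition with the best validity index; a consensus tree would then be built per cluster, which is outside the scope of this sketch.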
Comparative Bacterial Proteomics: Analysis of the Core Genome Concept
Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.
2008-01-01
While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490
Beretta, Lorenzo; Santaniello, Alessandro; van Riel, Piet L C M; Coenen, Marieke J H; Scorza, Raffaella
2010-08-06
Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model gene-gene interactions in case-control studies; however, none of them is suitable for time-dependent analysis. Herein we introduce the Survival Dimensionality Reduction (SDR) algorithm, a non-parametric method specifically designed to detect epistasis in lifetime datasets. The algorithm requires no specification of either the underlying survival distribution or the underlying interaction model, and it proved satisfactorily powerful in detecting a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and a high degree of right-censorship (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis who were treated with anti-TNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on anti-TNF responses, the SDR algorithm found that the rs1801274 SNP in the Fc gamma RIIa gene and the rs10954213 SNP in the IRF5 gene interact non-linearly to predict clinical remission after anti-TNF biologicals. Simulation studies and application in a real-world setting support the capability of the SDR algorithm to model epistatic interactions in candidate-gene studies in the presence of right-censored data. http://sourceforge.net/projects/sdrproject/.
Molecular mechanisms of floral organ specification by MADS domain proteins.
Yan, Wenhao; Chen, Dijun; Kaufmann, Kerstin
2016-02-01
Flower development is a model system to understand organ specification in plants. The identities of different types of floral organs are specified by homeotic MADS transcription factors that interact in a combinatorial fashion. Systematic identification of DNA-binding sites and target genes of these key regulators show that they have shared and unique sets of target genes. DNA binding by MADS proteins is not based on 'simple' recognition of a specific DNA sequence, but depends on DNA structure and combinatorial interactions. Homeotic MADS proteins regulate gene expression via alternative mechanisms, one of which may be to modulate chromatin structure and accessibility in their target gene promoters. Copyright © 2015 Elsevier Ltd. All rights reserved.
Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila
2017-07-12
The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the upregulated genes identified here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. As in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes. Copyright © 2017 the authors.
2010-01-01
Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. 
First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. Conclusions The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction. PMID:21062474
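The co-expression assumption stated above (genes with similar expression patterns across conditions may be functionally related) can be sketched as a correlation screen. The abstract does not name the similarity measure, so the use of Pearson correlation, the 0.9 threshold, the gene labels, and the expression values below are all illustrative assumptions rather than the authors' data or method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(vx * vy)


def coexpressed_partners(expr, query, threshold=0.9):
    """Genes whose profile correlates with `query` at or above `threshold`."""
    return sorted(
        gene for gene in expr
        if gene != query and pearson(expr[query], expr[gene]) >= threshold
    )


# Hypothetical expression values across five conditions (not real measurements).
toy_expr = {
    "query_P450": [1.0, 2.0, 3.0, 4.0, 5.0],
    "enzyme_a": [2.0, 4.0, 6.0, 8.0, 10.5],   # rises with the query gene
    "enzyme_b": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated
    "enzyme_c": [3.0, 3.0, 3.0, 3.0, 3.0],    # constant
}
partners = coexpressed_partners(toy_expr, "query_P450")
```

A real analysis would typically also correct for multiple testing and check that the correlation holds across independent libraries before proposing a functional link.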
Aspler, Anne L; Bolshin, Carly; Vernon, Suzanne D; Broderick, Gordon
2008-09-26
Genomic profiling of peripheral blood reveals altered immunity in chronic fatigue syndrome (CFS); however, interpretation remains challenging without immune demographic context. The objective of this work is to identify modulation of specific immune functional components and restructuring of co-expression networks characteristic of CFS using the quantitative genomics of peripheral blood. Gene sets were constructed a priori for CD4+ T cells, CD8+ T cells, CD19+ B cells, CD14+ monocytes and CD16+ neutrophils from published data. A group of 111 women were classified using empiric case definition (U.S. Centers for Disease Control and Prevention) and unsupervised latent cluster analysis (LCA). Microarray profiles of peripheral blood were analyzed for expression of leukocyte-specific gene sets and characteristic changes in co-expression identified from topological evaluation of linear correlation networks. Median expression for a set of 6 genes preferentially up-regulated in CD19+ B cells was significantly lower in CFS (p = 0.01) due mainly to PTPRK and TSPAN3 expression. Although no other gene set was differentially expressed at p < 0.05, patterns of co-expression in each group differed markedly. Significant co-expression of CD14+ monocyte with CD16+ neutrophil (p = 0.01) and CD19+ B cell sets (p = 0.00) characterized CFS and fatigue phenotype groups. Also present in CFS was a significant negative correlation between CD8+ and both CD19+ up-regulated (p = 0.02) and NK gene sets (p = 0.08). These patterns were absent in controls. Dissection of blood microarray profiles points to B cell dysfunction with coordinated immune activation supporting persistent inflammation and antibody-mediated NK cell modulation of T cell activity. This has clinical implications, as the CD19+ genes identified could provide a robust and biologically meaningful basis for the early detection and unambiguous phenotyping of CFS.
Biomarkers of the Hedgehog/Smoothened pathway in healthy volunteers
Kadam, Sunil K; Patel, Bharvin K R; Jones, Emma; Nguyen, Tuan S; Verma, Lalit K; Landschulz, Katherine T; Stepaniants, Sergey; Li, Bin; Brandt, John T; Brail, Leslie H
2012-01-01
The Hedgehog (Hh) pathway is involved in oncogenic transformation and tumor maintenance. The primary objective of this study was to select a surrogate tissue in which to measure messenger ribonucleic acid (mRNA) levels of Hh pathway genes as a readout of pharmacodynamic effect. Expression of Hh pathway-specific genes was measured by quantitative real-time polymerase chain reaction (qRT-PCR), and global gene expression was measured using Affymetrix U133 microarrays. Correlations were made between the expression of specific genes determined by qRT-PCR and normalized microarray data. Gene ontology analysis using microarray data for a broader set of Hh pathway genes was performed to identify additional Hh pathway-related markers in the surrogate tissue. RNA extracted from blood, hair follicle, and skin obtained from healthy subjects was analyzed by qRT-PCR for 31 genes, whereas 8 samples were analyzed for a 7-gene subset. Twelve sample sets, each with ≤500 ng total RNA derived from hair, skin, and blood, were analyzed using Affymetrix U133 microarrays. Transcripts for several Hh pathway genes were undetectable in blood using qRT-PCR. Skin was the most desirable matrix, followed by hair follicle. Whether processed by robust multiarray average or microarray suite 5 (MAS5), expression patterns of individual samples showed co-clustered signals; both normalization methods were equally effective for unsupervised analysis. The MAS5-normalized probe sets appeared better suited for supervised analysis. This work provides the basis for selection of a surrogate tissue and an expression analysis-based approach to evaluate pathway-related genes as markers of pharmacodynamic effect with novel inhibitors of the Hh pathway. PMID:22611475
GAVIN: Gene-Aware Variant INterpretation for medical sequencing.
van der Velde, K Joeri; de Boer, Eddy N; van Diemen, Cleo C; Sikkema-Raddatz, Birgit; Abbott, Kristin M; Knopperts, Alain; Franke, Lude; Sijmons, Rolf H; de Koning, Tom J; Wijmenga, Cisca; Sinke, Richard J; Swertz, Morris A
2017-01-16
We present Gene-Aware Variant INterpretation (GAVIN), a new method that accurately classifies variants for clinical diagnostic purposes. Classifications are based on gene-specific calibrations of allele frequencies from the ExAC database, likely variant impact using SnpEff, and estimated deleteriousness based on CADD scores for >3000 genes. In a benchmark on 18 clinical gene sets, we achieve a sensitivity of 91.4% and a specificity of 76.9%. This accuracy is unmatched by 12 other tools. We provide GAVIN as an online MOLGENIS service to annotate VCF files and as an open source executable for use in bioinformatic pipelines. It can be found at http://molgenis.org/gavin .
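The benchmark figures quoted above are sensitivity and specificity. As a reminder of how these two rates are derived from classified variants, here is a minimal sketch; the function name and the toy labels are hypothetical and do not reproduce the GAVIN benchmark data.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for boolean labels.

    `truth[i]` is True when variant i is truly pathogenic;
    `predicted[i]` is True when the classifier calls it pathogenic.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 pathogenic and 5 benign variants (made-up labels).
truth = [True, True, True, True, False, False, False, False, False]
calls = [True, True, True, False, True, False, False, False, False]
sens, spec = sensitivity_specificity(truth, calls)
```

Sensitivity is the fraction of truly pathogenic variants that are called pathogenic, and specificity is the fraction of benign variants that are called benign, so a method can trade one against the other by moving its classification thresholds.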
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be accelerated through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Cornish, Alex J; Filippis, Ioannis; David, Alessia; Sternberg, Michael J E
2015-09-01
Each cell type found within the human body performs a diverse and unique set of functions, the disruption of which can lead to disease. However, there currently exists no systematic mapping between cell types and the diseases they can cause. In this study, we integrate protein-protein interaction data with high-quality cell-type-specific gene expression data from the FANTOM5 project to build the largest collection of cell-type-specific interactomes created to date. We develop a novel method, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes across 73 cell-type-specific interactomes to map genes associated with 196 diseases to the cell types they affect. We conduct text-mining of the PubMed database to produce an independent resource of disease-associated cell types, which we use to validate our method. The GSC method successfully identifies known disease-cell-type associations, as well as highlighting associations that warrant further study. This includes mast cells and multiple sclerosis, a cell population currently being targeted in a multiple sclerosis phase 2 clinical trial. Furthermore, we build a cell-type-based diseasome using the cell types identified as manifesting each disease, offering insight into diseases linked through etiology. The data set produced in this study represents the first large-scale mapping of diseases to the cell types in which they are manifested and will therefore be useful in the study of disease systems. Overall, we demonstrate that our approach links disease-associated genes to the phenotypes they produce, a key goal within systems medicine.
In silico pathway analysis in cervical carcinoma reveals potential new targets for treatment
van Dam, Peter A.; van Dam, Pieter-Jan H. H.; Rolfo, Christian; Giallombardo, Marco; van Berckelaer, Christophe; Trinh, Xuan Bich; Altintas, Sevilay; Huizing, Manon; Papadimitriou, Kostas; Tjalma, Wiebren A. A.; van Laere, Steven
2016-01-01
An in silico pathway analysis was performed in order to improve current knowledge on the molecular drivers of cervical cancer and detect potential targets for treatment. Three publicly available Affymetrix gene expression data sets (GSE5787, GSE7803, GSE9750) were retrieved, accounting for a total of 9 cervical cancer cell lines (CCCLs), 39 normal cervical samples, 7 CIN3 samples and 111 cervical cancer samples (CCSs). Prediction analysis of microarrays was performed in the Affymetrix sets to identify cervical cancer biomarkers. To select cancer cell-specific genes, the CCSs were compared to the CCCLs. Validated genes were submitted to a gene set enrichment analysis (GSEA) and Expression2Kinases (E2K). In the CCSs a total of 1,547 overexpressed probe sets were identified (FDR < 0.1). In comparison with the CCCLs, 560 probe sets (481 unique genes) had a cancer cell-specific expression profile, and 315 of these genes (65%) were validated. GSEA identified 5 cancer hallmarks enriched in CCSs (P < 0.01 and FDR < 0.25), showing that deregulation of the cell cycle is a major component of cervical cancer biology. E2K identified a protein-protein interaction (PPI) network of 162 nodes (including 20 druggable kinases) and 1,626 edges. This PPI network consists of 5 signaling modules associated with MYC signaling (Module 1), cell cycle deregulation (Module 2), TGFβ signaling (Module 3), MAPK signaling (Module 4) and chromatin modeling (Module 5). Potential targets for treatment that could be identified were CDK1, CDK2, ABL1, ATM, AKT1, MAPK1 and MAPK3, among others. The present study identified important driver pathways in cervical carcinogenesis which should be assessed for their potential therapeutic druggability. PMID:26701206
Singh, Anuradha; Mantri, Shrikant; Sharma, Monica; Chaudhury, Ashok; Tuli, Rakesh; Roy, Joy
2014-01-16
Cultivated bread wheat (Triticum aestivum L.) has flour of unique quality that can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), and biscuits. Present wheat varieties require improvement in processing quality to meet the increasing demand for better-quality food products. However, processing quality is very complex and is controlled by many genes, which have not been completely explored. To identify candidate genes whose expression changed with variation in processing quality and with the interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing in chapatti quality. It is also important to understand the temporal and spatial distributions of their expression for designing tissue- and growth-specific functional genomics experiments. Gene-specific two-way ANOVA of the expression of about 55 K transcripts in two diverse sets of Indian wheat varieties differing in chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Of these 236, 110 probe sets were identified for chapatti quality. Many key genes related to processing quality, such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, and proteases, were identified, along with many other candidate genes related to cellular and molecular functions. The ANOVA revealed that the expression of 56 of the 110 probe sets was involved in the interaction (quality x development). The majority of the probe sets showed differential expression at an early stage of seed development, i.e. temporal expression. Meta-analysis revealed that the majority of the genes were expressed in one or a few growth stages, indicating spatial distribution of their expression. The differential expression of a few candidate genes, such as pre-alpha/beta-gliadin and gamma gliadin, was validated by RT-PCR. 
Therefore, this study identified several key quality-related genes, among many other genes, together with their interactions (quality x development) and their temporal and spatial distributions. The candidate genes identified for processing quality and the information on the temporal and spatial distributions of their expression would be useful for designing wheat improvement programs for processing quality, either by changing their expression or by developing single nucleotide polymorphism (SNP) markers. PMID:24433256
About miRNAs, miRNA seeds, target genes and target pathways.
Kehl, Tim; Backes, Christina; Kern, Fabian; Fehlmann, Tobias; Ludwig, Nicole; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas
2017-12-05
miRNAs typically repress gene expression by binding to the 3' UTR, leading to degradation of the mRNA. This process is dominated by the eight-base seed region of the miRNA. Further, miRNAs are known not only to target genes but also to target significant parts of pathways. A logical line of thought is that miRNAs with similar (seed) sequences target similar sets of genes and thus similar sets of pathways. By calculating similarity scores for all 3.25 million pairs of 2,550 human miRNAs, we found that this pattern frequently holds, while we also observed exceptions. Corresponding results were obtained for both predicted target genes and experimentally validated targets. We note that miRNA target gene set similarity follows a bimodal distribution, pointing at a set of 282 miRNAs that seem to target genes with very high specificity. Further, we discuss miRNAs with different (seed) sequences that nonetheless regulate similar gene sets or pathways. Most intriguingly, we found miRNA pairs that regulate different gene sets but similar pathways, such as miR-6886-5p and miR-3529-5p. These jointly target different parts of the MAPK signaling cascade. The main goal of this study is to provide a general overview of the results, to highlight a selection of relevant results on miRNAs, miRNA seeds, target genes and target pathways, and to raise awareness of artifacts in such comparisons. The full set of information that allows detailed results to be inferred for each miRNA has been included in miRPathDB, the miRNA target pathway database (https://mpd.bioinf.uni-sb.de).
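The figure of 3.25 million pairs follows directly from the number of unordered pairs among 2,550 miRNAs: 2550 × 2549 / 2 = 3,249,975. The short sketch below checks that arithmetic and shows one plausible way to score target-set similarity. The abstract does not state which similarity score was used, so the Jaccard index here is an assumption, and the target names are placeholders.

```python
from math import comb

# Number of unordered pairs among 2,550 miRNAs.
n_pairs = comb(2550, 2)


def jaccard(targets_a, targets_b):
    """Jaccard index of two target gene sets (assumed similarity measure)."""
    a, b = set(targets_a), set(targets_b)
    if not a and not b:
        return 0.0  # two empty target sets share nothing informative
    return len(a & b) / len(a | b)


# Placeholder target sets for two hypothetical miRNAs.
targets_1 = {"geneA", "geneB", "geneC"}
targets_2 = {"geneB", "geneC", "geneD"}
similarity = jaccard(targets_1, targets_2)
```

The same function can be applied to sets of pathways instead of sets of genes, which is how two miRNAs with a low gene-level score could still obtain a high pathway-level score.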
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/∼sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Pham, Kieu Thi Minh; Inoue, Yoshihiro; Vu, Ba Van; Nguyen, Hanh Hieu; Nakayashiki, Toru; Ikeda, Ken-ichi; Nakayashiki, Hitoshi
2015-01-01
Here we report the genetic analyses of histone lysine methyltransferase (KMT) genes in the phytopathogenic fungus Magnaporthe oryzae. Eight putative M. oryzae KMT genes were targeted for gene disruption by homologous recombination. Phenotypic assays revealed that the eight KMTs were involved in various infection processes at varying degrees. Moset1 disruptants (Δmoset1) impaired in histone H3 lysine 4 methylation (H3K4me) showed the most severe defects in infection-related morphogenesis, including conidiation and appressorium formation. Consequently, Δmoset1 lost pathogenicity on wheat host plants, thus indicating that H3K4me is an important epigenetic mark for infection-related gene expression in M. oryzae. Interestingly, appressorium formation was greatly restored in the Δmoset1 mutants by exogenous addition of cAMP or of the cutin monomer, 16-hydroxypalmitic acid. The Δmoset1 mutants were still infectious on the super-susceptible barley cultivar Nigrate. These results suggested that MoSET1 plays roles in various aspects of infection, including signal perception and overcoming host-specific resistance. However, since Δmoset1 was also impaired in vegetative growth, the impact of MoSET1 on gene regulation was not infection specific. ChIP-seq analysis of H3K4 di- and tri-methylation (H3K4me2/me3) and MoSET1 protein during infection-related morphogenesis, together with RNA-seq analysis of the Δmoset1 mutant, led to the following conclusions: 1) Approximately 5% of M. oryzae genes showed significant changes in H3K4-me2 or -me3 abundance during infection-related morphogenesis. 2) In general, H3K4-me2 and -me3 abundance was positively associated with active transcription. 3) Lack of MoSET1 methyltransferase, however, resulted in up-regulation of a significant portion of the M. oryzae genes in the vegetative mycelia (1,491 genes), and during infection-related morphogenesis (1,385 genes), indicating that MoSET1 has a role in gene repression either directly or more likely indirectly. 4) Among the 4,077 differentially expressed genes (DEGs) between mycelia and germination tubes, 1,201 and 882 genes were up- and down-regulated, respectively, in a Moset1-dependent manner. 5) The Moset1-dependent DEGs were enriched in several gene categories such as signal transduction, transport, RNA processing, and translation. PMID:26230995
A whole blood gene expression-based signature for smoking status
2012-01-01
Background Smoking is the leading cause of preventable death worldwide and has been shown to increase the risk of multiple diseases including coronary artery disease (CAD). We sought to identify genes whose levels of expression in whole blood correlate with self-reported smoking status. Methods Microarrays were used to identify gene expression changes in whole blood which correlated with self-reported smoking status; a set of significant genes from the microarray analysis were validated by qRT-PCR in an independent set of subjects. Stepwise forward logistic regression was performed using the qRT-PCR data to create a predictive model whose performance was validated in an independent set of subjects and compared to cotinine, a nicotine metabolite. Results Microarray analysis of whole blood RNA from 209 PREDICT subjects (41 current smokers, 4 quit ≤ 2 months, 64 quit > 2 months, 100 never smoked; NCT00500617) identified 4214 genes significantly correlated with self-reported smoking status. qRT-PCR was performed on 1,071 PREDICT subjects across 256 microarray genes significantly correlated with smoking or CAD. A five gene (CLDND1, LRRN3, MUC1, GOPC, LEF1) predictive model, derived from the qRT-PCR data using stepwise forward logistic regression, had a cross-validated mean AUC of 0.93 (sensitivity=0.78; specificity=0.95), and was validated using 180 independent PREDICT subjects (AUC=0.82, CI 0.69-0.94; sensitivity=0.63; specificity=0.94). Plasma from the 180 validation subjects was used to assess levels of cotinine; a model using a threshold of 10 ng/ml cotinine resulted in an AUC of 0.89 (CI 0.81-0.97; sensitivity=0.81; specificity=0.97; kappa with expression model = 0.53). Conclusion We have constructed and validated a whole blood gene expression score for the evaluation of smoking status, demonstrating that clinical and environmental factors contributing to cardiovascular disease risk can be assessed by gene expression. PMID:23210427
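The performance measures quoted in this abstract (AUC, sensitivity, specificity) can be computed from any set of true labels and model scores. The following is a minimal Python sketch, assuming binary smoker/non-smoker labels and one numeric score per subject. The toy data, the 0.5 cut-off, and the function names are illustrative assumptions and are not taken from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity), calling score >= threshold positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = current smoker, 0 = non-smoker.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
model_auc = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={model_auc:.2f}")
```

The pairwise definition of AUC used here is quadratic in the number of subjects, which is acceptable for cohorts of the size described but would usually be replaced by a rank-based computation for larger data.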
Genome organization and characteristics of soybean microRNAs
2012-01-01
Background microRNAs (miRNAs) are key regulators of gene expression and play important roles in many aspects of plant biology. The role(s) of miRNAs in nitrogen-fixing root nodules of leguminous plants such as soybean is not well understood. We examined a library of small RNAs from Bradyrhizobium japonicum-inoculated soybean roots and identified novel miRNAs. In order to enhance our understanding of miRNA evolution, diversification and function, we classified all known soybean miRNAs based on their phylogenetic conservation (conserved, legume- and soybean-specific miRNAs) and examined their genome organization, family characteristics and target diversity. We predicted targets of these miRNAs and experimentally validated several of them. We also examined organ-specific expression of selected miRNAs and their targets. Results We identified 120 previously unknown miRNA genes from soybean including 5 novel miRNA families. In the soybean genome, genes encoding miRNAs are primarily intergenic and a small percentage were intragenic or less than 1000 bp from a protein-coding gene, suggesting potential co-regulation between the miRNA and its parent gene. Difference in number and orientation of tandemly duplicated miRNA genes between orthologous genomic loci indicated continuous evolution and diversification. Conserved miRNA families are often larger in size and produce less diverse mature miRNAs than legume- and soybean-specific families. In addition, the majority of conserved and legume-specific miRNA families produce 21 nt long mature miRNAs with distinct nucleotide distribution and regulate a more conserved set of target mRNAs compared to soybean-specific families. A set of nodule-specific target mRNAs and their cognate regulatory miRNAs had inverse expression between root and nodule tissues suggesting that spatial restriction of target gene transcripts by miRNAs might govern nodule-specific gene expression in soybean. 
Conclusions Genome organization of soybean miRNAs suggests that they are actively evolving. Distinct family characteristics of soybean miRNAs suggest continuous diversification of function. Inverse organ-specific expression between selected miRNAs and their targets in the roots and nodules, suggested a potential role for these miRNAs in regulating nodule development. PMID:22559273
Towards an informative mutant phenotype for every bacterial gene
Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.; ...
2014-08-11
Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, in Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness.
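The notion of a "similar fitness pattern (cofitness)" can be illustrated with a small Python sketch. It assumes that cofitness is measured as the Pearson correlation between two genes' fitness values across the same experiments, which the abstract does not state explicitly. The gene names, fitness values, and helper functions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def top_cofit_partner(gene, fitness):
    """Return (partner, correlation) for the gene most correlated with `gene`."""
    candidates = (
        (other, pearson(fitness[gene], values))
        for other, values in fitness.items()
        if other != gene
    )
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical fitness scores: one value per experiment for each gene.
fitness = {
    "geneA": [0.1, -2.0, 0.0, -1.5, 0.2],
    "geneB": [0.0, -1.8, 0.1, -1.2, 0.3],
    "geneC": [-1.0, 0.2, -0.9, 0.1, 0.0],
}

partner, r = top_cofit_partner("geneA", fitness)
print(partner, round(r, 3))
```

In this toy example geneA and geneB lose fitness in the same experiments, so geneB is reported as the cofit partner. A poorly characterized gene could then tentatively inherit a functional hypothesis from a well-annotated partner.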
Robinson, Gene E.; Jakobsson, Eric
2016-01-01
The emerging field of sociogenomics explores the relations between social behavior and genome structure and function. An important question is the extent to which associations between social behavior and gene expression are conserved among the Metazoa. Prior experimental work in an invertebrate model of social behavior, the honey bee, revealed distinct brain gene expression patterns in African and European honey bees, and within European honey bees with different behavioral phenotypes. The present work is a computational study of these previous findings in which we analyze, by orthology determination, the extent to which genes that are socially regulated in honey bees are conserved across the Metazoa. We found that the differentially expressed gene sets associated with alarm pheromone response, the difference between old and young bees, and the colony influence on soldier bees, are enriched in widely conserved genes, indicating that these differences have genomic bases shared with many other metazoans. By contrast, the sets of differentially expressed genes associated with the differences between African and European forager and guard bees are depleted in widely conserved genes, indicating that the genomic basis for this social behavior is relatively specific to honey bees. For the alarm pheromone response gene set, we found a particularly high degree of conservation with mammals, even though the alarm pheromone itself is bee-specific. Gene Ontology identification of human orthologs to the strongly conserved honey bee genes associated with the alarm pheromone response shows overrepresentation of protein metabolism, regulation of protein complex formation, and protein folding, perhaps associated with remodeling of critical neural circuits in response to alarm pheromone. We hypothesize that such remodeling may be an adaptation of social animals to process and respond appropriately to the complex patterns of conspecific communication essential for social organization. 
PMID:27359102
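Statements that a gene set is "enriched" or "depleted" in widely conserved genes are commonly backed by an over-representation test. The abstract does not say which test was used; the sketch below assumes a one-sided hypergeometric test and uses made-up counts purely for illustration.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  number of those genes that are widely conserved
    draws:      size of the differentially expressed gene set
    observed:   number of conserved genes found in that set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 10,000 genes, 3,000 widely conserved,
# a 200-gene set of which 90 are conserved (60 expected by chance).
p_enriched = hypergeom_upper_tail(10_000, 3_000, 200, 90)
print(f"enrichment p-value: {p_enriched:.3g}")
```

Depletion would be tested with the corresponding lower tail, P(X <= observed).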
Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J
2012-01-01
Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. PMID:22882546
Differential Effect of Active Smoking on Gene Expression in Male and Female Smokers
Paul, Sunirmal; Amundson, Sally A
2015-01-01
Smoking is the second leading cause of preventable death in the United States. Cohort epidemiological studies have demonstrated that women are more vulnerable to cigarette-smoking induced diseases than their male counterparts, however, the molecular basis of these differences has remained unknown. In this study, we explored if there were differences in the gene expression patterns between male and female smokers, and how these patterns might reflect different sex-specific responses to the stress of smoking. Using whole genome microarray gene expression profiling, we found that a substantial number of oxidant related genes were expressed in both male and female smokers, however, smoking-responsive genes did indeed differ greatly between male and female smokers. Gene set enrichment analysis (GSEA) against reference oncogenic signature gene sets identified a large number of oncogenic pathway gene-sets that were significantly altered in female smokers compared to male smokers. In addition, functional annotation with Ingenuity Pathway Analysis (IPA) identified smoking-correlated genes associated with biological functions in male and female smokers that are directly relevant to well-known smoking related pathologies. However, these relevant biological functions were strikingly overrepresented in female smokers compared to male smokers. IPA network analysis with the functional categories of immune and inflammatory response gene products suggested potential interactions between smoking response and female hormones. Our results demonstrate a striking dichotomy between male and female gene expression responses to smoking. This is the first genome-wide expression study to compare the sex-specific impacts of smoking at a molecular level and suggests a novel potential connection between sex hormone signaling and smoking-induced diseases in female smokers. PMID:25621181
Liu, Hui; Robinson, Gene E; Jakobsson, Eric
2016-06-01
The emerging field of sociogenomics explores the relations between social behavior and genome structure and function. An important question is the extent to which associations between social behavior and gene expression are conserved among the Metazoa. Prior experimental work in an invertebrate model of social behavior, the honey bee, revealed distinct brain gene expression patterns in African and European honey bees, and within European honey bees with different behavioral phenotypes. The present work is a computational study of these previous findings in which we analyze, by orthology determination, the extent to which genes that are socially regulated in honey bees are conserved across the Metazoa. We found that the differentially expressed gene sets associated with alarm pheromone response, the difference between old and young bees, and the colony influence on soldier bees, are enriched in widely conserved genes, indicating that these differences have genomic bases shared with many other metazoans. By contrast, the sets of differentially expressed genes associated with the differences between African and European forager and guard bees are depleted in widely conserved genes, indicating that the genomic basis for this social behavior is relatively specific to honey bees. For the alarm pheromone response gene set, we found a particularly high degree of conservation with mammals, even though the alarm pheromone itself is bee-specific. Gene Ontology identification of human orthologs to the strongly conserved honey bee genes associated with the alarm pheromone response shows overrepresentation of protein metabolism, regulation of protein complex formation, and protein folding, perhaps associated with remodeling of critical neural circuits in response to alarm pheromone. We hypothesize that such remodeling may be an adaptation of social animals to process and respond appropriately to the complex patterns of conspecific communication essential for social organization.
Mourad, Amira M I; Sallam, Ahmed; Belamkar, Vikas; Wegulo, Stephen; Bowden, Robert; Jin, Yue; Mahdy, Ezzat; Bakheit, Bahy; El-Wafaa, Atif A; Poland, Jesse; Baenziger, Peter S
2018-01-01
Stem rust (caused by Puccinia graminis f. sp. tritici Erikss. & E. Henn.) is a major disease of wheat (Triticum aestivum L.). However, in recent years it has occurred rarely in Nebraska owing to weather and the effective selection and pyramiding of resistance genes. To understand the genetic basis of stem rust resistance in Nebraska winter wheat, we applied a genome-wide association study (GWAS) to a set of 270 winter wheat genotypes (A-set). Genotyping was carried out using genotyping-by-sequencing, and ∼35,000 high-quality SNPs were identified. The tested genotypes were evaluated for their resistance to the common stem rust race in Nebraska (QFCSC) in two replications. Marker-trait association identified 32 SNP markers on chromosome 2D that were significantly (Bonferroni-corrected P < 0.05) associated with the resistance. The chromosomal location of the significant SNPs (chromosome 2D) matched the location of the Sr6 gene, which was expected in these genotypes based on pedigree information. A highly significant linkage disequilibrium (LD, r²) was found between the significant SNPs and the specific SSR marker for the Sr6 gene (Xcfd43). This suggests that the significant SNP markers are tagging the Sr6 gene. Of the 32 significant SNPs, eight were in six genes that are annotated as being linked to disease resistance in the IWGSC RefSeq v1.0. The 32 significant SNP markers were located in nine haplotype blocks. All 32 significant SNPs were validated in a set of 60 different genotypes (V-set) using single-marker analysis. The SNP markers identified in this study can be used in marker-assisted selection and genomic selection, and to develop KASP (Kompetitive Allele Specific PCR) markers for the Sr6 gene. Novel SNPs for the Sr6 gene, an important stem rust resistance gene, were identified and validated in this study. These SNPs can be used to improve stem rust resistance in wheat.
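The Bonferroni correction mentioned in this abstract can be sketched in a few lines of Python. The number of tests (about 35,000 SNPs) and the 0.05 level come from the abstract; the SNP names and raw p-values are hypothetical, and the study's actual association model is not reproduced here.

```python
def bonferroni_adjust(p_values, n_tests):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    return [min(1.0, p * n_tests) for p in p_values]


N_SNPS = 35_000  # approximate number of SNPs tested, from the abstract
ALPHA = 0.05     # family-wise significance level, from the abstract

# Equivalent per-test threshold on the raw p-values.
per_test_threshold = ALPHA / N_SNPS

# Hypothetical raw p-values for three markers.
raw_p = {"snp_1": 1e-7, "snp_2": 1e-3, "snp_3": 2e-6}

adjusted = dict(zip(raw_p, bonferroni_adjust(raw_p.values(), N_SNPS)))
significant = [snp for snp, p in adjusted.items() if p < ALPHA]

print(f"per-test threshold: {per_test_threshold:.2e}")
print("significant markers:", significant)
```

With this many tests a marker needs a raw p-value below roughly 1.4e-6 to pass, so snp_3 (raw p = 2e-6) is rejected even though it would look highly significant in a single test.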
Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming
2015-01-01
Summary Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110
Serial analysis of gene expression (SAGE) in bovine trypanotolerance: preliminary results
Berthier, David; Quéré, Ronan; Thevenon, Sophie; Belemsaga, Désiré; Piquemal, David; Marti, Jacques; Maillard, Jean-Charles
2003-01-01
In Africa, trypanosomosis is a tsetse-transmitted disease which represents the most important constraint to livestock production. Several indigenous West African taurine (Bos taurus) breeds, such as the Longhorn (N'Dama) cattle are well known to control trypanosome infections. This genetic ability named "trypanotolerance" results from various biological mechanisms under multigenic control. The methodologies used so far have not succeeded in identifying the complete pool of genes involved in trypanotolerance. New post genomic biotechnologies such as transcriptome analyses are efficient in characterising the pool of genes involved in the expression of specific biological functions. We used the serial analysis of gene expression (SAGE) technique to construct, from Peripheral Blood Mononuclear Cells of an N'Dama cow, 2 total mRNA transcript libraries, at day 0 of a Trypanosoma congolense experimental infection and at day 10 post-infection, corresponding to the peak of parasitaemia. Bioinformatic comparisons in the bovine genomic databases allowed the identification of 187 up- and down- regulated genes, EST and unknown functional genes. Identification of the genes involved in trypanotolerance will allow to set up specific microarray sets for further metabolic and pharmacological studies and to design field marker-assisted selection by introgression programmes. PMID:12927079
Arm-specific dynamics of chromosome evolution in malaria mosquitoes
2011-01-01
Background The malaria mosquito species of subgenus Cellia have rich inversion polymorphisms that correlate with environmental variables. Polymorphic inversions tend to cluster on the chromosomal arms 2R and 2L but not on X, 3R and 3L in Anopheles gambiae and homologous arms in other species. However, it is unknown whether polymorphic inversions on homologous chromosomal arms of distantly related species from subgenus Cellia nonrandomly share similar sets of genes. It is also unclear if the evolutionary breakage of inversion-poor chromosomal arms is under constraints. Results To gain a better understanding of the arm-specific differences in the rates of genome rearrangements, we compared gene orders and established syntenic relationships among Anopheles gambiae, Anopheles funestus, and Anopheles stephensi. We provided evidence that polymorphic inversions on the 2R arms in these three species nonrandomly captured similar sets of genes. This nonrandom distribution of genes was not only a result of preservation of ancestral gene order but also an outcome of extensive reshuffling of gene orders that created new combinations of homologous genes within independently originated polymorphic inversions. The statistical analysis of distribution of conserved gene orders demonstrated that the autosomal arms differ in their tolerance to generating evolutionary breakpoints. The fastest evolving 2R autosomal arm was enriched with gene blocks conserved between only a pair of species. In contrast, all identified syntenic blocks were preserved on the slowly evolving 3R arm of An. gambiae and on the homologous arms of An. funestus and An. stephensi. Conclusions Our results suggest that natural selection favors specific gene combinations within polymorphic inversions when distant species are exposed to similar environmental pressures. 
This knowledge could be useful for the discovery of genes responsible for an association of inversion polymorphisms with phenotypic variations in multiple species. Our data support the chromosomal arm specificity in rates of gene order disruption during mosquito evolution. We conclude that the distribution of breakpoint regions is evolutionarily conserved on slowly evolving arms and tends to be lineage-specific on rapidly evolving arms. PMID:21473772
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.
Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, in Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness.
Richert, Kathrin; Brambilla, Evelyne; Stackebrandt, Erko
2005-01-01
PCR primer sets were developed for the specific amplification and sequence analysis of the gene encoding the gyrase subunit B (gyrB) in members of the family Microbacteriaceae, class Actinobacteria. The family contains species that are highly related according to 16S rRNA gene sequence analyses. In order to test whether gene sequence analysis of gyrB is appropriate for discriminating between closely related species, we evaluated the 16S rRNA gene phylogeny of its members. As the published universal primer set for gyrB failed to amplify the corresponding gene in the majority of the 80 type strains of the family, three new primer sets were identified that generated fragments with a composite sequence length of about 900 nt. However, amplification of all three fragments was successful in only 25% of the 80 type strains. In this study, the substitution frequencies in genes encoding gyrase and 16S rDNA were compared for 10 strains of nine genera. The frequency of gyrB nucleotide substitution is significantly higher than that of the 16S rDNA, and no linear correlation exists between the similarities of both molecules among members of the Microbacteriaceae. The phylogenetic analyses using the gyrB sequences provide higher resolution than those using 16S rDNA sequences and seem able to discriminate between closely related species.
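Comparisons of sequence similarity such as the gyrB versus 16S rDNA contrast described here rest on a per-pair identity calculation over aligned sequences. The Python sketch below shows one simple way to compute it, assuming the two sequences are already aligned to equal length and that gapped columns are ignored. The sequences are invented and the function name is hypothetical; the abstract does not specify how similarity was calculated.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, ungapped columns at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented aligned fragments for two strains (not real gyrB or 16S data).
strain_1 = "ATGGCC-TTAGC"
strain_2 = "ATGACCGTTGGC"

identity = percent_identity(strain_1, strain_2)
print(f"identity: {identity:.1f}%  substitutions: {100 - identity:.1f}%")
```

Running the same function on a protein-coding gene and on the 16S gene for the same strain pairs would give the two similarity values whose lack of linear correlation the abstract reports.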
Combining Gene Signatures Improves Prediction of Breast Cancer Survival
Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian
2011-01-01
Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. 
The presented methodology is broadly applicable to breast cancer risk assessment using any new identified gene set. PMID:21423775
Lu, Jianguo; Peatman, Eric; Tang, Haibao; Lewis, Joshua; Liu, Zhanjiang
2012-06-15
Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Here, using a Markov clustering algorithm directed approach we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting their origin of independent and continuous duplication. 
This is particularly true for the zebrafish genome. Further analysis of the duplicated gene sets indicated that a significant portion of duplicated genes in the zebrafish genome were of recent, lineage-specific duplication events. Most strikingly, a subset of duplicated genes is enriched among the recently duplicated genes involved in immune or sensory response pathways. Such findings demonstrated the significance of continuous gene duplication as well as that of whole genome duplication in the course of genome evolution.
Robustness, evolvability, and the logic of genetic regulation.
Payne, Joshua L; Moore, Jason H; Wagner, Andreas
2014-01-01
In gene regulatory circuits, the expression of individual genes is commonly modulated by a set of regulating gene products, which bind to a gene's cis-regulatory region. This region encodes an input-output function, referred to as signal-integration logic, that maps a specific combination of regulatory signals (inputs) to a particular expression state (output) of a gene. The space of all possible signal-integration functions is vast and the mapping from input to output is many-to-one: For the same set of inputs, many functions (genotypes) yield the same expression output (phenotype). Here, we exhaustively enumerate the set of signal-integration functions that yield identical gene expression patterns within a computational model of gene regulatory circuits. Our goal is to characterize the relationship between robustness and evolvability in the signal-integration space of regulatory circuits, and to understand how these properties vary between the genotypic and phenotypic scales. Among other results, we find that the distributions of genotypic robustness are skewed, so that the majority of signal-integration functions are robust to perturbation. We show that the connected set of genotypes that make up a given phenotype are constrained to specific regions of the space of all possible signal-integration functions, but that as the distance between genotypes increases, so does their capacity for unique innovations. In addition, we find that robust phenotypes are (i) evolvable, (ii) easily identified by random mutation, and (iii) mutationally biased toward other robust phenotypes. We explore the implications of these latter observations for mutation-based evolution by conducting random walks between randomly chosen source and target phenotypes. We demonstrate that the time required to identify the target phenotype is independent of the properties of the source phenotype.
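The many-to-one mapping from signal-integration functions to expression outputs can be illustrated with a small enumeration. The sketch below is not the authors' circuit model. It assumes, purely for illustration, that a genotype is the full truth table of a Boolean function of k regulatory inputs, that the phenotype is the output on a hypothetical subset of input combinations (`observed_rows`), and that robustness is the fraction of single-entry flips that leave the phenotype unchanged.

```python
from itertools import product

def enumerate_functions(k):
    """Return every Boolean function of k inputs as a truth table (tuple of 0/1)."""
    return list(product((0, 1), repeat=2 ** k))

def phenotype(table, observed_rows):
    """Hypothetical phenotype: outputs on the truth-table rows that are encountered."""
    return tuple(table[i] for i in observed_rows)

def robustness(table, observed_rows):
    """Fraction of single-entry flips of the truth table that preserve the phenotype."""
    base = phenotype(table, observed_rows)
    unchanged = 0
    for i in range(len(table)):
        mutant = list(table)
        mutant[i] ^= 1  # flip one output bit of the signal-integration function
        if phenotype(tuple(mutant), observed_rows) == base:
            unchanged += 1
    return unchanged / len(table)

# Two regulatory inputs give 2**(2**2) = 16 possible functions (genotypes).
tables = enumerate_functions(2)
observed = (0, 3)  # assumed: only input states 00 and 11 ever occur

# Group genotypes by the phenotype they produce (many-to-one mapping).
groups = {}
for t in tables:
    groups.setdefault(phenotype(t, observed), []).append(t)

for pheno, members in sorted(groups.items()):
    print(pheno, len(members), robustness(members[0], observed))
```

In this toy setting every phenotype is produced by several genotypes, and mutations in unobserved rows are neutral, which is the simplest source of genotypic robustness.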
Use of molecular techniques to evaluate the survival of a microorganism injected into an aquifer
Thiem, S.M.; Krumme, M.L.; Smith, R.L.; Tiedje, J.M.
1994-01-01
A PCR primer set and an internal probe that are specific for Pseudomonas sp. strain B13, a 3-chlorobenzoate-metabolizing strain, were developed. Using this primer set and probe, we were able to detect Pseudomonas sp. strain B13 DNA sequences in DNA extracted from aquifer samples 14.5 months after Pseudomonas sp. strain B13 had been injected into a sand and gravel aquifer. This primer set and probe were also used to analyze isolates from 3-chlorobenzoate enrichments of the aquifer samples by Southern blot analysis. Hybridization of Southern blots with the Pseudomonas sp. strain B13-specific probe and a catabolic probe in conjunction with restriction fragment length polymorphism (RFLP) analysis of ribosome genes was used to determine that viable Pseudomonas sp. strain B13 persisted in this environment. We isolated a new 3-chlorobenzoate-degrading strain from one of these enrichment cultures. The B13-specific probe does not hybridize to DNA from this isolate. The new strain could be the result of gene exchange between Pseudomonas sp. strain B13 and an indigenous bacterium. This speculation is based on an RFLP pattern of ribosome genes that differs from that of Pseudomonas sp. strain B13, the fact that identically sized restriction fragments hybridized to the catabolic gene probe, and the absence of any enrichable 3-chlorobenzoate-degrading strains in the aquifer prior to inoculation.
Primer sets for cloning the human repertoire of T cell Receptor Variable regions.
Boria, Ilenia; Cotella, Diego; Dianzani, Irma; Santoro, Claudio; Sblattero, Daniele
2008-08-29
Amplification and cloning of naïve T cell Receptor (TR) repertoires or antigen-specific TR is crucial to shape immune response and to develop immuno-based therapies. TR variable (V) regions are encoded by several genes that recombine during T cell development. The cloning of expressed genes as large diverse libraries from natural sources relies upon the availability of primers able to amplify as many V genes as possible. Here, we present a list of primers computationally designed on all functional TR V and J genes listed in the IMGT, the ImMunoGeneTics information system. The list consists of unambiguous or degenerate primers suitable to theoretically amplify and clone the entire TR repertoire. We show that it is possible to selectively amplify and clone expressed TR V genes in one single RT-PCR step and from as little as 1000 cells. This new primer set will facilitate the creation of more diverse TR libraries than has been possible using currently available primer sets.
2010-01-01
Background: Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model gene-gene interactions in case-control studies; however, none of them is suitable for time-dependent analysis. Herein we introduce the Survival Dimensionality Reduction (SDR) algorithm, a non-parametric method specifically designed to detect epistasis in lifetime datasets. Results: The algorithm requires no specification of either the underlying survival distribution or the underlying interaction model, and it proved satisfactorily powerful in detecting a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and a high degree of right-censorship (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis who were treated with anti-TNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on anti-TNF responses, the SDR algorithm found that the rs1801274 SNP in the FcγRIIa gene and the rs10954213 SNP in the IRF5 gene interact non-linearly to predict clinical remission after anti-TNF biologicals. Conclusions: Simulation studies and application in a real-world setting support the capability of the SDR algorithm to model epistatic interactions in candidate-gene studies in the presence of right-censored data. Availability: http://sourceforge.net/projects/sdrproject/ PMID:20691091
Lin, Wen-Hsien; Liu, Wei-Chung; Hwang, Ming-Jing
2009-03-11
Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure. Compared to a random selection of proteins, house-keeping gene-encoded proteins tended to have a greater number of directly interacting neighbors and occupy network positions in several shortest paths of interaction between protein pairs, whereas tissue-specific gene-encoded proteins did not. In addition, house-keeping gene-encoded proteins tended to connect with other house-keeping gene-encoded proteins in all tissue types, whereas tissue-specific gene-encoded proteins also tended to connect with other tissue-specific gene-encoded proteins, but only in approximately half of the tissue types examined. Our analysis showed that house-keeping gene-encoded proteins tend to occupy important network positions, while those encoded by tissue-specific genes do not. The biological implications of our findings were discussed and we proposed a hypothesis regarding how cells organize their protein tools in protein-protein interaction networks. 
Our results led us to speculate that house-keeping gene-encoded proteins might form a core in human protein-protein interaction networks, while clusters of tissue-specific gene-encoded proteins are attached to the core at more peripheral positions of the networks.
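The three network indices named in the abstract (degree, closeness, and betweenness centrality) can be sketched on a toy interaction graph. The code below is an illustrative, standard-library implementation for unweighted, undirected graphs. The node names and edges are hypothetical and are not taken from the study's datasets. Betweenness is computed with Brandes' algorithm and left unnormalized.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    """Number of directly interacting neighbors of each node."""
    return {u: len(nbrs) for u, nbrs in adj.items()}

def closeness_centrality(adj):
    """Reachable nodes divided by the sum of distances to them."""
    out = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        out[u] = (len(dist) - 1) / total if total else 0.0
    return out

def betweenness_centrality(adj):
    """Brandes' algorithm: shortest paths between other pairs passing through a node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair was counted twice

# Hypothetical toy network: one hub protein with peripheral partners.
adj = {
    "hub": ["A", "B", "C"],
    "A": ["hub"],
    "B": ["hub"],
    "C": ["hub", "D"],
    "D": ["C"],
}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
print(degree["hub"], closeness["hub"], betweenness["hub"])
```

In this toy graph the hub has the highest value on all three indices, which is the pattern the abstract reports for proteins encoded by house-keeping genes.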
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B
2017-11-24
Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
Nandi, Sutanu; Subramanian, Abhishek; Sarkar, Ram Rup
2017-07-25
Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was consistently higher than that of the known techniques, indicating that our method outperforms known methods. Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; Taylor, Ronald C.; Weisenhorn, Pamela; Olson, Robert D.; Stevens, Rick L.; Rocha, Miguel; Rocha, Isabel; Best, Aaron A.; DeJongh, Matthew; Tintle, Nathan L.; Parrello, Bruce; Overbeek, Ross; Henry, Christopher S.
2016-01-01
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. 
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain. PMID:27933038
Qian, Jiang; Esumi, Noriko; Chen, Yangjian; Wang, Qingliang; Chowers, Itay; Zack, Donald J.
2005-01-01
Identification of tissue-specific gene regulatory networks can yield insights into the molecular basis of a tissue's development, function and pathology. Here, we present a computational approach designed to identify potential regulatory target genes of photoreceptor cell-specific transcription factors (TFs). The approach is based on the hypothesis that genes related to the retina in terms of expression, disease and/or function are more likely to be the targets of retina-specific TFs than other genes. A list of genes that are preferentially expressed in retina was obtained by integrating expressed sequence tag, SAGE and microarray datasets. The regulatory targets of retina-specific TFs are enriched in this set of retina-related genes. A Bayesian approach was employed to integrate information about binding site location relative to a gene's transcription start site. Our method was applied to three retina-specific TFs, CRX, NRL and NR2E3, and a number of potential targets were predicted. To experimentally assess the validity of the bioinformatic predictions, mobility shift, transient transfection and chromatin immunoprecipitation assays were performed with five predicted CRX targets, and the results were suggestive of CRX regulation in 5/5, 3/5 and 4/5 cases, respectively. Together, these experiments strongly suggest that RP1, GUCY2D, ABCA4 are novel targets of CRX. PMID:15967807
Ji, Hanlee; Kumm, Jochen; Zhang, Michael; Farnam, Kyle; Salari, Keyan; Faham, Malek; Ford, James M.; Davis, Ronald W.
2006-01-01
Genomic instability is a major feature of neoplastic development in colorectal carcinoma and other cancers. Specific genomic instability events, such as deletions in chromosomes and other alterations in gene copy number, have potential utility as biologically relevant prognostic biomarkers. For example, genomic deletions on chromosome arm 18q are an indicator of colorectal carcinoma behavior and potentially useful as a prognostic indicator. Adapting a novel genomic technology called molecular inversion probes which can determine gene copy alterations, such as genomic deletions, we designed a set of probes to interrogate several hundred individual exons of >200 cancer genes with an overall distribution covering all chromosome arms. In addition, >100 probes were designed in close proximity of microsatellite markers on chromosome arm 18q. We analyzed a set of colorectal carcinoma cell lines and primary colorectal tumor samples for gene copy alterations and deletion mutations in exons. Based on clustering analysis, we distinguished the different categories of genomic instability among the colorectal cancer cell lines. Our analysis of primary tumors uncovered several distinct categories of colorectal carcinoma, each with specific patterns of 18q deletions and deletion mutations in specific genes. This finding has potential clinical ramifications given the application of 18q loss of heterozygosity events as a potential indicator for adjuvant treatment in stage II colorectal carcinoma. PMID:16912164
Chang, Tzu-Hao; Wu, Shih-Lin; Wang, Wei-Jen; Horng, Jorng-Tzong; Chang, Cheng-Wei
2014-01-01
Microarrays are widely used to assess gene expressions. Most microarray studies focus primarily on identifying differential gene expressions between conditions (e.g., cancer versus normal cells), for discovering the major factors that cause diseases. Because previous studies have not identified the correlations of differential gene expression between conditions, crucial but abnormal regulations that cause diseases might have been disregarded. This paper proposes an approach for discovering the condition-specific correlations of gene expressions within biological pathways. Because analyzing gene expression correlations is time consuming, an Apache Hadoop cloud computing platform was implemented. Three microarray data sets of breast cancer were collected from the Gene Expression Omnibus, and pathway information from the Kyoto Encyclopedia of Genes and Genomes was applied for discovering meaningful biological correlations. The results showed that adopting the Hadoop platform considerably decreased the computation time. Several correlations of differential gene expressions were discovered between the relapse and nonrelapse breast cancer samples, and most of them were involved in cancer regulation and cancer-related pathways. The results showed that breast cancer recurrence might be highly associated with the abnormal regulations of these gene pairs, rather than with their individual expression levels. The proposed method was computationally efficient and reliable, and stable results were obtained when different data sets were used. The proposed method is effective in identifying meaningful biological regulation patterns between conditions.
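The idea of condition-specific correlation can be sketched as follows: compute the Pearson correlation of each gene pair separately in each condition, and report pairs whose correlation changes strongly between conditions. This is a single-machine illustration only and does not reproduce the Hadoop implementation described in the abstract. The gene labels, expression values, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def differential_correlations(cond_a, cond_b, genes, threshold):
    """Gene pairs whose correlation differs by at least `threshold` between conditions."""
    hits = []
    for g1, g2 in combinations(genes, 2):
        r_a = pearson(cond_a[g1], cond_a[g2])
        r_b = pearson(cond_b[g1], cond_b[g2])
        if abs(r_a - r_b) >= threshold:
            hits.append((g1, g2, r_a, r_b))
    return hits

# Hypothetical expression values for three genes in one pathway (four samples each).
relapse = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
nonrelapse = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [1, 3, 2, 4]}

hits = differential_correlations(relapse, nonrelapse, ["g1", "g2", "g3"], threshold=1.8)
for g1, g2, r_a, r_b in hits:
    print(g1, g2, round(r_a, 2), round(r_b, 2))
```

Because each gene pair is scored independently, the pairwise loop is the part that a distributed platform would parallelize.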
Chromosomal localization of murine and human oligodendrocyte-specific protein genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronstein, J.M.; Wu, S.; Korenberg, J.R.
1996-06-01
Oligodendrocyte-specific protein (OSP) is a recently described protein present only in myelin of the central nervous system. Several inherited disorders of myelin are caused by mutations in myelin genes but the etiology of many remain unknown. We mapped the location of the mouse OSP gene to the proximal region of chromosome 3 using two sets of multilocus crosses and to human chromosome 3 using somatic cell hybrids. Fine mapping with fluorescence in situ hybridization placed the OSP gene at human chromosome 3q26.2-q26.3. To date, there are no known inherited neurological disorders that localize to these regions. 24 refs., 2 figs.
Katzenberger, Rebeccah J.; Rach, Elizabeth A.; Anderson, Ashley K.; Ohler, Uwe; Wassarman, David A.
2012-01-01
To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5′ untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation dramatically drops in more distant species. In transgenic flies, short (300–400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID. These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation. PMID:22984601
Gene-specific cell labeling using MiMIC transposons
Gnerer, Joshua P.; Venken, Koen J. T.; Dierick, Herman A.
2015-01-01
Binary expression systems such as GAL4/UAS, LexA/LexAop and QF/QUAS have greatly enhanced the power of Drosophila as a model organism by allowing spatio-temporal manipulation of gene function as well as cell and neural circuit function. Tissue-specific expression of these heterologous transcription factors relies on random transposon integration near enhancers or promoters that drive the binary transcription factor embedded in the transposon. Alternatively, gene-specific promoter elements are directly fused to the binary factor within the transposon followed by random or site-specific integration. However, such insertions do not consistently recapitulate endogenous expression. We used Minos-Mediated Integration Cassette (MiMIC) transposons to convert host loci into reliable gene-specific binary effectors. MiMIC transposons allow recombinase-mediated cassette exchange to modify the transposon content. We developed novel exchange cassettes to convert coding intronic MiMIC insertions into gene-specific binary factor protein-traps. In addition, we expanded the set of binary factor exchange cassettes available for non-coding intronic MiMIC insertions. We show that binary factor conversions of different insertions in the same locus have indistinguishable expression patterns, suggesting that they reliably reflect endogenous gene expression. We show the efficacy and broad applicability of these new tools by dissecting the cellular expression patterns of the Drosophila serotonin receptor gene family. PMID:25712101
Lee, Chai-Jin; Kang, Dongwon; Lee, Sangseon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun
2018-05-25
Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this experimental process if information from the literature and biological networks is adequately provided. In this paper, we present a web-based information system that performs in silico experiments to computationally test hypotheses on the function of a gene. A hypothesis that the user specifies in English is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data onto template networks. An in silico experiment then tests how well the target genes are connected to the knockout gene through the condition-specific networks. The test result visualizes paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three knockout data sets: E2f1, Lrrk2, and Dicer1. We were able to reproduce gene functions reported in the original research papers. In addition, we comprehensively tested the system with all disease names in MalaCards as hypotheses to show its effectiveness. Our in silico experiment system can be very useful in suggesting biological mechanisms that can be further tested in vivo or in vitro. http://biohealth.snu.ac.kr/software/insilico/.
Sample entropy analysis of cervical neoplasia gene-expression signatures
Botting, Shaleen K; Trzeciakowski, Jerome P; Benoit, Michelle F; Salama, Salama A; Diaz-Arrastia, Concepcion R
2009-01-01
Background: We introduce Approximate Entropy as a mathematical method of analysis for microarray data. Approximate entropy is applied here as a method to classify the complex gene expression patterns resulting from a clinical sample set. Since entropy is a measure of disorder in a system, we believe that by choosing genes which display minimum entropy in normal controls and maximum entropy in the cancerous sample set we will be able to distinguish those genes which display the greatest variability in the cancerous set. Here we describe a method of utilizing Approximate Sample Entropy (ApSE) analysis to identify genes of interest with the highest probability of producing an accurate, predictive classification model from our data set. Results: In the development of a diagnostic gene-expression profile for cervical intraepithelial neoplasia (CIN) and squamous cell carcinoma of the cervix, we identified 208 genes which are unchanging in all normal tissue samples, yet exhibit a random pattern indicative of the genetic instability and heterogeneity of malignant cells. This may be measured in terms of the ApSE when compared to normal tissue. We have validated 10 of these genes on 10 normal and 20 cancer and CIN3 samples. We report that the predictive value of the sample entropy calculation for these 10 genes of interest is promising (75% sensitivity, 80% specificity for prediction of cervical cancer over CIN3). Conclusion: The success of the Approximate Sample Entropy approach in discerning alterations in complexity from a biological system with such a relatively small sample set, and in extracting biologically relevant genes of interest, holds great promise. PMID:19232110
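An entropy score of this kind can be sketched with the standard sample entropy statistic, which compares how often runs of m consecutive values that match within a tolerance r continue to match at length m + 1. The abstract does not specify the exact ApSE computation, so the code below is the textbook sample entropy definition and not the authors' procedure. The parameter defaults, the use of r as an absolute tolerance, and the example series are assumptions.

```python
from math import log

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy: -ln(A / B), where B and A count template pairs that match
    within tolerance r at lengths m and m + 1 (self-matches excluded)."""
    n = len(series)

    def count_matches(length):
        # Use the same n - m starting positions for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this series; reported as infinite
    return -log(a / b)

# Hypothetical profiles: a perfectly regular series and a less regular one.
regular = [0, 1] * 5
irregular = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(sample_entropy(regular), sample_entropy(irregular))
```

A regular profile scores zero and an irregular one scores higher, which matches the abstract's rationale of selecting genes with minimal entropy in normal controls and maximal entropy in the cancerous set.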
Conjugative plasmids: vessels of the communal gene pool
Norman, Anders; Hansen, Lars H.; Sørensen, Søren J.
2009-01-01
Comparative whole-genome analyses have demonstrated that horizontal gene transfer (HGT) provides a significant contribution to prokaryotic genome innovation. The evolution of specific prokaryotes is therefore tightly linked to the environment in which they live and the communal pool of genes available within that environment. Here we use the term supergenome to describe the set of all genes that a prokaryotic ‘individual’ can draw on within a particular environmental setting. Conjugative plasmids can be considered particularly successful entities within the communal pool, which have enabled HGT over large taxonomic distances. These plasmids are collections of discrete regions of genes that function as ‘backbone modules’ to undertake different aspects of overall plasmid maintenance and propagation. Conjugative plasmids often carry suites of ‘accessory elements’ that contribute adaptive traits to the hosts and, potentially, other resident prokaryotes within specific environmental niches. Insight into the evolution of plasmid modules therefore contributes to our knowledge of gene dissemination and evolution within prokaryotic communities. This communal pool provides the prokaryotes with an important mechanistic framework for obtaining adaptability and functional diversity that alleviates the need for large genomes of specialized ‘private genes’. PMID:19571247
Freed, Nikki E; Bumann, Dirk; Silander, Olin K
2016-09-06
Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
A mixture model-based approach to the clustering of microarray expression data.
McLachlan, G J; Bean, R W; Peel, D
2002-03-01
This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics
2013-01-01
Background Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has more advantages than at the mRNA level. The combination of an alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) focus exclusively on alternative splicing, and 3) perform context-specific alternative splicing analysis. Results We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 alternative splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download alternative splicing related to a single gene/transcript/protein, custom gene set, pathway, disease, drug, or organ. 
Moreover, the quality of the database was validated by comparison with other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. PMID:24267658
Williams, Kelly P.
2003-01-01
A partial screen for genetic elements integrated into completely sequenced bacterial genomes shows more significant bias in specificity for the tmRNA gene (ssrA) than for any type of tRNA gene. Horizontal gene transfer, a major avenue of bacterial evolution, was assessed by focusing on elements using this single attachment locus. Diverse elements use ssrA; among enterobacteria alone, at least four different integrase subfamilies have independently evolved specificity for ssrA, and almost every strain analyzed presents a unique set of integrated elements. Even elements using essentially the same integrase can be very diverse, as is a group with an ssrA-specific integrase of the P4 subfamily. This same integrase appears to promote damage routinely at attachment sites, which may be adaptive. Elements in arrays can recombine; one such event mediated by invertible DNA segments within neighboring elements likely explains the monophasic nature of Salmonella enterica serovar Typhi. One of a limited set of conserved sequences occurs at the attachment site of each enterobacterial element, apparently serving as a transcriptional terminator for ssrA. Elements were usually found integrated into tRNA-like sequence at the 3′ end of ssrA, at subsites corresponding to those used in tRNA genes; an exception was found at the non-tRNA-like 3′ end produced by ssrA gene permutation in cyanobacteria, suggesting that, during the evolution of new site specificity by integrases, tropism toward a conserved 3′ end of an RNA gene may be as strong as toward a tRNA-like sequence. The proximity of ssrA and smpB, which act in concert, was also surveyed. PMID:12533482
Allele-specific gene expression in a wild nonhuman primate population
Tung, J.; Akinyi, M. Y.; Mutura, S.; Altmann, J.; Wray, G. A.; Alberts, S. C.
2015-01-01
Natural populations hold enormous potential for evolutionary genetic studies, especially when phenotypic, genetic and environmental data are all available on the same individuals. However, untangling the genotype-phenotype relationship in natural populations remains a major challenge. Here, we describe results of an investigation of one class of phenotype, allele-specific gene expression (ASGE), in the well-studied natural population of baboons of the Amboseli basin, Kenya. ASGE measurements identify cases in which one allele of a gene is overexpressed relative to the alternative allele of the same gene, within individuals, thus providing a control for background genetic and environmental effects. Here, we characterize the incidence of ASGE in the Amboseli baboon population, focusing on the genetic and environmental contributions to ASGE in a set of eleven genes involved in immunity and defence. Within this set, we identify evidence for common ASGE in four genes. We also present examples of two relationships between cis-regulatory genetic variants and the ASGE phenotype. Finally, we identify one case in which this relationship is influenced by a novel gene-environment interaction. Specifically, the dominance rank of an individual’s mother during its early life (an aspect of that individual’s social environment) influences the expression of the gene CCL5 via an interaction with cis-regulatory genetic variation. These results illustrate how environmental and ecological data can be integrated into evolutionary genetic studies of functional variation in natural populations. They also highlight the potential importance of early life environmental variation in shaping the genetic architecture of complex traits in wild mammals. PMID:21226779
Parallel gene analysis with allele-specific padlock probes and tag microarrays
Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats
2003-01-01
Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977
NASA Technical Reports Server (NTRS)
Newcombe, David; Stuecker, Tara; La Duc, Myron; Venkateswaran, Kasthuri
2005-01-01
Previous studies indicated evidence of opportunistic pathogens in samples obtained during missions to the International Space Station (ISS). This study utilized TaqMan quantitative PCR to determine specific gene abundance in potable and non-potable ISS waters. Probe and primer sets specific to the small subunit rRNA genes were used to elucidate overall bacterial rRNA gene numbers, while those specific for Burkholderia cepacia and Stenotrophomonas maltophilia were optimized and used to probe for the presence of these two opportunistic pathogens. This research builds upon previous microbial diversity studies of ISS water and demonstrates the utility of the Q-PCR tool to examine water quality.
2012-01-01
Background Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Results Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. 
Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since multiple TLRs were found in the generic fever network, it is reasonable to hypothesize that vaccine-TLR interactions may play an important role in inducing fever response, which deserves a further investigation. Conclusions This study demonstrated that ontology-based literature mining is a powerful method for analyzing gene interaction networks and generating new scientific hypotheses. PMID:23256563
Benchmarking of Methods for Genomic Taxonomy
Larsen, Mette V.; Cosentino, Salvatore; Lukjancenko, Oksana; ...
2014-02-26
One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is, that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In this paper, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets constituted sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. Finally, the KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.
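The k-mer counting idea described for the last method can be illustrated with a minimal Python sketch. It is not KmerFinder's implementation: the function names, the toy sequences, the species labels, and the choice of k = 4 are all assumptions made for illustration, and the score here is simply the number of distinct k-mers a query shares with each reference.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query, references, k=4):
    """Pick the reference sharing the most distinct k-mers with the query.

    references: dict mapping a (hypothetical) species label to a sequence.
    Returns (label, shared_count) for the top-scoring reference.
    """
    query_kmers = kmers(query, k)
    scores = {
        label: len(query_kmers & kmers(ref, k))
        for label, ref in references.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]


# Toy data, invented for illustration only.
refs = {
    "species_A": "ATGGCGTACGTTAGC",
    "species_B": "ATGCCCTTTGGGAAA",
}
print(best_match("GCGTACGTTA", refs))
```

A real tool would index much longer k-mers across whole genomes; the sketch only shows why a query drawn from one genome shares far more k-mers with that genome than with an unrelated one.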
Robustness, Evolvability, and the Logic of Genetic Regulation
Moore, Jason H.; Wagner, Andreas
2014-01-01
In gene regulatory circuits, the expression of individual genes is commonly modulated by a set of regulating gene products, which bind to a gene’s cis-regulatory region. This region encodes an input-output function, referred to as signal-integration logic, that maps a specific combination of regulatory signals (inputs) to a particular expression state (output) of a gene. The space of all possible signal-integration functions is vast and the mapping from input to output is many-to-one: for the same set of inputs, many functions (genotypes) yield the same expression output (phenotype). Here, we exhaustively enumerate the set of signal-integration functions that yield identical gene expression patterns within a computational model of gene regulatory circuits. Our goal is to characterize the relationship between robustness and evolvability in the signal-integration space of regulatory circuits, and to understand how these properties vary between the genotypic and phenotypic scales. Among other results, we find that the distributions of genotypic robustness are skewed, such that the majority of signal-integration functions are robust to perturbation. We show that the connected set of genotypes that make up a given phenotype are constrained to specific regions of the space of all possible signal-integration functions, but that as the distance between genotypes increases, so does their capacity for unique innovations. In addition, we find that robust phenotypes are (i) evolvable, (ii) easily identified by random mutation, and (iii) mutationally biased toward other robust phenotypes. We explore the implications of these latter observations for mutation-based evolution by conducting random walks between randomly chosen source and target phenotypes. We demonstrate that the time required to identify the target phenotype is independent of the properties of the source phenotype. PMID:23373974
Yang, Chunxiao; Pan, Huipeng; Noland, Jeffrey Edward; Zhang, Deyong; Zhang, Zhanhong; Liu, Yong; Zhou, Xuguo
2015-12-10
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for quantifying gene expression across various biological processes, which requires a set of suitable reference genes to normalize the expression data. Coleomegilla maculata (Coleoptera: Coccinellidae) is one of the most extensively used biological control agents in the field to manage arthropod pest species. In this study, 16 housekeeping genes selected from C. maculata were cloned and their expression profiles investigated. The performance of these candidates as endogenous controls under specific experimental conditions was evaluated by dedicated algorithms, including geNorm, Normfinder, BestKeeper, and the ΔCt method. In addition, RefFinder, a comprehensive platform integrating all the above-mentioned algorithms, ranked the overall stability of these candidate genes. As a result, various sets of suitable reference genes were recommended specifically for experiments involving different tissues, developmental stages, sex, and C. maculata larvae treated with dietary double-stranded RNA. This study represents the critical first step to establish a standardized RT-qPCR protocol for functional genomics research in the ladybeetle C. maculata. Furthermore, it lays the foundation for conducting ecological risk assessment of RNAi-based gene silencing biotechnologies on non-target organisms; in this case, a key predatory biological control agent.
Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.
Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P
2017-11-23
The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass related carbon sources were analyzed for the wild-type strain N402, grown on a large range of carbon sources, and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX, grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes, which go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile calls into question the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi. 
In addition, the data provides additional evidence in favor of and against the similarity-based functions assigned to uncharacterized genes.
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2011-01-01
Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
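The general shape of a correlation-based co-expression network can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' procedure: it uses a plain Pearson correlation in place of the weighted correlation coefficient mentioned in the abstract, and the gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain (unweighted) Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose correlation meets the (assumed) threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
print(coexpression_edges(profiles))
```

Edges retained this way form the network; grouping it into subnetwork modules would require a separate clustering step that is not shown here.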
Cabiati, Manuela; Raucci, Serena; Caselli, Chiara; Guzzardi, Maria Angela; D'Amico, Andrea; Prescimone, Tommaso; Giannessi, Daniela; Del Ry, Silvia
2012-06-01
Obesity is a complex pathology with interacting and confounding causes due to the environment, hormonal signaling patterns, and genetic predisposition. At present, the Zucker rat is an eligible genetic model for research on obesity and metabolic syndrome, allowing scrutiny of gene expression profiles. Real-time PCR is the benchmark method for measuring mRNA expressions, but the accuracy and reproducibility of its data greatly depend on appropriate normalization strategies. In the Zucker rat model, no specific reference genes have been identified in myocardium, kidney, and lung, the main organs involved in this syndrome. The aim of this study was to select among ten candidates (Actb, Gapdh, Polr2a, Ywhag, Rpl13a, Sdha, Ppia, Tbp, Hprt1 and Tfrc) a set of reference genes that can be used for the normalization of mRNA expression data obtained by real-time PCR in obese and lean Zucker rats both at fasting and during acute hyperglycemia. The most stable genes in the heart were Sdha, Tbp, and Hprt1; in kidney, Tbp, Actb, and Gapdh were chosen, while Actb, Ywhag, and Sdha were selected as the most stably expressed set for pulmonary tissue. The normalization strategy was used to analyze mRNA expression of tumor necrosis factor α, the main inflammatory mediator in obesity, whose variations were more significant when normalized with the appropriately selected reference genes. The findings obtained in this study underline the importance of having three stably expressed reference gene sets for use in the cardiac, renal, and pulmonary tissues of an experimental model of obese and hyperglycemic Zucker rats.
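Normalization against a set of reference genes can be sketched as follows. The example assumes 100% amplification efficiency (one doubling per cycle) and uses made-up Ct values; the function name is hypothetical and the calculation is the common ΔCt form, not necessarily the exact procedure used in the study.

```python
def relative_expression(ct_target, ct_references):
    """Expression of a target relative to the mean of several reference genes.

    Assumes 100% amplification efficiency, so the relative quantity is
    2 ** -(Ct_target - mean reference Ct). Averaging Ct values
    arithmetically corresponds to a geometric mean of the quantities.
    """
    mean_ref_ct = sum(ct_references) / len(ct_references)
    delta_ct = ct_target - mean_ref_ct
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for the three cardiac reference genes and a target.
heart_refs = {"Sdha": 22.0, "Tbp": 26.0, "Hprt1": 24.0}
tnf_ct = 27.0
print(relative_expression(tnf_ct, list(heart_refs.values())))
```

Using several stably expressed reference genes rather than one reduces the influence of any single gene whose expression drifts between conditions.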
Evolution of Prdm Genes in Animals: Insights from Comparative Genomics
Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre
2016-01-01
Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352
Ranking metrics in gene set enrichment analysis: do they matter?
Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna
2017-05-12
There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. 
When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
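Two of the ranking metrics named above can be sketched with the Python standard library. The sketch is a simplification under stated assumptions: the Welch statistic here is the ordinary, unmoderated version (the moderated variant evaluated in the study adjusts the variance estimates and is not reproduced), and the gene names and expression values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Absolute signal-to-noise ratio: |mean_a - mean_b| / (sd_a + sd_b)."""
    return abs(mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def welch_t(group_a, group_b):
    """Absolute Welch t statistic (unmoderated, unequal variances)."""
    va = stdev(group_a) ** 2 / len(group_a)
    vb = stdev(group_b) ** 2 / len(group_b)
    return abs(mean(group_a) - mean(group_b)) / sqrt(va + vb)


def rank_genes(expression, metric):
    """Sort gene names by decreasing metric value."""
    scores = {gene: metric(a, b) for gene, (a, b) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values: (condition 1 samples, condition 2 samples).
expression = {
    "gene1": ([5.0, 5.2, 4.8], [7.0, 7.1, 6.9]),
    "gene2": ([3.0, 3.5, 2.5], [3.1, 3.4, 2.9]),
}
print(rank_genes(expression, signal_to_noise))
print(rank_genes(expression, welch_t))
```

The ranked list produced by whichever metric is chosen is the input to the enrichment step, which is why the choice of metric can change which gene sets appear enriched.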
Martini, Paolo; Sales, Gabriele; Calura, Enrica; Brugiolo, Mattia; Lanfranchi, Gerolamo; Romualdi, Chiara; Cagnin, Stefano
2013-01-01
Genome-wide experiments are routinely used to increase the understanding of the biological processes involved in the development and maintenance of a variety of pathologies. Although the technical feasibility of this type of experiment has improved in recent years, data analysis remains challenging. In this context, gene set analysis has emerged as a fundamental tool for the interpretation of the results. Here, we review strategies used in the gene set approach, and using datasets for the pig cardiocirculatory system as a case study, we demonstrate how the use of a combination of these strategies can enhance the interpretation of results. Gene set analyses are able to distinguish vessels from the heart and arteries from veins in a manner that is consistent with the different cellular composition of smooth muscle cells. By integrating microRNA elements in the regulatory circuits identified, we find that vessel specificity is maintained through specific miRNAs, such as miR-133a and miR-143, which show anti-correlated expression with their mRNA targets. PMID:24284405
Watanabe, Yoshiyuki; Kim, Hyun Soo; Castoro, Ryan J; Chung, Woonbok; Estecio, Marcos R H; Kondo, Kimie; Guo, Yi; Ahmed, Saira S; Toyota, Minoru; Itoh, Fumio; Suk, Ki Tae; Cho, Mee-Yon; Shen, Lanlan; Jelinek, Jaroslav; Issa, Jean-Pierre J
2009-06-01
Aberrant DNA methylation is an early and frequent process in gastric carcinogenesis and could be useful for detection of gastric neoplasia. We hypothesized that methylation analysis of DNA recovered from gastric washes could be used to detect gastric cancer. We studied 51 candidate genes in 7 gastric cancer cell lines and 24 samples (training set) and identified 6 for further studies. We examined the methylation status of these genes in a test set consisting of 131 gastric neoplasias at various stages. Finally, we validated the 6 candidate genes in a different population of 40 primary gastric cancer samples and 113 nonneoplastic gastric mucosa samples. Six genes (MINT25, RORA, GDNF, ADAM23, PRDM5, MLF1) showed frequent differential methylation between gastric cancer and normal mucosa in the training, test, and validation sets. GDNF and MINT25 were the most sensitive molecular markers of early stage gastric cancer, whereas PRDM5 and MLF1 were markers of a field defect. There was a close correlation (r = 0.5-0.9, P = .03-.001) between methylation levels in tumor biopsy and gastric washes. MINT25 methylation had the best sensitivity (90%), specificity (96%), and area under the receiver operating characteristic curve (0.961) in terms of tumor detection in gastric washes. These findings suggest that MINT25 is a sensitive and specific marker for screening in gastric cancer. Additionally, we have developed a new method for gastric cancer detection by DNA methylation in gastric washes.
Gene context conservation of a higher order than operons.
Lathe, W C; Snel, B; Bork, P
2000-10-01
Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.
Differential gene expression in HIV/SIV-associated and spontaneous lymphomas
2005-01-01
Diffuse large B-cell lymphoma (DLBCL) is more prevalent and more often fatal in HIV-infected patients and SIV-infected monkeys compared to immune-competent individuals. Molecular, biological, and immunological data indicate that virus-associated lymphomagenesis is similar in both infected hosts. To find genes specifically overexpressed in HIV/SIV-associated and non-HIV/SIV-associated DLBCL we compared gene expression profiles of HIV/SIV-related and non-HIV-related lymphomas using subtractive hybridization and Northern blot analysis. Our experimental approach allowed us to detect two genes (a-myb and pub) upregulated solely in HIV/SIV-associated DLBCLs potentially involved in virus-specific lymphomagenesis in human and monkey. Downregulation of the pub gene was observed in all non-HIV-associated lymphomas investigated. In addition, we have found genes upregulated in both non-HIV- and HIV-associated lymphomas. Among those were genes both with known (set, ND4, SMG-1) and unknown functions. In summary, we have demonstrated that simultaneous transcriptional upregulation of at least two genes (a-myb and pub) was specific for AIDS-associated lymphomas. PMID:16239949
Yin, L G; Zou, Z Q; Zhao, H Y; Zhang, C L; Shen, J G; Qi, L; Qi, M; Xue, Z Q
2014-01-01
Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are two subtypes of non-small cell lung carcinoma, which is regarded as the leading cause of cancer-related malignancy worldwide. The aim of this study is to detect the differentially methylated loci (DMLs) and differentially methylated genes (DMGs) of these two tumor sets, and then to illustrate the different expression levels of specific methylated genes. Using the TCGA database and Illumina HumanMethylation 27 arrays, we first screened the DMGs and DMLs in tumor samples. Then, we explored the Biological Process terms of hypermethylated and hypomethylated genes using Functional Gene Ontology (GO) catalogues. Hypermethylation occurred intensively in CpG islands, whereas hypomethylation was located in non-CpG-island regions. Most SCC and ADC hypermethylated genes were involved in the GO function of DNA-dependent regulation of transcription, and hypomethylated genes were mainly enriched in the term of immune responses. Additionally, the expression level of specific differentially methylated genes is distinct between ADC and SCC. It is concluded that ADC and SCC have different methylation status that might play an important role in carcinogenesis.
ERIC Educational Resources Information Center
Rowland-Goldsmith, Melissa
2009-01-01
DNA microarray is an ordered grid containing known sequences of DNA, which represent many of the genes in a particular organism. Each DNA sequence is unique to a specific gene. This technology enables the researcher to screen many genes from cells or tissue grown in different conditions. We developed an undergraduate lecture and laboratory…
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
Strotbek, Christoph; Krinninger, Stefan; Frank, Wolfgang
2013-01-01
To comprehensively understand the major processes in plant biology, it is necessary to study a diverse set of species that represent the complexity of plants. This research will help to comprehend common conserved mechanisms and principles, as well as to elucidate those mechanisms that are specific to a particular plant clade. Thereby, we will gain knowledge about the invention and loss of mechanisms and their biological impact causing the distinct specifications throughout the plant kingdom. Since the establishment of transgenic plants, these studies concentrate on the elucidation of gene functions applying an increasing repertoire of molecular techniques. In the last two decades, the moss Physcomitrella patens joined the established set of plant models based on its evolutionary position bridging unicellular algae and vascular plants and a number of specific features alleviating gene function analysis. Here, we want to provide an overview of the specific features of P. patens making it an interesting model for many research fields in plant biology, to present the major achievements in P. patens genetic engineering, and to introduce common techniques to scientists who intend to use P. patens as a model in their research activities.
2009-01-01
Background The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126
Primer sets for cloning the human repertoire of T cell Receptor Variable regions
Boria, Ilenia; Cotella, Diego; Dianzani, Irma; Santoro, Claudio; Sblattero, Daniele
2008-01-01
Background Amplification and cloning of naïve T cell Receptor (TR) repertoires or antigen-specific TR is crucial to shape immune response and to develop immuno-based therapies. TR variable (V) regions are encoded by several genes that recombine during T cell development. The cloning of expressed genes as large diverse libraries from natural sources relies upon the availability of primers able to amplify as many V genes as possible. Results Here, we present a list of primers computationally designed on all functional TR V and J genes listed in the IMGT®, the ImMunoGeneTics information system®. The list consists of unambiguous or degenerate primers suitable to theoretically amplify and clone the entire TR repertoire. We show that it is possible to selectively amplify and clone expressed TR V genes in one single RT-PCR step and from as little as 1000 cells. Conclusion This new primer set will facilitate the creation of more diverse TR libraries than has been possible using currently available primer sets. PMID:18759974
Molecular Tools for the Detection of Nitrogen Cycling Archaea
Rusch, Antje
2013-01-01
Archaea are widespread in extreme and temperate environments, and cultured representatives cover a broad spectrum of metabolic capacities, which sets them up for potentially major roles in the biogeochemistry of their ecosystems. The detection, characterization, and quantification of archaeal functions in mixed communities require Archaea-specific primers or probes for the corresponding metabolic genes. Five pairs of degenerate primers were designed to target archaeal genes encoding key enzymes of nitrogen cycling: nitrite reductases NirA and NirB, nitrous oxide reductase (NosZ), nitrogenase reductase (NifH), and nitrate reductases NapA/NarG. Sensitivity towards their archaeal target gene, phylogenetic specificity, and gene specificity were evaluated in silico and in vitro. Owing to their moderate sensitivity/coverage, the novel nirB-targeted primers are suitable for pure culture studies only. The nirA-targeted primers showed sufficient sensitivity and phylogenetic specificity, but poor gene specificity. The primers designed for amplification of archaeal nosZ performed well in all 3 criteria; their discrimination against bacterial homologs appears to be weakened when Archaea are strongly outnumbered by bacteria in a mixed community. The novel nifH-targeted primers showed high sensitivity and gene specificity, but failed to discriminate against bacterial homologs. Despite limitations, 4 of the new primer pairs are suitable tools in several molecular methods applied in archaeal ecology. PMID:23365509
Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J
2012-11-01
Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
Moreno-Sánchez, Natalia; Rueda, Julia; Reverter, Antonio; Carabaño, María Jesús; Díaz, Clara
2012-03-01
Variations on the transcriptome from one skeletal muscle type to another still remain unknown. The reliable identification of stable gene coexpression networks is essential to unravel gene functions and define biological processes. The differential expression of two distinct muscles, M. flexor digitorum (FD) and M. psoas major (PM), was studied using microarrays in cattle to illustrate muscle-specific transcription patterns and to quantify changes in connectivity regarding the expected gene coexpression pattern. A total of 206 genes were differentially expressed (DE), 94 upregulated in PM and 112 in FD. The distribution of DE genes in pathways and biological functions was explored in the context of system biology. Global interactomes for genes of interest were predicted. Fast/slow twitch genes, genes coding for extracellular matrix, ribosomal and heat shock proteins, and fatty acid uptake centred the specific gene expression patterns per muscle. Genes involved in repairing mechanisms, such as ribosomal and heat shock proteins, suggested a differential ability of muscles to react to similar stressing factors, acting preferentially in slow twitch muscles. Muscle attributes do not seem to be completely explained by the muscle fibre composition. Changes in connectivity accounted for 24% of significant correlations between DE genes. Genes changing their connectivity mostly seem to contribute to the main differential attributes that characterize each specific muscle type. These results underscore the unique flexibility of skeletal muscle where a substantial set of genes are able to change their behavior depending on the circumstances.
Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J
2005-01-01
Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
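The idea of using presence/absence of shared sub-sequences like the questions in a dichotomous key can be sketched in a few lines of Python. This is only an illustrative greedy heuristic, not the authors' method: the abstract describes finding minimal sets, whereas a greedy pass is not guaranteed to be minimal. The function names (`kmers`, `greedy_key`, `signature`) and the toy sequences are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_key(seqs, k):
    """Greedily choose k-mers whose presence/absence separates all sequences.

    seqs maps a name to its sequence. The result is a list of k-mers; it is
    a heuristic solution and is NOT guaranteed to be the smallest possible set.
    """
    present = {name: kmers(s, k) for name, s in seqs.items()}
    candidates = sorted(set().union(*present.values()))
    # Every pair of sequences still sharing the same presence/absence pattern.
    unresolved = set(combinations(sorted(seqs), 2))
    chosen = []

    def separated_by(kmer):
        # Pairs in which exactly one member contains the k-mer.
        return {(a, b) for a, b in unresolved
                if (kmer in present[a]) != (kmer in present[b])}

    while unresolved:
        best = max(candidates, key=lambda km: len(separated_by(km)))
        newly_separated = separated_by(best)
        if not newly_separated:
            raise ValueError("sequences cannot be distinguished at this k")
        chosen.append(best)
        unresolved -= newly_separated
    return chosen


def signature(seq, key):
    """Presence/absence pattern of the chosen k-mers in one sequence."""
    return tuple(km in seq for km in key)


# Toy data (made up for illustration only).
toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "GGGTAC", "d": "GGGTTT"}
toy_key = greedy_key(toy, 3)
toy_signatures = {name: signature(s, toy_key) for name, s in toy.items()}
```

Each chosen sub-sequence plays the role of one yes/no question in the key; a sub-sequence that splits the remaining candidates roughly in half resolves the most pairs, which is why bisecting characters are favoured by the greedy step.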
A Versatile Panel of Reference Gene Assays for the Measurement of Chicken mRNA by Quantitative PCR
Maier, Helena J.; Van Borm, Steven; Young, John R.; Fife, Mark
2016-01-01
Quantitative real-time PCR assays are widely used for the quantification of mRNA within avian experimental samples. Multiple stably-expressed reference genes, selected for the lowest variation in representative samples, can be used to control random technical variation. Reference gene assays must be reliable, have high amplification specificity and efficiency, and not produce signals from contaminating DNA. Whilst recent research papers identify specific genes that are stable in particular tissues and experimental treatments, here we describe a panel of ten avian gene primer and probe sets that can be used to identify suitable reference genes in many experimental contexts. The panel was tested with TaqMan and SYBR Green systems in two experimental scenarios: a tissue collection and virus infection of cultured fibroblasts. GeNorm and NormFinder algorithms were able to select appropriate reference gene sets in each case. We show the effects of using the selected genes on the detection of statistically significant differences in expression. The results are compared with those obtained using 28s ribosomal RNA, the present most widely accepted reference gene in chicken work, identifying circumstances where its use might provide misleading results. Methods for eliminating DNA contamination of RNA reduced, but did not completely remove, detectable DNA. We therefore attached special importance to testing each qPCR assay for absence of signal using DNA template. The assays and analyses developed here provide a useful resource for selecting reference genes for investigations of avian biology. PMID:27537060
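As a rough illustration of how a stability ranking of candidate reference genes can be computed, the sketch below implements the gene-stability measure M as it is commonly described for geNorm: for each gene, the average over all other genes of the standard deviation of the log2 expression ratios across samples. It is not the code used in the study; the function name `genorm_m` and the example quantities are assumptions made for illustration.

```python
import math
import statistics


def genorm_m(quantities):
    """Compute a geNorm-style stability value M for each gene.

    quantities maps a gene name to a list of relative expression quantities
    (linear scale, all lists in the same sample order). Lower M means the
    gene's expression ratio to the other candidates varies less.
    """
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_variation = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[j], quantities[k])]
            # Sample standard deviation of the log2 ratios across samples.
            pairwise_variation.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.mean(pairwise_variation)
    return m_values


# Made-up quantities for three hypothetical candidate genes in three samples.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],   # constant ratio to geneA
    "geneC": [1.0, 4.0, 1.0],   # varies independently
}
stability = genorm_m(example)
ranked = sorted(stability, key=stability.get)  # most stable first
```

In this toy input, geneA and geneB keep a constant ratio and so receive the lowest M, while geneC is ranked last.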
A novel lineage of myoviruses infecting cyanobacteria is widespread in the oceans.
Sabehi, Gazalah; Shaulov, Lihi; Silver, David H; Yanai, Itai; Harel, Amnon; Lindell, Debbie
2012-02-07
Viruses infecting bacteria (phages) are thought to greatly impact microbial population dynamics as well as the genome diversity and evolution of their hosts. Here we report on the discovery of a novel lineage of tailed dsDNA phages belonging to the family Myoviridae and describe its first representative, S-TIM5, that infects the ubiquitous marine cyanobacterium, Synechococcus. The genome of this phage encodes an entirely unique set of structural proteins not found in any currently known phage, indicating that it uses lineage-specific genes for virion morphogenesis and represents a previously unknown lineage of myoviruses. Furthermore, among its distinctive collection of replication and DNA metabolism genes, it carries a mitochondrial-like DNA polymerase gene, providing strong evidence for the bacteriophage origin of the mitochondrial DNA polymerase. S-TIM5 also encodes an array of bacterial-like metabolism genes commonly found in phages infecting cyanobacteria including photosynthesis, carbon metabolism and phosphorus acquisition genes. This suggests a common gene pool and gene swapping of cyanophage-specific genes among different phage lineages despite distinct sets of structural and replication genes. All cytosines following purine nucleotides are methylated in the S-TIM5 genome, constituting a unique methylation pattern that likely protects the genome from nuclease degradation. This phage is abundant in the Red Sea and S-TIM5 gene homologs are widespread in the oceans. This unusual phage type is thus likely to be an important player in the oceans, impacting the population dynamics and evolution of their primary producing cyanobacterial hosts.
Frieman, M; Chen, Z J; Saez-Vasquez, J; Shen, L A; Pikaard, C S
1999-01-01
In interspecific hybrids or allopolyploids, often one parental set of ribosomal RNA genes is transcribed and the other is silent, an epigenetic phenomenon known as nucleolar dominance. Silencing is enforced by cytosine methylation and histone deacetylation, but the initial discrimination mechanism is unknown. One hypothesis is that a species-specific transcription factor is inactivated, thereby silencing one set of rRNA genes. Another is that dominant rRNA genes have higher binding affinities for limiting transcription factors. A third suggests that selective methylation of underdominant rRNA genes blocks transcription factor binding. We tested these hypotheses using Brassica napus (canola), an allotetraploid derived from B. rapa and B. oleracea in which only B. rapa rRNA genes are transcribed. B. oleracea and B. rapa rRNA genes were active when transfected into protoplasts of the other species, which argues against the species-specific transcription factor model. B. oleracea and B. rapa rRNA genes also competed equally for the pol I transcription machinery in vitro and in vivo. Cytosine methylation had no effect on rRNA gene transcription in vitro, which suggests that transcription factor binding was unimpaired. These data are inconsistent with the prevailing models and point to discrimination mechanisms that are likely to act at a chromosomal level. PMID:10224274
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.
Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi
2013-10-04
Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC. Copyright © 2013 Elsevier Inc. All rights reserved.
Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott
2010-04-01
An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemesh/.
Assessment of the reliability of protein-protein interactions and protein function prediction.
Deng, Minghua; Sun, Fengzhu; Chen, Ting
2003-01-01
As more and more high-throughput protein-protein interaction data are collected, the task of estimating the reliability of different data sets becomes increasingly important. In this paper, we present our study of two groups of protein-protein interaction data, the physical interaction data and the protein complex data, and estimate the reliability of these data sets using three different measurements: (1) the distribution of gene expression correlation coefficients, (2) the reliability based on gene expression correlation coefficients, and (3) the accuracy of protein function predictions. We develop a maximum likelihood method to estimate the reliability of protein interaction data sets according to the distribution of correlation coefficients of gene expression profiles of putative interacting protein pairs. The results of the three measurements are consistent with each other. The MIPS protein complex data have the highest mean gene expression correlation coefficients (0.256) and the highest accuracy in predicting protein functions (70% sensitivity and specificity), while Ito's Yeast two-hybrid data have the lowest mean (0.041) and the lowest accuracy (15% sensitivity and specificity). Uetz's data are more reliable than Ito's data in all three measurements, and the TAP protein complex data are more reliable than the HMS-PCI data in all three measurements as well. The complex data sets generally perform better in function predictions than do the physical interaction data sets. Proteins in complexes are shown to be more highly correlated in gene expression. The results confirm that the components of a protein complex can be assigned to functions that the complex carries out within a cell. There are three interaction data sets different from the above two groups: the genetic interaction data, the in-silico data and the syn-express data. Their capability of predicting protein functions generally falls between that of the Y2H data and that of the MIPS protein complex data. The supplementary information is available at the following Web site: http://www-hto.usc.edu/-msms/AssessInteraction/.
Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.
2015-01-01
Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
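The network-building step, in which mutual information is used to find genes co-expressed with a transcription factor, can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the equal-frequency binning, the number of bins, the threshold, and the function names (`discretize`, `mutual_information`, `coexpressed_with`) are hypothetical and are not taken from the study, which does not state its estimator in the abstract.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Assign each value to one of `bins` equal-frequency (rank-based) bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y, bins=3):
    """Plug-in estimate of mutual information (in bits) between two profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum over observed bin pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def coexpressed_with(tf, expression, threshold, bins=3):
    """Return genes whose mutual information with `tf` meets the threshold.

    expression maps a gene name to its expression values across samples.
    The threshold is an arbitrary illustrative cut-off.
    """
    return [gene for gene, profile in expression.items()
            if gene != tf
            and mutual_information(expression[tf], profile, bins) >= threshold]
```

Unlike a linear correlation coefficient, mutual information on binned profiles also picks up non-monotonic dependence, at the cost of sensitivity to the binning choice and to small sample sizes.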
Gene-specific cell labeling using MiMIC transposons.
Gnerer, Joshua P; Venken, Koen J T; Dierick, Herman A
2015-04-30
Binary expression systems such as GAL4/UAS, LexA/LexAop and QF/QUAS have greatly enhanced the power of Drosophila as a model organism by allowing spatio-temporal manipulation of gene function as well as cell and neural circuit function. Tissue-specific expression of these heterologous transcription factors relies on random transposon integration near enhancers or promoters that drive the binary transcription factor embedded in the transposon. Alternatively, gene-specific promoter elements are directly fused to the binary factor within the transposon followed by random or site-specific integration. However, such insertions do not consistently recapitulate endogenous expression. We used Minos-Mediated Integration Cassette (MiMIC) transposons to convert host loci into reliable gene-specific binary effectors. MiMIC transposons allow recombinase-mediated cassette exchange to modify the transposon content. We developed novel exchange cassettes to convert coding intronic MiMIC insertions into gene-specific binary factor protein-traps. In addition, we expanded the set of binary factor exchange cassettes available for non-coding intronic MiMIC insertions. We show that binary factor conversions of different insertions in the same locus have indistinguishable expression patterns, suggesting that they reliably reflect endogenous gene expression. We show the efficacy and broad applicability of these new tools by dissecting the cellular expression patterns of the Drosophila serotonin receptor gene family. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
A forward genetic screen reveals essential and non-essential RNAi factors in Paramecium tetraurelia
Marker, Simone; Carradec, Quentin; Tanty, Véronique; Arnaiz, Olivier; Meyer, Eric
2014-01-01
In most eukaryotes, small RNA-mediated gene silencing pathways form complex interacting networks. In the ciliate Paramecium tetraurelia, at least two RNA interference (RNAi) mechanisms coexist, involving distinct but overlapping sets of protein factors and producing different types of short interfering RNAs (siRNAs). One is specifically triggered by high-copy transgenes, and the other by feeding cells with double-stranded RNA (dsRNA)-producing bacteria. In this study, we designed a forward genetic screen for mutants deficient in dsRNA-induced silencing, and a powerful method to identify the relevant mutations by whole-genome sequencing. We present a set of 47 mutant alleles for five genes, revealing two previously unknown RNAi factors: a novel Paramecium-specific protein (Pds1) and a Cid1-like nucleotidyl transferase. Analyses of allelic diversity distinguish non-essential and essential genes and suggest that the screen is saturated for non-essential, single-copy genes. We show that non-essential genes are specifically involved in dsRNA-induced RNAi while essential ones are also involved in transgene-induced RNAi. One of the latter, the RNA-dependent RNA polymerase RDR2, is further shown to be required for all known types of siRNAs, as well as for sexual reproduction. These results open the way for the dissection of the genetic complexity, interconnection, mechanisms and natural functions of RNAi pathways in P. tetraurelia. PMID:24860163
Lee, Hong Kai; Lee, Chun Kiat; Loh, Tze Ping; Tang, Julian Wei-Tze; Chiu, Lily; Tambyah, Paul A; Sethi, Sunil K; Koay, Evelyn Siew-Chuan
2010-09-01
With the relative global lack of immunity to the pandemic influenza A/H1N1/2009 virus that emerged in April 2009 as well as the sustained susceptibility to infection, rapid and accurate diagnostic assays are essential to detect this novel influenza A variant. Among the molecular diagnostic methods that have been developed to date, most are in tandem monoplex assays targeting either different regions of a single viral gene segment or different viral gene segments. We describe a dual-gene (duplex) quantitative real-time RT-PCR method selectively targeting pandemic influenza A/H1N1/2009. The assay design includes a primer-probe set specific to only the hemagglutinin (HA) gene of this novel influenza A variant and a second set capable of detecting the nucleoprotein (NP) gene of all swine-origin influenza A virus. In silico analysis of the specific HA oligonucleotide sequence used in the assay showed that it targeted only the swine-origin pandemic strain; there was also no cross-reactivity against a wide spectrum of noninfluenza respiratory viruses. The assay has a diagnostic sensitivity and specificity of 97.7% and 100%, respectively, a lower detection limit of 50 viral gene copies/PCR, and can be adapted to either a qualitative or quantitative mode. It was first applied to 3512 patients with influenza-like illnesses at a tertiary hospital in Singapore, during the containment phase of the pandemic (May to July 2009).
Wang, Deguo; Liu, Yanhong
2015-05-26
Streptococcus dysgalactiae, Streptococcus uberis and Streptococcus agalactiae are the three main pathogens causing bovine mastitis, with great losses to the dairy industry. Rapid and specific loop-mediated isothermal amplification (LAMP) methods for identification and differentiation of these three pathogens are not available. With the 16S rRNA gene and 16S-23S rRNA intergenic spacers as targets, four sets of LAMP primers were designed for identification and differentiation of S. dysgalactiae, S. uberis and S. agalactiae. The detection limit of all four LAMP primer sets was 0.1 pg DNA template per reaction. The LAMP method with the 16S rRNA gene and 16S-23S rRNA intergenic spacers as targets can differentiate the three pathogens and is potentially useful in epidemiological studies.
Strakova, Eva; Zikova, Alice; Vohradsky, Jiri
2014-01-01
A computational model of gene expression was applied to a novel test set of microarray time series measurements to reveal regulatory interactions between transcriptional regulators represented by 45 sigma factors and the genes expressed during germination of a prokaryote Streptomyces coelicolor. Using microarrays, the first 5.5 h of the process was recorded in 13 time points, which provided a database of gene expression time series on genome-wide scale. The computational modeling of the kinetic relations between the sigma factors, individual genes and genes clustered according to the similarity of their expression kinetics identified kinetically plausible sigma factor-controlled networks. Using genome sequence annotations, functional groups of genes that were predominantly controlled by specific sigma factors were identified. Using external binding data complementing the modeling approach, specific genes involved in the control of the studied process were identified and their function suggested.
PINTA: a web server for network-based gene prioritization from expression data
Nitsch, Daniela; Tranchevent, Léon-Charles; Gonçalves, Joana P.; Vogt, Josef Korbinian; Madeira, Sara C.; Moreau, Yves
2011-01-01
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user. PMID:21602267
Mengual, Lourdes; Burset, Moisès; Ribal, María José; Ars, Elisabet; Marín-Aguilera, Mercedes; Fernández, Manuel; Ingelmo-Torres, Mercedes; Villavicencio, Humberto; Alcaraz, Antonio
2010-05-01
To develop an accurate and noninvasive method for bladder cancer diagnosis and prediction of disease aggressiveness based on the gene expression patterns of urine samples. Gene expression patterns of 341 urine samples from bladder urothelial cell carcinoma (UCC) patients and 235 controls were analyzed via TaqMan Arrays. In a first phase of the study, three consecutive gene selection steps were done to identify a gene set expression signature to detect and stratify UCC in urine. Subsequently, those genes more informative for UCC diagnosis and prediction of tumor aggressiveness were combined to obtain a classification system of bladder cancer samples. In a second phase, the obtained gene set signature was evaluated in a routine clinical scenario analyzing only voided urine samples. We have identified a 12+2 gene expression signature for UCC diagnosis and prediction of tumor aggressiveness on urine samples. Overall, this gene set panel had 98% sensitivity (SN) and 99% specificity (SP) in discriminating between UCC and control samples and 79% SN and 92% SP in predicting tumor aggressiveness. The translation of the model to the clinically applicable format corroborates that the 12+2 gene set panel described maintains a high accuracy for UCC diagnosis (SN = 89% and SP = 95%) and tumor aggressiveness prediction (SN = 79% and SP = 91%) in voided urine samples. The 12+2 gene expression signature described in urine is able to identify patients suffering from UCC and predict tumor aggressiveness. We show that a panel of molecular markers may improve the schedule for diagnosis and follow-up in UCC patients. Copyright 2010 AACR.
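The sensitivity (SN) and specificity (SP) figures quoted above follow from a standard confusion-matrix calculation, sketched below. The abstract reports only percentages, so the counts in the example are hypothetical and chosen purely to show the arithmetic.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased samples correctly called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of control samples correctly called negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration; these are NOT the study's data.
tp, fn = 90, 10   # diseased samples: detected / missed
tn, fp = 95, 5    # control samples: correctly negative / falsely positive

print(f"SN = {sensitivity(tp, fn):.0%}, SP = {specificity(tn, fp):.0%}")
```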
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friddle, Carl J; Koga, Teiichiro; Rubin, Edward M.
2000-03-15
While cardiac hypertrophy has been the subject of intensive investigation, regression of hypertrophy has been significantly less studied, precluding large-scale analysis of the relationship between these processes. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40%, respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points during cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes or pathways previously known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.
Tommasini, Livia; Svensson, Jan T; Rodriguez, Edmundo M; Wahid, Abdul; Malatrasi, Marina; Kato, Kenji; Wanamaker, Steve; Resnik, Josh; Close, Timothy J
2008-11-01
Low temperature and drought have major influences on plant growth and productivity. To identify barley genes involved in responses to these stresses and to specifically test the hypothesis that the dehydrin (Dhn) multigene family can serve as an indicator of the entire transcriptome response, we investigated the response of barley cv. Morex to: (1) gradual drought over 21 days and (2) low temperature including chilling, freeze-thaw cycles, and deacclimation over 33 days. We found 4,153 genes that responded to at least one component of these two stress regimes, about one fourth of all genes called "present" under any condition. About 44% (1,822 of 4,153) responded specifically to drought, whereas only 3.8% (158 of 4,153) were chilling specific and 2.8% (119 of 4,153) freeze-thaw specific, with 34.1% responsive to freeze-thaw and drought. The intersection between chilling and drought (31.9%) was somewhat smaller than the intersection between freeze-thaw and drought, implying an element of osmotic stress response to freeze-thaw. About 82.4% of the responsive genes were similar to Arabidopsis genes. The expression of 13 barley Dhn genes mirrored the global clustering of all transcripts, with specific combinations of Dhn genes providing an excellent indicator of each stress response. Data from these studies provide a robust reference data set for abiotic stress.
Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Naohiro; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming
2016-01-05
Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1(+) mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1(+) precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1(+) precursors and differentiation of Flk1(+) cells into hematopoietic lineages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Literature-based compound profiling: application to toxicogenomics.
Frijters, Raoul; Verhoeven, Stefan; Alkema, Wynand; van Schaik, René; Polman, Jan
2007-11-01
To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective of this study was to introduce an additional approach, in which literature information is used for compound profiling to evaluate compound toxicity and mode of toxicity. Gene annotations were built by text mining in Medline abstracts for retrieval of co-publications between genes, pathology terms, biological processes and pathways. This literature information was used to generate compound-specific keyword fingerprints, representing over-represented keywords calculated in a set of regulated genes after compound administration. To see whether keyword fingerprints can be used for assessment of compound toxicity, we analyzed microarray data sets of rat liver treated with 11 hepatotoxicants. Analysis of keyword fingerprints of two genotoxic carcinogens, two nongenotoxic carcinogens, two peroxisome proliferators and two randomly generated gene sets, showed that each compound produced a specific keyword fingerprint that correlated with the experimentally observed histopathological events induced by the individual compounds. By contrast, the random sets produced a flat aspecific keyword profile, indicating that the fingerprints induced by the compounds reflect biological events rather than random noise. A more detailed analysis of the keyword profiles of diethylhexylphthalate, dimethylnitrosamine and methapyrilene (MPy) showed that the differences in the keyword fingerprints of these three compounds are based upon known distinct modes of action. 
Visualization of MPy-linked keywords and MPy-induced genes in a literature network enabled us to construct a mode of toxicity proposal for MPy, which is in agreement with known effects of MPy in literature. Compound keyword fingerprinting based on information retrieved from literature is a powerful approach for compound profiling, allowing evaluation of compound toxicity and analysis of the mode of action.
Martin, Guiomar; Soy, Judit; Monte, Elena
2016-01-01
Members of the PIF quartet (PIFq; PIF1, PIF3, PIF4, and PIF5) collectively contribute to induce growth in Arabidopsis seedlings under short day (SD) conditions, specifically promoting elongation at dawn. Their action involves the direct regulation of growth-related and hormone-associated genes. However, a comprehensive definition of the PIFq-regulated transcriptome under SD is still lacking. We have recently shown that SD and free-running (LL) conditions correspond to "growth" and "no growth" conditions, respectively, correlating with greater abundance of PIF protein in SD. Here, we present a genomic analysis whereby we first define SD-regulated genes at dawn compared to LL in the wild type, followed by identification of those SD-regulated genes whose expression depends on the presence of PIFq. By using this sequential strategy, we have identified 349 PIF/SD-regulated genes, approximately 55% induced and 42% repressed by both SD and PIFq. Comparison with available databases indicates that PIF/SD-induced and PIF/SD-repressed sets are differently phased at dawn and mid-morning, respectively. In addition, we found that whereas rhythmicity of the PIF/SD-induced gene set is lost in LL, most PIF/SD-repressed genes keep their rhythmicity in LL, suggesting differential regulation of both gene sets by the circadian clock. Moreover, we also uncovered distinct overrepresented functions in the induced and repressed gene sets, in accord with previous studies in other examined PIF-regulated processes. Interestingly, promoter analyses showed that, whereas PIF/SD-induced genes are enriched in direct PIF targets, PIF/SD-repressed genes are mostly indirectly regulated by the PIFs and might be more enriched in ABA-regulated genes.
Functionally Enigmatic Genes: A Case Study of the Brain Ignorome
Pandey, Ashutosh K.; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W.
2014-01-01
What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed—the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases—ELMOD1, TMEM88B, and DZANK1—we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes. PMID:24523945
Weier, Heinz -Ulrich G
2015-08-04
Herein are described multicolor FISH probe sets termed "genetic barcodes" targeting several cancer or disease-related loci to assess gene rearrangements and copy number changes in tumor cells. Two, three or more different fluorophores are used to detect the genetic barcode sections thus permitting unique labeling and multilocus analysis in individual cell nuclei. Gene specific barcodes can be generated and combined to provide both numerical and structural genetic information for these and other pertinent disease associated genes.
Method to determine transcriptional regulation pathways in organisms
Gardner, Timothy S.; Collins, James J.; Hayete, Boris; Faith, Jeremiah
2012-11-06
The invention relates to computer-implemented methods and systems for identifying regulatory relationships between expressed regulating polypeptides and targets of the regulatory activities of such regulating polypeptides. More specifically, the invention provides a new method for identifying regulatory dependencies between biochemical species in a cell. In particular embodiments, provided are computer-implemented methods for identifying a regulatory interaction between a transcription factor and a gene target of the transcription factor, or between a transcription factor and a set of gene targets of the transcription factor. Further provided are genome-scale methods for predicting regulatory interactions between a set of transcription factors and a corresponding set of transcriptional target substrates thereof.
De Nicola, Raffaele; Hazelwood, Lucie A.; De Hulster, Erik A. F.; Walsh, Michael C.; Knijnenburg, Theo A.; Reinders, Marcel J. T.; Walker, Graeme M.; Pronk, Jack T.; Daran, Jean-Marc; Daran-Lapujade, Pascale
2007-01-01
Transcriptional responses of the yeast Saccharomyces cerevisiae to Zn availability were investigated at a fixed specific growth rate under limiting and abundant Zn concentrations in chemostat culture. To investigate the context dependency of this transcriptional response and eliminate growth rate-dependent variations in transcription, yeast was grown under several chemostat regimens, resulting in various carbon (glucose), nitrogen (ammonium), zinc, and oxygen supplies. A robust set of genes that responded consistently to Zn limitation was identified, and the set enabled the definition of the Zn-specific Zap1p regulon, comprised of 26 genes and characterized by a broader zinc-responsive element consensus (MHHAACCBYNMRGGT) than so far described. Most surprising was the Zn-dependent regulation of genes involved in storage carbohydrate metabolism. Their concerted down-regulation was physiologically relevant as revealed by a substantial decrease in glycogen and trehalose cellular content under Zn limitation. An unexpectedly large number of genes were synergistically or antagonistically regulated by oxygen and Zn availability. This combinatorial regulation suggested a more prominent involvement of Zn in mitochondrial biogenesis and function than hitherto identified. PMID:17933919
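The zinc-responsive element consensus quoted above is written in IUPAC nucleotide ambiguity codes. As a minimal sketch, it can be expanded into a regular expression and used to scan a sequence. The function names and the example promoter string are hypothetical, and only the forward strand is scanned here; a fuller search would also check the reverse complement.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Expand an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


ZRE_PATTERN = consensus_to_pattern("MHHAACCBYNMRGGT")


def find_sites(sequence):
    """Return 0-based start positions of forward-strand matches (overlaps allowed)."""
    lookahead = re.compile("(?=" + ZRE_PATTERN + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Made-up promoter fragment with one embedded site, for illustration only.
promoter = "TTT" + "ACCAACCCTAAAGGT" + "GG"
print(find_sites(promoter))
```

The lookahead wrapper is a design choice that lets overlapping candidate sites be reported, which a plain `finditer` on the pattern would skip.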
2013-01-01
Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first identifying gene-gene relationships under the studied phenotype and then integrating them with gene expression changes to prioritize signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and the corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent, while top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma.
Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432
Behr, Jürgen; Geissler, Andreas J; Preissler, Patrick; Ehrenreich, Armin; Angelov, Angel; Vogel, Rudi F
2015-10-01
The tolerance to hop compounds, which is mainly associated with inhibition of bacterial growth in beer, is a multi-factorial trait. Any approaches to predict the physiological differences between beer-spoiling and non-spoiling strains on the basis of a single marker gene are limited. We identified ecotype-specific genes related to the ability to grow in Pilsner beer via comparative genome sequencing. The genome sequences of four different strains of Lactobacillus brevis were compared, including newly established genomes of two highly hop tolerant beer isolates, one strain isolated from faeces and one published genome of a silage isolate. Gene fragments exclusively occurring in beer-spoiling strains as well as sequences only occurring in non-spoiling strains were identified. Comparative genomic arrays were established and hybridized with a set of L. brevis strains, which are characterized by their ability to spoil beer. As a result, sets of 33 and 4 oligonucleotide probes could be established that specifically detect beer-spoilers and non-spoilers, respectively. The detection of more than one of these marker sequences according to a genetic barcode enables scoring of L. brevis strains for their beer-spoiling potential and can thus assist in risk evaluation in the brewing industry. Copyright © 2015 Elsevier Ltd. All rights reserved.
Vandenbon, Alexis; Dinh, Viet H.; Mikami, Norihisa; Kitagawa, Yohko; Teraguchi, Shunsuke; Ohkura, Naganari; Sakaguchi, Shimon
2016-01-01
High-throughput gene expression data are one of the primary resources for exploring complex intracellular dynamics in modern biology. The integration of large amounts of public data may allow us to examine general dynamical relationships between regulators and target genes. However, obstacles for such analyses are study-specific biases or batch effects in the original data. Here we present Immuno-Navigator, a batch-corrected gene expression and coexpression database for 24 cell types of the mouse immune system. We systematically removed batch effects from the underlying gene expression data and showed that this removal considerably improved the consistency between inferred correlations and prior knowledge. The data revealed widespread cell type-specific correlation of expression. Integrated analysis tools allow users to use this correlation of expression for the generation of hypotheses about biological networks and candidate regulators in specific cell types. We show several applications of Immuno-Navigator as examples. In one application we successfully predicted known regulators of importance in naturally occurring Treg cells from their expression correlation with a set of Treg-specific genes. For one high-scoring gene, integrin β8 (Itgb8), we confirmed an association between Itgb8 expression in forkhead box P3 (Foxp3)-positive T cells and Treg-specific epigenetic remodeling. Our results also suggest that the regulation of Treg-specific genes within Treg cells is relatively independent of Foxp3 expression, supporting recent results pointing to a Foxp3-independent component in the development of Treg cells. PMID:27078110
Effect of storage time on gene expression data acquired from unfrozen archived newborn blood spots.
Ho, Nhan T; Busik, Julia V; Resau, James H; Paneth, Nigel; Khoo, Sok Kean
2016-11-01
Unfrozen archived newborn blood spots (NBS) have been shown to retain sufficient messenger RNA (mRNA) for gene expression profiling. However, the effect of storage time at ambient temperature for NBS samples in relation to the quality of gene expression data is relatively unknown. Here, we evaluated mRNA expression from quantitative real-time PCR (qRT-PCR) and microarray data obtained from NBS samples stored at ambient temperature to determine the effect of storage time on the quality of gene expression. These data were generated in a previous case-control study examining NBS in 53 children with cerebral palsy (CP) and 53 matched controls. NBS sample storage period ranged from 3 to 16 years at ambient temperature. We found persistently low RNA integrity numbers (RIN = 2.3 ± 0.71) and 28S/18S rRNA ratios (~0) across NBS samples for all storage periods. In both qRT-PCR and microarray data, the expression of three common housekeeping genes, beta cytoskeletal actin (ACTB), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), and peptidylprolyl isomerase A (PPIA), decreased with increased storage time. Median values of each microarray probe intensity at log2 scale also decreased over time. After eight years of storage, probe intensity values were largely reduced to background intensity levels. Of 21,500 genes tested, 89% significantly decreased in signal intensity, with 13,551, 10,730, and 9,925 genes detected within 5 years, >5 to <10 years, and >10 years of storage, respectively. We also examined the expression of two gender-specific genes (X inactivation-specific transcript, XIST and lysine-specific demethylase 5D, KDM5D) and seven gene sets representing the inflammatory, hypoxic, coagulative, and thyroidal pathways hypothesized to be related to CP risk to determine the effect of storage time on the detection of these biologically relevant genes.
We found that the gender-specific genes and CP-related gene sets were detectable in all storage periods but exhibited differential expression (male vs. female or CP vs. control) only within the first six years of storage. We concluded that gene expression data quality deteriorates in unfrozen archived NBS over time and that differential gene expression profiling and analysis is recommended only for NBS samples stored for no more than six years at ambient temperature. Copyright © 2016 Elsevier Inc. All rights reserved.
Molecular method for determining sex of walruses
Fischbach, Anthony S.; Jay, C.V.; Jackson, J.V.; Andersen, L.W.; Sage, G.K.; Talbot, S.L.
2008-01-01
We evaluated the ability of a set of published trans-species molecular sexing primers and a set of walrus-specific primers, which we developed, to accurately identify sex of 235 Pacific walruses (Odobenus rosmarus divergens). The trans-species primers were developed for mammals and targeted the X- and Y-gametologs of the zinc finger protein genes (ZFX, ZFY). We extended this method by using these primers to obtain sequence from Pacific and Atlantic walrus (O. r. rosmarus) ZFX and ZFY genes to develop new walrus-specific primers, which yield polymerase chain reaction products of distinct lengths (327 and 288 base pairs from the X- and Y-chromosome, respectively), allowing them to be used for sex determination. Both methods yielded a determination of sex in all but 1-2% of samples with an accuracy of 99.6-100%. Our walrus-specific primers offer the advantage of small fragment size and facile application to automated electrophoresis and visualization.
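As an illustration of how two products of distinct length permit a sex call, the following is a minimal sketch and not the authors' scoring procedure. It assumes that males (XY) show both the 327 bp X-derived and 288 bp Y-derived products while females (XX) show only the 327 bp product; the function name `call_sex` and the sizing tolerance `tolerance_bp` are hypothetical.

```python
# Product lengths reported in the abstract (base pairs).
X_PRODUCT_BP = 327
Y_PRODUCT_BP = 288


def call_sex(fragment_lengths, tolerance_bp=3):
    """Call sex from observed fragment lengths (bp).

    tolerance_bp is a hypothetical allowance for sizing error.
    The X product is treated as an internal control: without it,
    the sample is reported as undetermined.
    """
    has_x = any(abs(length - X_PRODUCT_BP) <= tolerance_bp for length in fragment_lengths)
    has_y = any(abs(length - Y_PRODUCT_BP) <= tolerance_bp for length in fragment_lengths)
    if has_x and has_y:
        return "male"
    if has_x:
        return "female"
    return "undetermined"


if __name__ == "__main__":
    print(call_sex([327, 288]))  # both products present
    print(call_sex([326]))       # X product only
    print(call_sex([]))          # failed amplification
```

Treating a missing X product as a failed reaction is a design choice of this sketch; it mirrors the small fraction of samples for which the abstract reports no determination.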
Król, Jaroslaw; Bania, Jacek; Florek, Magdalena; Pliszczak-Król, Aleksandra; Staroniewicz, Zdzislaw
2011-05-01
A set of polymerase chain reaction (PCR) assays for identification of the most important Pasteurellaceae species encountered in cats and dogs were developed. Primers for Pasteurella multocida were designed to detect a fragment of the kmt, a gene encoding the outer-membrane protein. Primers specific to Pasteurella canis, Pasteurella dagmatis, and Pasteurella stomatis were based on the manganese-dependent superoxide dismutase gene (sodA) and those specific to [Haemophilus] haemoglobinophilus on species-specific sequences of the 16S ribosomal RNA gene. All the primers were tested on respective reference and control strains and applied to the identification of 47 canine and feline field isolates of Pasteurellaceae. The PCR assays were shown to be species specific, providing a valuable supplement to phenotypic identification of species within this group of bacteria. © 2011 The Author(s)
Jourda, Cyril; Cardi, Céline; Gibert, Olivier; Giraldo Toro, Andrès; Ricci, Julien; Mbéguié-A-Mbéguié, Didier; Yahiaoui, Nabila
2016-01-01
Starch is the most widespread and abundant storage carbohydrate in plants. It is also a major feature of cultivated bananas as it accumulates to large amounts during banana fruit development before almost complete conversion to soluble sugars during ripening. Little is known about the structure of major gene families involved in banana starch metabolism and their evolution compared to other species. To identify genes involved in banana starch metabolism and investigate their evolutionary history, we analyzed six gene families playing a crucial role in plant starch biosynthesis and degradation: the ADP-glucose pyrophosphorylases (AGPases), starch synthases (SS), starch branching enzymes (SBE), debranching enzymes (DBE), α-amylases (AMY) and β-amylases (BAM). Using comparative genomics and phylogenetic approaches, these genes were classified into families and sub-families and orthology relationships with functional genes in Eudicots and in grasses were identified. In addition to known ancestral duplications shaping starch metabolism gene families, independent evolution in banana and grasses also occurred through lineage-specific whole genome duplications for specific sub-families of AGPase, SS, SBE, and BAM genes; and through gene-scale duplications for AMY genes. In particular, banana lineage duplications yielded a set of AGPase, SBE and BAM genes that were highly or specifically expressed in banana fruits. Gene expression analysis highlighted a complex transcriptional reprogramming of starch metabolism genes during ripening of banana fruits. A differential regulation of expression between banana gene duplicates was identified for SBE and BAM genes, suggesting that part of starch metabolism regulation in the fruit evolved in the banana lineage. PMID:27994606
A polymorphism in the bovine gamma-S-crystallin gene revealed by allele-specific amplification.
Kemp, S J; Maillard, J C; Teale, A J
1993-04-01
A polymorphism was detected in the 3' untranslated region of the bovine gamma-S-crystallin gene by direct sequencing of polymerase chain reaction (PCR) products from genomic DNA of an N'Dama bull and a Boran cow. A set of three PCR primers was designed to detect this difference and thus give allele-specific amplification. The two allele-specific primers differ in length by 20 nucleotides so that the allelic products may be distinguished by simple agarose gel electrophoresis following a single PCR reaction. This provides a simple and rapid assay for this polymorphism.
The phenotypic manifestations of rare genic CNVs in autism spectrum disorder
Merikangas, A K; Segurado, R; Heron, E A; Anney, R J L; Paterson, A D; Cook, E H; Pinto, D; Scherer, S W; Szatmari, P; Gill, M; Corvin, A P; Gallagher, L
2015-01-01
Significant evidence exists for the association between copy number variants (CNVs) and Autism Spectrum Disorder (ASD); however, most of this work has focused solely on the diagnosis of ASD. There is limited understanding of the impact of CNVs on the ‘sub-phenotypes' of ASD. The objective of this paper is to evaluate associations between CNVs in differentially brain expressed (DBE) genes or genes previously implicated in ASD/intellectual disability (ASD/ID) and specific sub-phenotypes of ASD. The sample consisted of 1590 cases of European ancestry from the Autism Genome Project (AGP) with a diagnosis of an ASD and at least one rare CNV impacting any gene and a core set of phenotypic measures, including symptom severity, language impairments, seizures, gait disturbances, intelligence quotient (IQ) and adaptive function, as well as paternal and maternal age. Classification analyses using a non-parametric recursive partitioning method (random forests) were employed to define sets of phenotypic characteristics that best classify the CNV-defined groups. There was substantial variation in the classification accuracy of the two sets of genes. The best variables for classification were verbal IQ for the ASD/ID genes, paternal age at birth for the DBE genes and adaptive function for de novo CNVs. CNVs in the ASD/ID list were primarily associated with communication and language domains, whereas CNVs in DBE genes were related to broader manifestations of adaptive function. To our knowledge, this is the first study to examine the associations between sub-phenotypes and CNVs genome-wide in ASD. This work highlights the importance of examining the diverse sub-phenotypic manifestations of CNVs in ASD, including the specific features, comorbid conditions and clinical correlates of ASD that comprise underlying characteristics of the disorder. PMID:25421404
Orthopoxvirus Genome Evolution: The Role of Gene Loss
Hendrickson, Robert Curtis; Wang, Chunlin; Hatcher, Eneida L.; Lefkowitz, Elliot J.
2010-01-01
Poxviruses are highly successful pathogens, known to infect a variety of hosts. The family Poxviridae includes Variola virus, the causative agent of smallpox, which has been eradicated as a public health threat but could potentially reemerge as a bioterrorist threat. The risk scenario includes other animal poxviruses and genetically engineered manipulations of poxviruses. Studies of orthologous gene sets have established the evolutionary relationships of members within the Poxviridae family. It is not clear, however, how variations between family members arose in the past, an important issue in understanding how these viruses may vary and possibly produce future threats. Using a newly developed poxvirus-specific tool, we predicted accurate gene sets for viruses with completely sequenced genomes in the genus Orthopoxvirus. Employing sensitive sequence comparison techniques together with comparison of syntenic gene maps, we established the relationships between all viral gene sets. These techniques allowed us to unambiguously identify the gene loss/gain events that have occurred over the course of orthopoxvirus evolution. It is clear that no existing Orthopoxvirus species has acquired protein-coding genes unique to that species. The genes of every existing species are all present in members of the species Cowpox virus, and cowpox virus strains contain every gene present in any other orthopoxvirus strain. These results support a theory of reductive evolution in which the reduction in size of the core gene set of a putative ancestral virus played a critical role in speciation and confining any newly emerging virus species to a particular environmental (host or tissue) niche. PMID:21994715
Kugler, Jamie E; Kerner, Pierre; Bouquet, Jean-Marie; Jiang, Di; Di Gregorio, Anna
2011-01-20
The notochord is a defining feature of the chordate clade, and invertebrate chordates, such as tunicates, are uniquely suited for studies of this structure. Here we used a well-characterized set of 50 notochord genes known to be targets of the notochord-specific Brachyury transcription factor in one tunicate, Ciona intestinalis (Class Ascidiacea), to begin determining whether the same genetic toolkit is employed to build the notochord in another tunicate, Oikopleura dioica (Class Larvacea). We identified Oikopleura orthologs of the Ciona notochord genes, as well as lineage-specific duplicates for which we determined the phylogenetic relationships with related genes from other chordates, and we analyzed their expression patterns in Oikopleura embryos. Of the 50 Ciona notochord genes that were used as a reference, only 26 had clearly identifiable orthologs in Oikopleura. Two of these conserved genes appeared to have undergone Oikopleura- and/or tunicate-specific duplications, and one was present in three copies in Oikopleura, thus bringing the number of genes to test to 30. We were able to clone and test 28 of these genes. Thirteen of the 28 Oikopleura orthologs of Ciona notochord genes showed clear expression in all or in part of the Oikopleura notochord, seven were diffusely expressed throughout the tail, six were expressed in tissues other than the notochord, while two probes did not provide a detectable signal at any of the stages analyzed. One of the notochord genes identified, Oikopleura netrin, was found to be unevenly expressed in notochord cells, in a pattern reminiscent of that previously observed for one of the Oikopleura Hox genes. A surprisingly high number of Ciona notochord genes do not have apparent counterparts in Oikopleura, and only a fraction of the evolutionarily conserved genes show clear notochord expression. 
This suggests that Ciona and Oikopleura, despite the morphological similarities of their notochords, have developed rather divergent sets of notochord genes after their split from a common tunicate ancestor. This study demonstrates that comparisons between divergent tunicates can lead to insights into the basic complement of genes sufficient for notochord development, and elucidate the constraints that control its composition.
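The gene counts in this abstract can be cross-checked with a short tally. This is only a sketch of one reading that is consistent with the reported totals: it assumes the two duplicated genes each contribute one extra copy and that a third gene, present in three copies, contributes two extra copies. The function and variable names are hypothetical.

```python
def genes_to_test(orthologs, duplicated_genes, triplicated_genes):
    """Total genes after counting extra copies.

    Assumption: a duplicated gene adds one extra copy and a
    triplicated gene adds two extra copies.
    """
    return orthologs + duplicated_genes * 1 + triplicated_genes * 2


# Counts taken from the abstract.
total = genes_to_test(orthologs=26, duplicated_genes=2, triplicated_genes=1)

# Expression outcomes reported for the genes that were cloned and tested.
outcomes = {
    "notochord": 13,
    "diffuse tail": 7,
    "other tissues": 6,
    "no signal": 2,
}
tested = sum(outcomes.values())
notochord_fraction = outcomes["notochord"] / tested

if __name__ == "__main__":
    print(f"genes to test: {total}")
    print(f"genes tested: {tested}")
    print(f"fraction with notochord expression: {notochord_fraction:.3f}")
```

Under this reading the tally reproduces the 30 genes to test and the 28 genes tested, and it shows that just under half of the tested orthologs had clear notochord expression.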
Zha, Xianfeng; Yin, Qingsong; Tan, Huo; Wang, Chunyan; Chen, Shaohua; Yang, Lijian; Li, Bo; Wu, Xiuli; Li, Yangqiu
2013-05-01
Antigen-specific, T-cell receptor (TCR)-modified cytotoxic T lymphocytes (CTLs) that target tumors are an attractive strategy for specific adoptive immunotherapy. Little is known about whether there are any alterations in the gene expression profile after TCR gene transduction in T cells. We constructed TCR gene-redirected CTLs with specificity for diffuse large B-cell lymphoma (DLBCL)-associated antigens to elucidate the gene expression profiles of TCR gene-redirected T-cells, and we further analyzed the gene expression profile pattern of these redirected T-cells by Affymetrix microarrays. The resulting data were analyzed using Bioconductor software; a two-fold cut-off expression change was applied together with anti-correlation of the profile ratios to render the microarray analysis set. The fold change of all genes was calculated by comparing the three TCR gene-modified T-cells and a negative control counterpart. The gene pathways were analyzed using Bioconductor and Kyoto Encyclopedia of Genes and Genomes. Identical genes whose fold change was greater than or equal to 2.0 in all three TCR gene-redirected T-cell groups in comparison with the negative control were identified as the differentially expressed genes. The differentially expressed genes comprised 33 up-regulated genes and 1 down-regulated gene, including JUNB, FOS, TNF, IFN-γ, DUSP2, IL-1B, CXCL1, CXCL2, CXCL9, CCL2, CCL4, and CCL8. These genes are mainly involved in the TCR signaling, mitogen-activated protein kinase signaling, and cytokine-cytokine receptor interaction pathways. In conclusion, we characterized the gene expression profile of DLBCL-specific TCR gene-redirected T-cells. The changes corresponded to an up-regulation in the differentiation and proliferation of the T-cells. These data may help to explain some of the characteristics of the redirected T-cells.
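The selection rule described here (a fold change of at least 2.0 in all three redirected T-cell groups relative to the negative control) can be sketched as follows. This is an illustrative sketch, not the authors' Bioconductor workflow: the function name `classify_genes`, the gene labels, and the ratios are hypothetical, and treating "down-regulated" as a ratio of at most 1/2 in all three groups is an assumption.

```python
def classify_genes(fold_changes, threshold=2.0):
    """Split genes into consistently up- and down-regulated lists.

    fold_changes maps a gene label to its expression ratios
    (redirected group / negative control), one per group.
    Assumption: down-regulation means ratio <= 1 / threshold.
    """
    up, down = [], []
    for gene, ratios in fold_changes.items():
        if all(ratio >= threshold for ratio in ratios):
            up.append(gene)
        elif all(ratio <= 1.0 / threshold for ratio in ratios):
            down.append(gene)
    return sorted(up), sorted(down)


# Hypothetical ratios for illustration only; not data from the study.
example = {
    "GENE_A": [2.4, 3.1, 2.0],  # at or above 2.0 in all three groups
    "GENE_B": [2.5, 1.8, 2.2],  # fails in one group, so excluded
    "GENE_C": [0.4, 0.5, 0.3],  # at or below 0.5 in all three groups
    "GENE_D": [1.1, 0.9, 1.0],  # unchanged
}

if __name__ == "__main__":
    up, down = classify_genes(example)
    print("up-regulated:", up)
    print("down-regulated:", down)
```

Requiring the threshold in every group, rather than on average, keeps only genes whose change is reproduced across all three constructs, which is the intersection the abstract describes.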
Latent Gammaherpesvirus 68 Infection Induces Distinct Transcriptional Changes in Different Organs
Canny, Susan P.; Goel, Gautam; Reese, Tiffany A.; Zhang, Xin; Xavier, Ramnik
2014-01-01
Previous studies identified a role for latent herpesvirus infection in cross-protection against infection and exacerbation of chronic inflammatory diseases. Here, we identified more than 500 genes differentially expressed in spleens, livers, or brains of mice latently infected with gammaherpesvirus 68 and found that distinct sets of genes linked to different pathways were altered in the spleen compared to those in the liver. Several of the most differentially expressed latency-specific genes (e.g., the gamma interferon [IFN-γ], Cxcl9, and Ccl5 genes) are associated with known latency-specific phenotypes. Chronic herpesvirus infection, therefore, significantly alters the transcriptional status of host organs. We speculate that such changes may influence host physiology, the status of the immune system, and disease susceptibility. PMID:24155394
Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe
Duhig, Trevor; Nam, Miyoung; Palmer, Georgia; Han, Sangjo; Jeffery, Linda; Baek, Seung-Tae; Lee, Hyemi; Shim, Young Sam; Lee, Minho; Kim, Lila; Heo, Kyung-Sun; Noh, Eun Joo; Lee, Ah-Reum; Jang, Young-Joo; Chung, Kyung-Sook; Choi, Shin-Jung; Park, Jo-Young; Park, Youngwoo; Kim, Hwan Mook; Park, Song-Kyu; Park, Hae-Joon; Kang, Eun-Jung; Kim, Hyong Bai; Kang, Hyun-Sam; Park, Hee-Moon; Kim, Kyunghoon; Song, Kiwon; Song, Kyung Bin; Nurse, Paul; Hoe, Kwang-Lae
2014-01-01
SUMMARY We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome. This resource provides a powerful tool for biotechnological and eukaryotic cell biology research. Comprehensive gene dispensability comparisons with budding yeast, the first time such studies have been possible between two eukaryotes, revealed that 83% of single copy orthologues in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than non-essential genes to be single copy, broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth. PMID:20473289
Strategies to explore functional genomics data sets in NCBI's GEO database.
Wilhite, Stephen E; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
Gene expression analysis using a highly sensitive DNA microarray for colorectal cancer screening.
Koga, Yoshikatsu; Yamazaki, Nobuyoshi; Takizawa, Satoko; Kawauchi, Junpei; Nomura, Osamu; Yamamoto, Seiichiro; Saito, Norio; Kakugawa, Yasuo; Otake, Yosuke; Matsumoto, Minori; Matsumura, Yasuhiro
2014-01-01
Half of all patients with small, right-sided, non-metastatic colorectal cancer (CRC) have negative results for the fecal occult blood test (FOBT). In the present study, the usefulness of CRC screening with a highly sensitive DNA microarray was evaluated in comparison with that by FOBT using fecal samples. A total of 53 patients with CRC and 61 healthy controls were divided into "training" and "validation" sets. For the gene profiling, total RNA extracted from 0.5 g of feces was hybridized to a highly sensitive DNA chip. The expressions of 43 genes were significantly higher in the patients with CRC than in healthy controls (p<0.05). In the training set, the sensitivity and specificity of the DNA chip assay using six genes were 85.4% and 85.2%, respectively. In the validation set, the sensitivity and specificity of the DNA chip assay were 85.2% and 85.7%, respectively. The sensitivities of the DNA chip assay were higher than those of FOBT in the small, right-sided, early-CRC, and surface tumor (i.e., tumor invading up to the muscularis propria) subgroups. In particular, the sensitivities of the DNA chip assay in the surface tumor and early-CRC subgroups were significantly higher than those of FOBT (p=0.023 and 0.019, respectively). Gene profiling assay using a highly sensitive DNA chip was more effective than FOBT at detecting patients with small, right-sided, surface tumor, and early-stage CRC.
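For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from a confusion table. The counts are hypothetical and chosen only for illustration; the abstract does not report the per-set counts, so they are not reconstructed here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    sensitivity = TP / (TP + FN): fraction of patients detected.
    specificity = TN / (TN + FP): fraction of controls correctly negative.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration; not taken from the study.
sens, spec = sensitivity_specificity(tp=17, fn=3, tn=18, fp=2)

if __name__ == "__main__":
    print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Because each metric is computed within its own group (patients or controls), the two values do not depend on how many patients versus controls were enrolled.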
Knapp, Dunja; Schulz, Herbert; Rascon, Cynthia Alexander; Volkmer, Michael; Scholz, Juliane; Nacu, Eugen; Le, Mu; Novozhilov, Sergey; Tazaki, Akira; Protze, Stephanie; Jacob, Tina; Hubner, Norbert; Habermann, Bianca; Tanaka, Elly M.
2013-01-01
Understanding how the limb blastema is established after the initial wound healing response is an important aspect of regeneration research. Here we performed parallel expression profile time courses of healing lateral wounds versus amputated limbs in axolotl. This comparison between wound healing and regeneration allowed us to identify amputation-specific genes. By clustering the expression profiles of these samples, we could detect three distinguishable phases of gene expression (early wound healing, followed by a transition phase, leading to establishment of the limb development program), which correspond to the three phases of limb regeneration that had been defined by morphological criteria. By focusing on the transition phase, we identified 93 strictly amputation-associated genes, many of which are implicated in oxidative-stress response, chromatin modification, epithelial development or limb development. We further classified the genes based on whether they were or were not significantly expressed in the developing limb bud. The specific localization of 53 selected candidates within the blastema was investigated by in situ hybridization. In summary, we identified a set of genes that are expressed specifically during regeneration and are therefore likely candidates for the regulation of blastema formation. PMID:23658691
Tissue-specific NETs alter genome organization and regulation even in a heterologous system.
de Las Heras, Jose I; Zuleger, Nikolaj; Batrakou, Dzmitry G; Czapiewski, Rafal; Kerr, Alastair R W; Schirmer, Eric C
2017-01-02
Different cell types exhibit distinct patterns of 3D genome organization that correlate with changes in gene expression in tissue and differentiation systems. Several tissue-specific nuclear envelope transmembrane proteins (NETs) have been found to influence the spatial positioning of genes and chromosomes that normally occurs during tissue differentiation. Here we study 3 such NETs: NET29, NET39, and NET47, which are expressed preferentially in fat, muscle and liver, respectively. We found that even when exogenously expressed in a heterologous system they can specify particular genome organization patterns and alter gene expression. Each NET affected largely different subsets of genes. Notably, the liver-specific NET47 upregulated many genes in HT1080 fibroblast cells that are normally upregulated in hepatogenesis, showing that tissue-specific NETs can favor expression patterns associated with the tissue where the NET is normally expressed. Similarly, global profiling of peripheral chromatin after exogenous expression of these NETs using lamin B1 DamID revealed that each NET affected the nuclear positioning of distinct sets of genomic regions with a significant tissue-specific component. Thus NET influences on genome organization can contribute to gene expression changes associated with differentiation even in the absence of other factors and overt cellular differentiation changes.
An internal regulatory element controls troponin I gene expression.
Yutzey, K E; Kline, R L; Konieczny, S F
1989-01-01
During skeletal myogenesis, approximately 20 contractile proteins and related gene products temporally accumulate as the cells fuse to form multinucleated muscle fibers. In most instances, the contractile protein genes are regulated transcriptionally, which suggests that a common molecular mechanism may coordinate the expression of this diverse and evolutionarily unrelated gene set. Recent studies have examined the muscle-specific cis-acting elements associated with numerous contractile protein genes. All of the identified regulatory elements are positioned in the 5'-flanking regions, usually within 1,500 base pairs of the transcription start site. Surprisingly, a DNA consensus sequence that is common to each contractile protein gene has not been identified. In contrast to the results of these earlier studies, we have found that the 5'-flanking region of the quail troponin I (TnI) gene is not sufficient to permit the normal myofiber transcriptional activation of the gene. Instead, the TnI gene utilizes a unique internal regulatory element that is responsible for the correct myofiber-specific expression pattern associated with the TnI gene. This is the first example in which a contractile protein gene has been shown to rely primarily on an internal regulatory element to elicit transcriptional activation during myogenesis. The diversity of regulatory elements associated with the contractile protein genes suggests that the temporal expression of the genes may involve individual cis-trans regulatory components specific for each gene. PMID:2725509
Watanabe, Yoshiyuki; Kim, Hyun Soo; Castoro, Ryan J.; Chung, Woonbok; Estecio, Marcos R. H.; Kondo, Kimie; Guo, Yi; Ahmed, Saira S.; Toyota, Minoru; Itoh, Fumio; Suk, Ki Tae; Cho, Mee-Yon; Shen, Lanlan; Jelinek, Jaroslav; Issa, Jean-Pierre J.
2009-01-01
Background & Aims: Aberrant DNA methylation is an early and frequent process in gastric carcinogenesis and could be useful for detection of gastric neoplasia. We hypothesized that methylation analysis of DNA recovered from gastric washes could be used to detect gastric cancer. Methods: We studied 51 candidate genes in 7 gastric cancer cell lines and 24 samples (training set) and identified 6 for further studies. We examined the methylation status of these genes in a test set consisting of 131 gastric neoplasias at various stages. Finally, we validated the 6 candidate genes in a different population of 40 primary gastric cancer samples and 113 non-neoplastic gastric mucosa samples. Results: Six genes (MINT25, RORA, GDNF, ADAM23, PRDM5, MLF1) showed frequent differential methylation between gastric cancer and normal mucosa in the training, test and validation sets. GDNF and MINT25 were the most sensitive molecular markers of early stage gastric cancer, while PRDM5 and MLF1 were markers of a field defect. There was a close correlation (r=0.5 to 0.9, p=0.03 to 0.001) between methylation levels in tumor biopsy and gastric washes. MINT25 methylation had the best sensitivity (90%), specificity (96%), and area under the ROC curve (0.961) in terms of tumor detection in gastric washes. Conclusions: These findings suggest MINT25 is a sensitive and specific marker for screening in gastric cancer. Additionally, we have developed a new methodology for gastric cancer detection by DNA methylation in gastric washes. PMID:19375421
Chang, Dan; Duda, Thomas F
2014-06-05
Predatory marine gastropods of the genus Conus exhibit substantial variation in venom composition both within and among species. Apart from mechanisms associated with extensive turnover of gene families and rapid evolution of genes that encode venom components ('conotoxins'), the evolution of distinct conotoxin expression patterns is an additional source of variation that may drive interspecific differences in the utilization of species' 'venom gene space'. To determine the evolution of expression patterns of venom genes of Conus species, we evaluated the expression of A-superfamily conotoxin genes of a set of closely related Conus species by comparing recovered transcripts of A-superfamily genes that were previously identified from the genomes of these species. We modified community phylogenetics approaches to incorporate phylogenetic history and disparity of genes and their expression profiles to determine patterns of venom gene space utilization. Less than half of the A-superfamily gene repertoire of these species is expressed, and only a few orthologous genes are coexpressed among species. Species exhibit substantially distinct expression strategies, with some expressing sets of closely related loci ('under-dispersed' expression of available genes) while others express sets of more disparate genes ('over-dispersed' expression). In addition, expressed genes show higher dN/dS values than either unexpressed or ancestral genes; this implies that expression exposes genes to selection and facilitates rapid evolution of these genes. Few recent lineage-specific gene duplicates are expressed simultaneously, suggesting that expression divergence among redundant gene copies may be established shortly after gene duplication. Our study demonstrates that venom gene space is explored differentially by Conus species, a process that effectively permits the independent and rapid evolution of venoms in these species.
Hu, Fengyi; Wang, Di; Zhao, Xiuqin; Zhang, Ting; Sun, Haixi; Zhu, Linghua; Zhang, Fan; Li, Lijuan; Li, Qiong; Tao, Dayun; Fu, Binying; Li, Zhikang
2011-01-24
Rhizomatousness is a key component of perenniality of many grasses that contributes to competitiveness and invasiveness of many noxious grass weeds, but can potentially be used to develop perennial cereal crops for sustainable farmers in hilly areas of tropical Asia. Oryza longistaminata, a perennial wild rice with strong rhizomes, has been used as the model species for genetic and molecular dissection of rhizome development and in breeding efforts to transfer rhizome-related traits into annual rice species. In this study, an effort was taken to get insights into the genes and molecular mechanisms underlying the rhizomatous trait in O. longistaminata by comparative analysis of the genome-wide tissue-specific gene expression patterns of five different tissues of O. longistaminata using the Affymetrix GeneChip Rice Genome Array. A total of 2,566 tissue-specific genes were identified in five different tissues of O. longistaminata, including 58 and 61 unique genes that were specifically expressed in the rhizome tips (RT) and internodes (RI), respectively. In addition, 162 genes were up-regulated and 261 genes were down-regulated in RT compared to the shoot tips. Six distinct cis-regulatory elements (CGACG, GCCGCC, GAGAC, AACGG, CATGCA, and TAAAG) were found to be significantly more abundant in the promoter regions of genes differentially expressed in RT than in the promoter regions of genes uniformly expressed in all other tissues. Many of the RT and/or RI specifically or differentially expressed genes were located in the QTL regions associated with rhizome expression, rhizome abundance and rhizome growth-related traits in O. longistaminata and thus are good candidate genes for these QTLs. The initiation and development of the rhizomatous trait in O. longistaminata are controlled by very complex gene networks involving several plant hormones and regulatory genes, different members of gene families showing tissue specificity and their regulated pathways.
Auxin/IAA appears to act as a negative regulator in rhizome development, while GA acts as the activator in rhizome development. Co-localization of the genes specifically expressed in rhizome tips and rhizome internodes with the QTLs for rhizome traits identified a large set of candidate genes for rhizome initiation and development in rice for further confirmation.
Godec, Jernej; Tan, Yan; Liberzon, Arthur; Tamayo, Pablo; Bhattacharya, Sanchita; Butte, Atul J; Mesirov, Jill P; Haining, W Nicholas
2016-01-19
Gene-expression profiling has become a mainstay in immunology, but subtle changes in gene networks related to biological processes are hard to discern when comparing various datasets. For instance, conservation of the transcriptional response to sepsis in mouse models and human disease remains controversial. To improve transcriptional analysis in immunology, we created ImmuneSigDB: a manually annotated compendium of ∼5,000 gene-sets from diverse cell states, experimental manipulations, and genetic perturbations in immunology. Analysis using ImmuneSigDB identified signatures induced in activated myeloid cells and differentiating lymphocytes that were highly conserved between humans and mice. Sepsis triggered conserved patterns of gene expression in humans and mouse models. However, we also identified species-specific biological processes in the sepsis transcriptional response: although both species upregulated phagocytosis-related genes, a mitosis signature was specific to humans. ImmuneSigDB enables granular analysis of transcriptomic data to improve biological understanding of immune processes of the human and mouse immune systems. Copyright © 2016 Elsevier Inc. All rights reserved.
A PCR primer bank for quantitative gene expression analysis.
Wang, Xiaowei; Seed, Brian
2003-12-15
Although gene expression profiling by microarray analysis is a useful tool for assessing global levels of transcriptional activity, variability associated with the data sets usually requires that observed differences be validated by some other method, such as real-time quantitative polymerase chain reaction (real-time PCR). However, non-specific amplification of non-target genes is frequently observed in the latter, confounding the analysis in approximately 40% of real-time PCR attempts when primer-specific labels are not used. Here we present an experimentally validated algorithm for the identification of transcript-specific PCR primers on a genomic scale that can be applied to real-time PCR with sequence-independent detection methods. An online database, PrimerBank, has been created for researchers to retrieve primer information for their genes of interest. PrimerBank currently contains 147 404 primers encompassing most known human and mouse genes. The primer design algorithm has been tested by conventional and real-time PCR for a subset of 112 primer pairs with a success rate of 98.2%.
Alcohol-related Genes Show an Enrichment of Associations with a Persistent Externalizing Factor
Ashenhurst, James R.; Harden, K. Paige; Corbin, William R.; Fromme, Kim
2016-01-01
Research using twins has found that much of the variability in externalizing phenotypes – including alcohol and drug use, impulsive personality traits, risky sex and property crime – is explained by genetic factors. Nevertheless, identification of specific genes and variants associated with these traits has proven to be difficult, likely because individual differences in externalizing are explained by many genes of small individual effect. Moreover, twin research indicates that heritable variance in externalizing behaviors is mostly shared across the externalizing spectrum rather than specific to any behavior. We use a longitudinal, “deep phenotyping” approach to model a general externalizing factor reflecting persistent engagement in a variety of socially problematic behaviors measured at eleven assessment occasions spanning early adulthood (ages 18 to 28). In an ancestrally homogenous sample of non-Hispanic Whites (N = 337), we then tested for enrichment of associations between the persistent externalizing factor and a set of 3,281 polymorphisms within 104 genes that were previously identified as associated with alcohol-use behaviors. Next we tested for enrichment among domain-specific factors (e.g., property crime) composed of residual variance not accounted for by the common factor. Significance was determined relative to bootstrapped empirical thresholds derived from permutations of phenotypic data. Results indicated significant enrichment of genetic associations for persistent externalizing, but not for domain-specific factors. Consistent with twin research findings, these results suggest that genetic variants are broadly associated with externalizing behaviors rather than unique to specific behaviors. General Scientific Summary This study shows that variation in 104 genes is associated with socially problematic “externalizing” behavior, including substance misuse, property crime, risky sex, and aspects of impulsive personality. 
Importantly, this association was with the common variation across these behaviors rather than with the variation unique to any given behavior. The manuscript demonstrates a potentially advantageous technique for relating sets of hypothesized genes to complex traits or behaviors. PMID:27505405
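The enrichment test described in this abstract judges significance against empirical thresholds obtained by permuting the phenotypic data. A minimal Python sketch of that general idea follows. The correlation-based association statistic, the cutoff, the function names and the toy data are all illustrative assumptions rather than the authors' pipeline.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def count_hits(genotypes, phenotype, r_cutoff):
    """Number of variants whose |r| with the phenotype reaches the cutoff."""
    return sum(1 for g in genotypes if abs(pearson_r(g, phenotype)) >= r_cutoff)


def permutation_enrichment(genotypes, phenotype, r_cutoff=0.3, n_perm=1000, seed=0):
    """Compare the observed hit count with hit counts under shuffled phenotypes."""
    rng = random.Random(seed)
    observed = count_hits(genotypes, phenotype, r_cutoff)
    shuffled = list(phenotype)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if count_hits(genotypes, shuffled, r_cutoff) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: two informative "variants" and one constant, uninformative one.
phenotype = list(range(20))
genotypes = [list(phenotype), list(reversed(phenotype)), [1] * 20]
observed, p_value = permutation_enrichment(genotypes, phenotype, r_cutoff=0.9, n_perm=200)
```

Permuting the phenotype rather than the genotypes leaves the correlation structure among variants intact, which matters when the variants in a gene set are in linkage disequilibrium.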
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, M; Craft, D
Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods.
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.
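The first PICS step, collapsing a gene-by-patient matrix m into a pathway-by-patient matrix p, can be illustrated with a small Python sketch. The scoring rule used here, the mean z-score of a pathway's member genes, is a deliberately simple stand-in and is not one of the PCA, NTC or GED scores named in the abstract. The function names, gene labels and pathway definitions are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene across patients; constant genes become all zeros."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_scores(expr, pathways):
    """Collapse {gene: [value per patient]} into {pathway: [score per patient]}.

    Stand-in rule: average the z-scores of the member genes that were measured.
    Pathways with no measured member gene are skipped.
    """
    z = zscore_rows(expr)
    n_patients = len(next(iter(expr.values())))
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in z]
        if not present:
            continue
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_patients)]
    return scores


# Toy input: three genes measured in three patients (values invented).
expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [5, 5, 5]}
pathways = {"P1": ["A", "B"], "P2": ["C", "X"], "P3": ["X"]}
p = pathway_scores(expr, pathways)
```

The resulting rows of p are what a second step would cluster, for example with k-means or hierarchical clustering, before comparing survival between the clusters.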
ISAAC - InterSpecies Analysing Application using Containers.
Baier, Herbert; Schultz, Jörg
2014-01-15
Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large-scale analyses. Mostly, they search for enrichments of specific features. However, these tools do not allow an explorative walk through different views, nor do they allow the gene lists to be changed as new stories emerge. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web-based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology-based translation of existing gene sets. As today's research is usually performed in larger teams and consortia, ISAAC provides group-based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE-based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor-made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.
Coregulation of srGAP1 by Wnt and Androgen Receptor Signaling: A New Target for Treatment of CRPC
2015-10-01
Specific Aim 1: Test the... Specific Aim2: Test the hypothesis that down regulating srGAP1 in CRPC cells change phenotypic...direct interaction between AR and β-catenin seemed to elicit a specific expression of a set of target genes in low androgen conditions in CRPC.
USDA Potato Small RNA Database
USDA-ARS's Scientific Manuscript database
Small RNAs (sRNAs) are now understood to be involved in gene regulation, function and development. High throughput sequencing (HTS) of sRNAs generates large data sets for analyzing the abundance, source and roles for specific sRNAs. These sRNAs result from transcript degradation as well as specific ...
The genetic architecture of gene expression levels in wild baboons.
Tung, Jenny; Zhou, Xiang; Alberts, Susan C; Stephens, Matthew; Gilad, Yoav
2015-02-25
Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.
The genetic architecture of gene expression levels in wild baboons
Tung, Jenny; Zhou, Xiang; Alberts, Susan C; Stephens, Matthew; Gilad, Yoav
2015-01-01
Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates. DOI: http://dx.doi.org/10.7554/eLife.04729.001 PMID:25714927
Dittmar, W James; McIver, Lauren; Michalak, Pawel; Garner, Harold R; Valdez, Gregorio
2014-07-01
The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Nicoletti, Paola; Bansal, Mukesh; Lefebvre, Celine; Guarnieri, Paolo; Shen, Yufeng; Pe'er, Itsik; Califano, Andrea; Floratos, Aris
2015-01-01
Stevens-Johnson syndrome (SJS) and Toxic Epidermal Necrolysis (TEN) represent rare but serious adverse drug reactions (ADRs). Both are characterized by distinctive blistering lesions and significant mortality rates. While there is evidence for strong drug-specific genetic predisposition related to HLA alleles, recent genome wide association studies (GWAS) on European and Asian populations have failed to identify genetic susceptibility alleles that are common across multiple drugs. We hypothesize that this is a consequence of the low to moderate effect size of individual genetic risk factors. To test this hypothesis we developed Pointer, a new algorithm that assesses the aggregate effect of multiple low risk variants on a pathway using a gene set enrichment approach. A key advantage of our method is the capability to associate SNPs with genes by exploiting physical proximity as well as by using expression quantitative trait loci (eQTLs) that capture information about both cis- and trans-acting regulatory effects. We control for known bias-inducing aspects of enrichment based analyses, such as: 1) gene length, 2) gene set size, 3) presence of biologically related genes within the same linkage disequilibrium (LD) region, and, 4) genes shared among multiple gene sets. We applied this approach to publicly available SJS/TEN genome-wide genotype data and identified the ABC transporter and Proteasome pathways as potentially implicated in the genetic susceptibility of non-drug-specific SJS/TEN. We demonstrated that the innovative SNP-to-gene mapping phase of the method was essential in detecting the significant enrichment for those pathways. Analysis of an independent gene expression dataset provides supportive functional evidence for the involvement of Proteasome pathways in SJS/TEN cutaneous lesions. 
These results suggest that Pointer provides a useful framework for the integrative analysis of pharmacogenetic GWAS data, by increasing the power to detect aggregate effects of multiple low risk variants. The software is available for download at https://sourceforge.net/projects/pointergsa/.
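The abstract notes that SNPs are associated with genes by physical proximity as well as through eQTLs. The proximity part alone can be sketched in a few lines of Python. The data layout, the window parameter and the function name are assumptions made for illustration, and the sketch does not reproduce Pointer's actual mapping or its eQTL step.

```python
def map_snps_to_genes(snps, genes, window=0):
    """Assign each SNP to every gene whose span, padded by `window` bases, contains it.

    snps:  {snp_id: (chromosome, position)}
    genes: {gene_name: (chromosome, start, end)}
    Returns {snp_id: sorted list of gene names}; unmapped SNPs get an empty list.
    """
    mapping = {}
    for snp_id, (chrom, pos) in snps.items():
        mapping[snp_id] = sorted(
            name
            for name, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start - window <= pos <= end + window
        )
    return mapping


# Hypothetical coordinates, not real annotations.
snps = {"rs1": ("1", 150), "rs2": ("1", 1200), "rs3": ("2", 150)}
genes = {"G1": ("1", 100, 200), "G2": ("1", 1000, 1100)}
snp_to_gene = map_snps_to_genes(snps, genes, window=100)
```

A SNP such as rs3 that falls near no gene stays unmapped under proximity alone, which is the kind of gap an eQTL-based assignment is meant to fill.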
Kakrana, Atul; Kumar, Anil; Satheesh, Viswanathan; Abdin, M. Z.; Subramaniam, Kuppuswamy; Bhattacharya, R. C.; Srinivasan, Ramamurthy; Sirohi, Anil; Jain, Pradeep K.
2017-01-01
The root-knot nematode (RKN), Meloidogyne incognita, is an obligate, sedentary endoparasite that infects a large number of crops and severely affects productivity. The commonly used nematode control strategies have their own limitations. Of late, RNA interference (RNAi) has become a popular approach for the development of nematode resistance in plants. Transgenic crops capable of expressing dsRNAs, specifically in roots for disrupting the parasitic process, offer an effective and efficient means of producing resistant crops. We identified nematode-responsive and root-specific (NRRS) promoters by using microarray data from the public domain and known conserved cis-elements. A set of 51 NRRS genes was identified which was narrowed down further on the basis of presence of cis-elements combined with minimal expression in the absence of nematode infection. The comparative analysis of promoters from the enriched NRRS set, along with earlier reported nematode-responsive genes, led to the identification of specific cis-elements. The promoters of two candidate genes were used to generate transgenic plants harboring promoter GUS constructs and tested in planta against nematodes. Both promoters showed preferential expression upon nematode infection, exclusively in the root in one and galls in the other. One of these NRRS promoters was used to drive the expression of splicing factor, a nematode-specific gene, for generating host-delivered RNAi-mediated nematode-resistant plants. Transgenic lines expressing dsRNA of splicing factor under the NRRS promoter exhibited up to a 32% reduction in number of galls compared to control plants. PMID:29312363
An internal regulatory element controls troponin I gene expression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yutzey, K.E.; Kline, R.L.; Konieczny, S.F.
1989-04-01
During skeletal myogenesis, approximately 20 contractile proteins and related gene products temporally accumulate as the cells fuse to form multinucleated muscle fibers. In most instances, the contractile protein genes are regulated transcriptionally, which suggests that a common molecular mechanism may coordinate the expression of this diverse and evolutionarily unrelated gene set. Recent studies have examined the muscle-specific cis-acting elements associated with numerous contractile protein genes. All of the identified regulatory elements are positioned in the 5'-flanking regions, usually within 1,500 base pairs of the transcription start site. Surprisingly, a DNA consensus sequence that is common to each contractile protein gene has not been identified. In contrast to the results of these earlier studies, the authors have found that the 5'-flanking region of the quail troponin I (TnI) gene is not sufficient to permit the normal myofiber transcriptional activation of the gene. Instead, the TnI gene utilizes a unique internal regulatory element that is responsible for the correct myofiber-specific expression pattern associated with the TnI gene. This is the first example in which a contractile protein gene has been shown to rely primarily on an internal regulatory element to elicit transcriptional activation during myogenesis. The diversity of regulatory elements associated with the contractile protein genes suggests that the temporal expression of the genes may involve individual cis-trans regulatory components specific for each gene.
Gabory, Anne; Ferry, Laure; Fajardy, Isabelle; Jouneau, Luc; Gothié, Jean-David; Vigé, Alexandre; Fleur, Cécile; Mayeur, Sylvain; Gallou-Kabani, Catherine; Gross, Marie-Sylvie; Attig, Linda; Vambergue, Anne; Lesage, Jean; Reusens, Brigitte; Vieau, Didier; Remacle, Claude; Jais, Jean-Philippe; Junien, Claudine
2012-01-01
Male and female responses to gestational overnutrition set the stage for subsequent sex-specific differences in adult-onset non-communicable diseases. The placenta, as a widely recognized programming agent, contributes to the underlying processes. According to our previous findings, a high-fat diet during gestation triggers sex-specific epigenetic alterations within CpG and throughout the genome, together with the deregulation of clusters of imprinted genes. We further investigated the impact of diet and sex on placental histology, transcriptomic and epigenetic signatures in mice. Both basal gene expression and response to maternal high-fat diet were sexually dimorphic in whole placentas. Numerous genes showed sexually dimorphic expression, but only 11 genes did so regardless of the diet. In line with the key role of genes belonging to the sex chromosomes, 3 of these genes were Y-specific and 3 were X-specific. Amongst all the genes that were differentially expressed under a high-fat diet, only 16 genes were consistently affected in both males and females. The differences were not only quantitative but remarkably qualitative. The biological functions and networks of dysregulated genes differed markedly between the sexes. Seven genes of the epigenetic machinery were dysregulated, due to effects of diet, sex or both, including the X- and Y-linked histone demethylase paralogues Kdm5c and Kdm5d, which could mark male and female epigenomes differently. Expression of the DNA methyltransferase cofactor gene Dnmt3l was affected, reminiscent of our previous observation of changes in global DNA methylation. Overall, this striking sexual dimorphism of programming trajectories imposes a considerable revision of current dietary intervention protocols. PMID:23144842
Verhagen, Lilly M; Zomer, Aldert; Maes, Mailis; Villalba, Julian A; Del Nogal, Berenice; Eleveld, Marc; van Hijum, Sacha Aft; de Waard, Jacobus H; Hermans, Peter Wm
2013-02-01
Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts.
2013-01-01
Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
Conclusions Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts. PMID:23375113
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
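A competitive GSA method asks whether genes inside a set are more often differentially expressed than genes outside it, whereas a self-contained method tests the set against a null of no association at all. One simple competitive test is hypergeometric over-representation, sketched below in Python. It is offered only as an illustration of the competitive hypothesis and is not claimed to be one of the methods evaluated in the review. The function names and the toy gene labels are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def over_representation_p(gene_set, de_genes, universe):
    """Competitive test: is the gene set over-represented among DE genes?"""
    universe = set(universe)
    in_set = set(gene_set) & universe   # K marked genes
    de = set(de_genes) & universe       # n genes drawn (called DE)
    overlap = len(in_set & de)          # k observed overlap
    return hypergeom_upper_tail(overlap, len(universe), len(in_set), len(de))


# Toy example: 10 measured genes, a 3-gene set, 3 DE genes, 2 of them in the set.
universe = [f"g{i}" for i in range(10)]
p_value = over_representation_p(["g0", "g1", "g2"], ["g0", "g1", "g5"], universe)
```

Because the test depends on a prior call of which genes are differentially expressed, its result changes with the cutoff used for that call, which is one source of the sensitivity to selection effects that competitive methods can show.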
Dimerization of a Viral SET Protein Endows its Function
DOE Office of Scientific and Technical Information (OSTI.GOV)
H Wei; M Zhou
Histone modifications are regarded as the most indispensable phenomena in epigenetics. Of these modifications, lysine methylation is of the greatest complexity and importance as site- and state-specific lysine methylation exerts a plethora of effects on chromatin structure and gene transcription. Notably, Paramecium bursaria chlorella viruses encode a conserved SET domain methyltransferase, termed vSET, that functions to suppress host transcription by methylating histone H3 at lysine 27 (H3K27), a mark for eukaryotic gene silencing. Unlike mammalian lysine methyltransferases (KMTs), vSET functions only as a dimer, but the underlying mechanism has remained elusive. In this study, we demonstrate that dimeric vSET operates with negative cooperativity between the two active sites and engages in H3K27 methylation one site at a time. New atomic structures of vSET in the free form and a ternary complex with S-adenosyl homocysteine and a histone H3 peptide and biochemical analyses reveal the molecular origin for the negative cooperativity and explain the substrate specificity of H3K27 methyltransferases. Our study suggests a 'walking' mechanism, by which vSET acts all by itself to globally methylate host H3K27, which is accomplished by the mammalian EZH2 KMT only in the context of the Polycomb repressive complex.
Biedler, James K.; Qi, Yumin; Pledger, David; Macias, Vanessa M.; James, Anthony A.; Tu, Zhijian
2014-01-01
Anopheles stephensi is a principal vector of urban malaria on the Indian subcontinent and an emerging model for molecular and genetic studies of mosquito biology. To enhance our understanding of female mosquito reproduction, and to develop new tools for basic research and for genetic strategies to control mosquito-borne infectious diseases, we identified 79 genes that displayed previtellogenic germline-specific expression based on RNA-Seq data generated from 11 life stage–specific and sex-specific samples. Analysis of this gene set provided insights into the biology and evolution of female reproduction. Promoters from two of these candidates, vitellogenin receptor and nanos, were used in independent transgenic cassettes for the expression of artificial microRNAs against suspected mosquito maternal-effect genes, discontinuous actin hexagon and myd88. We show these promoters have early germline-specific expression and demonstrate 73% and 42% knockdown of myd88 and discontinuous actin hexagon mRNA in ovaries 48 hr after blood meal, respectively. Additionally, we demonstrate maternal-specific delivery of mRNA and protein to progeny embryos. We discuss the application of this system of maternal delivery of mRNA/miRNA/protein in research on mosquito reproduction and embryonic development, and for the development of a gene drive system based on maternal-effect dominant embryonic arrest. PMID:25480960
Nomoto, R; Kagawa, H; Yoshida, T
2008-01-01
The aim was to investigate the difference between Lancefield group C Streptococcus dysgalactiae (GCSD) strains isolated from diseased fish and animals by sequencing and phylogenetic analysis of the sodA gene. The sodA gene of Strep. dysgalactiae strains isolated from fish and animals was amplified and its nucleotide sequences were determined. Although 100% sequence identity was observed among fish GCSD strains, the sequences determined from animal isolates varied from the fish isolate sequences. Thus, all fish GCSD strains were clearly separated from GCSD strains of other origin by phylogenetic tree analysis. In addition, an original primer set was designed, based on the determined sequences, to specifically amplify the sodA gene of fish GCSD strains. The primer set yielded amplification products only from fish GCSD strains. Sequence analysis of the sodA gene demonstrated the genetic divergence between Strep. dysgalactiae strains isolated from fish and mammals. Moreover, an original oligonucleotide primer set that can simply detect the genotype of fish GCSD strains was designed. This study shows that Strep. dysgalactiae isolated from diseased fish can be distinguished from conventional GCSD strains by the difference in the sequence of the sodA gene.
Luo, Yushuang; Kou, Xiaoxiao; Ding, Xuezhi; Hu, Shengbiao; Tang, Ying; Li, Wenping; Huang, Fan; Yang, Qi; Chen, Hanna; Xia, Liqiu
2012-02-01
To promote spinosad biosynthesis by improving the limited oxygen supply during high-density fermentation of Saccharopolyspora spinosa, the open reading frame of the Vitreoscilla hemoglobin gene was placed under the control of the promoter for the erythromycin resistance gene by splicing using overlapping extension PCR. This was cloned into the integrating vector pSET152, yielding the Vitreoscilla hemoglobin gene expression plasmid pSET152EVHB. This was then introduced into S. spinosa SP06081 by conjugal transfer, and integrated into the chromosome by site-specific recombination at the integration site ΦC31 on pSET152EVHB. The resultant conjugant, S. spinosa S078-1101, was genetically stable. The integration was further confirmed by PCR and Southern blotting analysis. A carbon monoxide differential spectrum assay showed that active Vitreoscilla hemoglobin was successfully expressed in S. spinosa S078-1101. Fermentation results revealed that expression of the Vitreoscilla hemoglobin gene significantly promoted spinosad biosynthesis under normal oxygen and moderately oxygen-limiting conditions (P<0.01). These findings demonstrate that integrating expression of the Vitreoscilla hemoglobin gene improves oxygen uptake and is an effective means for the genetic improvement of S. spinosa fermentation.
Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R
2005-09-01
We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
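The idea of finding oligo sequences overrepresented in promoters of coregulated genes can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the MotifFinder program itself: the function names, the fixed word length `k`, the presence/absence counting per promoter, and the hypergeometric test are all choices made here for the example, and `background` is assumed to contain the coregulated promoters.

```python
from math import comb

def kmers_in(seq, k):
    """Set of distinct k-mers present in one promoter sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) when n promoters are drawn from N, of which K contain the k-mer."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(K, n) + 1))
    return tail / total

def count_promoters(seqs, k):
    """Number of promoters in which each k-mer occurs at least once."""
    counts = {}
    for seq in seqs:
        for km in kmers_in(seq, k):
            counts[km] = counts.get(km, 0) + 1
    return counts

def enriched_kmers(coregulated, background, k=6, alpha=0.01):
    """Return (k-mer, foreground count, background count, p) sorted by p.

    Assumes every coregulated promoter is also present in `background`.
    """
    N, n = len(background), len(coregulated)
    bg_counts = count_promoters(background, k)
    fg_counts = count_promoters(coregulated, k)
    results = []
    for km, x in fg_counts.items():
        p = hypergeom_upper_tail(x, N, bg_counts[km], n)
        if p <= alpha:
            results.append((km, x, bg_counts[km], p))
    return sorted(results, key=lambda r: r[3])
```

A real analysis would also need to correct for the number of k-mers tested and to treat reverse complements consistently; both are omitted here to keep the sketch short.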
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provides an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
Tumor-stroma interactions a trademark for metastasis.
Morales, Monica; Planet, Evarist; Arnal-Estape, Anna; Pavlovic, Milica; Tarragona, Maria; Gomis, Roger R
2011-10-01
We aimed to unravel genes that are significantly associated with metastasis in order to identify functions that support disseminated disease. We identify genes associated with metastasis and verify their clinical correlations using publicly available primary tumor expression profile data sets. We used facilities in R and Bioconductor (GSEA). Specific data structures and functions were imported. Our results show that genes associated with metastasis in primary tumors are enriched for pathways associated with immune infiltration or cytokine-cytokine receptor interaction. As an example, we focus on the enrichment of TGFBR2 and TGFβ, a set of communication tools crucial for tumor-stroma interactions that define metastasis to the lung and support bone colonization. We showed that tumor-stroma communication through the cytokine-cytokine receptor interaction pathway is selected in primary tumors with high risk of relapse. High levels of these factors support systemic instigation of the far metastatic nest as well as local metastatic-specific functions that provide solid ground for metastatic development. Copyright © 2011 Elsevier Ltd. All rights reserved.
Master, Adam; Wójcicka, Anna; Giżewska, Kamilla; Popławski, Piotr; Williams, Graham R.; Nauman, Alicja
2016-01-01
Background Translational control is a mechanism of protein synthesis regulation emerging as an important target for new therapeutics. Naturally occurring microRNAs and synthetic small inhibitory RNAs (siRNAs) are the most recognized regulatory molecules acting via RNA interference. Surprisingly, recent studies have shown that interfering RNAs may also activate gene transcription via the newly discovered phenomenon of small RNA-induced gene activation (RNAa). Thus far, the small activating RNAs (saRNAs) have only been demonstrated as promoter-specific transcriptional activators. Findings We demonstrate that oligonucleotide-based trans-acting factors can also specifically enhance gene expression at the level of protein translation by acting at sequence-specific targets within the messenger RNA 5’-untranslated region (5’UTR). We designed a set of short synthetic oligonucleotides (dGoligos), specifically targeting alternatively spliced 5’UTRs in transcripts expressed from the THRB and CDKN2A suppressor genes. The in vitro translation efficiency of reporter constructs containing alternative TRβ1 5’UTRs was increased by up to more than 55-fold following exposure to specific dGoligos. Moreover, we found that the most folded 5’UTR has higher translational regulatory potential when compared to the weakly folded TRβ1 variant. This suggests such a strategy may be especially applied to enhance translation from relatively inactive transcripts containing long 5’UTRs of complex structure. Significance This report represents the first method for gene-specific translation enhancement using selective trans-acting factors designed to target specific 5’UTR cis-acting elements. This simple strategy may be developed further to complement other available methods for gene expression regulation including gene silencing. 
The dGoligo-mediated translation-enhancing approach has the potential to be transferred to increase the translation efficiency of any suitable target gene and may have future application in gene therapy strategies to enhance expression of proteins including tumor suppressors. PMID:27171412
Co-expression networks reveal the tissue-specific regulation of transcription and splicing
Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D.H.; Jo, Brian; Gao, Chuan; McDowell, Ian C.; Engelhardt, Barbara E.
2017-01-01
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues. PMID:29021288
nGASP - the nematode genome annotation assessment project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coghlan, A; Fiedler, T J; McKay, S J
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
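To make the gene-level sensitivity and specificity figures in the nGASP abstract concrete, the sketch below computes both from sets of gene models. It assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of reference genes predicted exactly and specificity is the fraction of predictions that match a reference gene exactly (what other fields call precision); the abstract does not spell out its definitions, so this is an assumption. The function name and the tuple representation of a gene model are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Gene-level sensitivity and specificity from exact structure matches.

    Each gene model is a hashable structure, here (chrom, strand, exons),
    with exons given as a tuple of (start, end) pairs.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    # Sensitivity: share of reference genes recovered exactly.
    sensitivity = true_pos / len(reference) if reference else 0.0
    # Specificity (gene-finding convention): share of predictions that are exact.
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

Under this convention a predictor can have high sensitivity and low specificity at the same time by emitting many extra or slightly wrong gene models, which is the pattern the combiner results describe.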
Properties of genes essential for mouse development
Kabir, Mitra; Barradas, Ana; Tzotzos, George T.; Hentges, Kathryn E.
2017-01-01
Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Experimental data curated manually identified 1301 essential genes and 3451 viable genes. Very many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be: long, highly expressed, old and evolutionarily conserved. These genes tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, cell differentiation and embryonic development. Viable genes tend to encode: membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. 
Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development. PMID:28562614
Polonikov, Alexey V.; Ivanov, Vladimir P.; Bogomazov, Alexey D.; Freidin, Maxim B.; Illig, Thomas; Solodilova, Maria A.
2014-01-01
Oxidative stress resulting from an increased amount of reactive oxygen species and an imbalance between oxidants and antioxidants plays an important role in the pathogenesis of asthma. The present study tested the hypothesis that genetic susceptibility to allergic and nonallergic variants of asthma is determined by complex interactions between genes encoding antioxidant defense enzymes (ADE). We carried out a comprehensive analysis of the associations between adult asthma and 46 single nucleotide polymorphisms of 34 ADE genes and 12 other candidate genes of asthma in a Russian population using set association analysis and multifactor dimensionality reduction approaches. We found for the first time epistatic interactions between ADE genes underlying asthma susceptibility and the genetic heterogeneity between allergic and nonallergic variants of the disease. We identified GSR (glutathione reductase) and PON2 (paraoxonase 2) as novel candidate genes for asthma susceptibility. We observed gender-specific effects of ADE genes on the risk of asthma. The results of the study demonstrate the complexity and diversity of interactions between genes involved in oxidative stress underlying susceptibility to allergic and nonallergic asthma. PMID:24895604
Differential Sensitivity of Target Genes to Translational Repression by miR-17~92
Jin, Hyun Yong; Oda, Hiroyo; Chen, Pengda; Kang, Seung Goo; Valentine, Elizabeth; Liao, Lujian; Zhang, Yaoyang; Gonzalez-Martin, Alicia; Shepherd, Jovan; Head, Steven R.; Kim, Pyeung-Hyeun; Fu, Guo; Liu, Wen-Hsien; Han, Jiahuai
2017-01-01
MicroRNAs (miRNAs) are thought to exert their functions by modulating the expression of hundreds of target genes and each to a small degree, but it remains unclear how small changes in hundreds of target genes are translated into the specific function of a miRNA. Here, we conducted an integrated analysis of transcriptome and translatome of primary B cells from mutant mice expressing miR-17~92 at three different levels to address this issue. We found that target genes exhibit differential sensitivity to miRNA suppression and that only a small fraction of target genes are actually suppressed by a given concentration of miRNA under physiological conditions. Transgenic expression and deletion of the same miRNA gene regulate largely distinct sets of target genes. miR-17~92 controls target gene expression mainly through translational repression and 5’UTR plays an important role in regulating target gene sensitivity to miRNA suppression. These findings provide molecular insights into a model in which miRNAs exert their specific functions through a small number of key target genes. PMID:28241004
Chiapello, Hélène; Mallet, Ludovic; Guérin, Cyprien; Aguileta, Gabriela; Amselem, Joëlle; Kroj, Thomas; Ortega-Abboud, Enrique; Lebrun, Marc-Henri; Henrissat, Bernard; Gendrault, Annie; Rodolphe, François; Tharreau, Didier; Fournier, Elisabeth
2015-01-01
Deciphering the genetic bases of pathogen adaptation to its host is a key question in ecology and evolution. To understand how the fungus Magnaporthe oryzae adapts to different plants, we sequenced eight M. oryzae isolates differing in host specificity (rice, foxtail millet, wheat, and goosegrass), and one Magnaporthe grisea isolate specific to crabgrass. Analysis of Magnaporthe genomes revealed small variation in genome sizes (39–43 Mb) and gene content (12,283–14,781 genes) between isolates. The whole set of Magnaporthe genes comprised 14,966 shared families, 63% of which included genes present in all the nine M. oryzae genomes. The evolutionary relationships among Magnaporthe isolates were inferred using 6,878 single-copy orthologs. The resulting genealogy was mostly bifurcating among the different host-specific lineages, but was reticulate inside the rice lineage. We detected traces of introgression from a nonrice genome in the rice reference 70-15 genome. Among M. oryzae isolates and host-specific lineages, the genome composition in terms of frequencies of genes putatively involved in pathogenicity (effectors, secondary metabolism, cazome) was conserved. However, 529 shared families were found only in nonrice lineages, whereas the rice lineage possessed 86 specific families absent from the nonrice genomes. Our results confirmed that the host specificity of M. oryzae isolates was associated with a divergence between lineages without major gene flow and that, despite the strong conservation of gene families between lineages, adaptation to different hosts, especially to rice, was associated with the presence of a small number of specific gene families. All information was gathered in a public database (http://genome.jouy.inra.fr/gemo). PMID:26454013
Song, Yuepeng; Ma, Kaifeng; Ci, Dong; Chen, Qingqing; Tian, Jiaxing; Zhang, Deqiang
2013-12-01
Dioecious plants have evolved sex-specific floral development mechanisms. However, the precise gene expression patterns in dioecious plant flower development remain unclear. Here, we used andromonoecious poplar, an exceptional model system, to eliminate the confounding effects of genetic background of dioecious plants. Comparative transcriptome and physiological analysis allowed us to characterize sex-specific development of female and male flowers. Transcriptome analysis identified genes significantly differentially expressed between the sexes, including genes related to floral development, phytohormone synthesis and metabolism, and DNA methylation. Correlation analysis revealed a significant correlation between phytohormone signaling and gene expression, identifying specific phytohormone-responsive genes and their cis-regulatory elements. Two genes related to DNA methylation, METHYLTRANSFERASE1 (MET1) and DECREASED DNA METHYLATION 1 (DDM1), which are located in the sex determination region of Chromosome XIX, have differential expression between female and male flowers. A time-course analysis revealed that MET1 and DDM1 expression may produce different DNA methylation levels in female and male flowers. Understanding the interactions of phytohormone signaling, DNA methylation and target gene expression should lead to a better understanding of sexual differences in floral development. Thus, this study identifies a set of candidate genes for further studies of poplar sexual dimorphism and relates sex-specific floral development to physiological and epigenetic changes.
Identifying Candidate Reprogramming Genes in Mouse Induced Pluripotent Stem Cells.
Gao, Fang; Li, Jingyu; Zhang, Heng; Yang, Xu; An, Tiezhu
2017-08-01
Factor-based induced reprogramming approaches have tremendous potential for human regenerative medicine, but the efficiencies of these approaches are still low. In this study, we analyzed the global transcriptional profiles of mouse induced pluripotent stem cells (miPSCs) and mouse embryonic stem cells (mESCs) from seven different labs and present here the first successful clustering according to cell type, not by lab of origin. We identified 2131 differentially expressed genes (DEs) as candidate pluripotency-associated genes by comparing mESCs/miPSCs with somatic cells and 720 DEs between miPSCs and mESCs. Interestingly, there was a significant overlap between the two DE sets. Therefore, we defined the overlapping DEs as "consensus DEs" including 313 miPSC-specific genes expressed at a higher level in miPSCs versus mESCs and 184 mESC-specific genes in total and reasoned that these may contribute to the differences in pluripotency between mESCs and miPSCs. A classification of "consensus DEs" according to their different expression levels between somatic cells and mESCs/miPSCs shows that 86% of the miPSC-specific genes are more highly expressed in somatic cells, while 73% of mESC-specific genes are highly expressed in mESCs/miPSCs, indicating that the miPSCs have not efficiently silenced the expression pattern of the somatic cells from which they are derived and failed to completely induce the genes with high expression levels in mESCs. We further revealed a strong correlation between oocyte-enriched factors and insufficiently induced mESC-specific genes and identified 11 hub genes via network analysis. In light of these findings, we postulated that these key hub genes might not only drive somatic cell nuclear transfer (SCNT) reprogramming but also augment the efficiency and quality of miPSC reprogramming.
Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans
Moretti, S.; Davydov, I.I.; Excoffier, L.
2017-01-01
Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
Molecular profiles to biology and pathways: a systems biology approach.
Van Laere, Steven; Dirix, Luc; Vermeulen, Peter
2016-06-16
Interpreting molecular profiles in a biological context requires specialized analysis strategies. Initially, lists of relevant genes were screened to identify enriched concepts associated with pathways or specific molecular processes. However, the shortcoming of interpreting gene lists by using predefined sets of genes has resulted in the development of novel methods that heavily rely on network-based concepts. These algorithms have the advantage that they allow a more holistic view of the signaling properties of the condition under study as well as that they are suitable for integrating different data types like gene expression, gene mutation, and even histological parameters.
Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou
2017-01-01
Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Reliable pre-eclampsia pathways based on multiple independent microarray data sets.
Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo
2015-02-01
Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta has provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using data sets GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically-validated protocol was executed using five data sets including two other independent validation data sets, GSE30186 and GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways was found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. 
A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
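The t-test-based screening step mentioned in this abstract can be illustrated with a minimal sketch. It assumes per-sample pathway activation scores are already available (for example from single-sample gene-set enrichment analysis); the pathway names, score values, the use of Welch's t statistic and the cutoff of 2.0 are all hypothetical choices for illustration and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def screen_pathways(scores, is_case, t_cutoff=2.0):
    """Keep pathways whose |t| between cases and controls reaches the cutoff.

    scores:   dict mapping pathway name -> list of per-sample activation scores
    is_case:  list of booleans, True for pre-eclampsia samples (same order)
    t_cutoff: hypothetical screening threshold on |t|
    """
    hits = []
    for name, values in scores.items():
        cases = [v for v, c in zip(values, is_case) if c]
        controls = [v for v, c in zip(values, is_case) if not c]
        t = welch_t(cases, controls)
        if abs(t) >= t_cutoff:
            hits.append((name, t))
    # Strongest differences first.
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)


# Toy activation scores (hypothetical values, not data from the study).
scores = {
    "PATHWAY_A": [0.9, 1.1, 1.0, 0.1, 0.2, 0.0],
    "PATHWAY_B": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}
is_case = [True, True, True, False, False, False]
hits = screen_pathways(scores, is_case)
```

In this toy input only PATHWAY_A differs between the groups, so it is the only pathway that passes the screen. A real analysis would convert the statistic to a p value and would follow the screen with a further selection step such as the recursive feature elimination the abstract describes.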
2012-01-01
Background: Oxidative stress contributes to the pathogenesis of many diseases. The NRF2/KEAP1 axis is a key transcriptional regulator of the anti-oxidant response in cells. Nrf2 knockout mice have implicated this pathway in regulating inflammatory airway diseases such as asthma and COPD. To better understand the role of the NRF2 pathway in respiratory disease we have taken a novel approach to define NRF2-dependent gene expression in a relevant lung system. Methods: Normal human lung fibroblasts were transfected with siRNA specific for NRF2 or KEAP1. Gene expression changes were measured at 30 and 48 hours using a custom Affymetrix Gene array. Changes in Eotaxin-1 gene expression and protein secretion were further measured under various inflammatory conditions with siRNAs and pharmacological tools. Results: An anti-correlated gene set (inversely regulated by NRF2 and KEAP1 RNAi) that reflects specific NRF2-regulated genes was identified. Gene annotations show that NRF2-mediated oxidative stress response is the most significantly regulated pathway, followed by heme metabolism, metabolism of xenobiotics by Cytochrome P450 and O-glycan biosynthesis. Unexpectedly, the key eosinophil chemokine Eotaxin-1/CCL11 was found to be up-regulated when NRF2 was inhibited and down-regulated when KEAP1 was inhibited. This transcriptional regulation leads to modulation of Eotaxin-1 secretion from human lung fibroblasts under basal and inflammatory conditions, and is specific to Eotaxin-1, as NRF2 or KEAP1 knockdown had no effect on the secretion of a set of other chemokines and cytokines. Furthermore, the known NRF2 small molecule activators CDDO and sulforaphane can also dose-dependently inhibit Eotaxin-1 release from human lung fibroblasts. Conclusions: These data uncover a previously unknown role for NRF2 in regulating Eotaxin-1 expression and further the mechanistic understanding of this pathway in modulating inflammatory lung disease. PMID:23061798
Fourtounis, Jimmy; Wang, I-Ming; Mathieu, Marie-Claude; Claveau, David; Loo, Tenneille; Jackson, Aimee L; Peters, Mette A; Therien, Alex G; Boie, Yves; Crackower, Michael A
2012-10-12
Oxidative stress contributes to the pathogenesis of many diseases. The NRF2/KEAP1 axis is a key transcriptional regulator of the anti-oxidant response in cells. Nrf2 knockout mice have implicated this pathway in regulating inflammatory airway diseases such as asthma and COPD. To better understand the role of the NRF2 pathway in respiratory disease we have taken a novel approach to define NRF2-dependent gene expression in a relevant lung system. Normal human lung fibroblasts were transfected with siRNA specific for NRF2 or KEAP1. Gene expression changes were measured at 30 and 48 hours using a custom Affymetrix Gene array. Changes in Eotaxin-1 gene expression and protein secretion were further measured under various inflammatory conditions with siRNAs and pharmacological tools. An anti-correlated gene set (inversely regulated by NRF2 and KEAP1 RNAi) that reflects specific NRF2-regulated genes was identified. Gene annotations show that NRF2-mediated oxidative stress response is the most significantly regulated pathway, followed by heme metabolism, metabolism of xenobiotics by Cytochrome P450 and O-glycan biosynthesis. Unexpectedly, the key eosinophil chemokine Eotaxin-1/CCL11 was found to be up-regulated when NRF2 was inhibited and down-regulated when KEAP1 was inhibited. This transcriptional regulation leads to modulation of Eotaxin-1 secretion from human lung fibroblasts under basal and inflammatory conditions, and is specific to Eotaxin-1, as NRF2 or KEAP1 knockdown had no effect on the secretion of a set of other chemokines and cytokines. Furthermore, the known NRF2 small molecule activators CDDO and sulforaphane can also dose-dependently inhibit Eotaxin-1 release from human lung fibroblasts. These data uncover a previously unknown role for NRF2 in regulating Eotaxin-1 expression and further the mechanistic understanding of this pathway in modulating inflammatory lung disease.
Transcriptional alterations in the left ventricle of three hypertensive rat models.
Cerutti, Catherine; Kurdi, Mazen; Bricca, Giampiero; Hodroj, Wassim; Paultre, Christian; Randon, Jacques; Gustin, Marie-Paule
2006-11-27
Left ventricular hypertrophy (LVH) is commonly associated with hypertension and represents an independent cardiovascular risk factor. The aim of this study was to test the hypothesis that the cardiac overload related to hypertension is associated with a specific gene expression pattern independently of genetic background. Gene expression levels were obtained with microarrays for 15,866 transcripts from RNA of left ventricles from 12-wk-old rats of three hypertensive models [spontaneously hypertensive rat (SHR), Lyon hypertensive rat (LH), and heterozygous TGR(mRen2)27 rat] and their respective controls. More than 60% of the detected transcripts displayed significant changes between the three groups of normotensive rats, showing large interstrain variability. Expression data were analyzed with respect to hypertension, LVH, and chromosomal distribution. Only four genes had significantly modified expression in the three hypertensive models, among which a single gene, coding for sialyltransferase 7A, was consistently overexpressed. Correlation analysis between expression data and left ventricular mass index (LVMI) over all rats identified a larger set of genes whose expression was continuously related to LVMI, including known genes associated with cardiac remodeling. Positioning the detected transcripts along the chromosomes pointed out high-density regions mostly located within blood pressure and cardiac mass quantitative trait loci. Although our study could not detect a unique reprogramming of cardiac cells involving specific genes at an early stage of LVH, it allowed the identification of some genes associated with LVH regardless of genetic background. This study thus provides a set of potentially important genes contained within restricted chromosomal regions involved in cardiovascular diseases.
Willsey, A. Jeremy; Sanders, Stephan J.; Li, Mingfeng; Dong, Shan; Tebbenkamp, Andrew T.; Muhle, Rebecca A.; Reilly, Steven K.; Lin, Leon; Fertuzinhos, Sofia; Miller, Jeremy A.; Murtha, Michael T.; Bichsel, Candace; Niu, Wei; Cotney, Justin; Ercan-Sencicek, A. Gulhan; Gockley, Jake; Gupta, Abha; Han, Wenqi; He, Xin; Hoffman, Ellen; Klei, Lambertus; Lei, Jing; Liu, Wenzhong; Liu, Li; Lu, Cong; Xu, Xuming; Zhu, Ying; Mane, Shrikant M.; Lein, Edward S.; Wei, Liping; Noonan, James P.; Roeder, Kathryn; Devlin, Bernie; Šestan, Nenad; State, Matthew W.
2013-01-01
Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD “seed” genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology. PMID:24267886
A multiplex branched DNA assay for parallel quantitative gene expression profiling.
Flagella, Michael; Bui, Son; Zheng, Zhi; Nguyen, Cung Tuong; Zhang, Aiguo; Pastor, Larry; Ma, Yunqing; Yang, Wen; Crawford, Kimberly L; McMaster, Gary K; Witney, Frank; Luo, Yuling
2006-05-01
We describe a novel method to quantitatively measure messenger RNA (mRNA) expression of multiple genes directly from crude cell lysates and tissue homogenates without the need for RNA purification or target amplification. The multiplex branched DNA (bDNA) assay adapts the bDNA technology to the Luminex fluorescent bead-based platform through the use of cooperative hybridization, which ensures an exceptionally high degree of assay specificity. Using in vitro transcribed RNA as reference standards, we demonstrated that the assay is highly specific, with cross-reactivity less than 0.2%. We also determined that the assay detection sensitivity is 25,000 RNA transcripts, with intra- and interplate coefficients of variation of less than 10% and less than 15%, respectively. Using three 10-gene panels designed to measure proinflammatory and apoptosis responses, we demonstrated sensitive and specific multiplex gene expression profiling directly from cell lysates. The gene expression change data demonstrate a high correlation (R² = 0.94) with measurements obtained using the single-plex bDNA assay. Thus, the multiplex bDNA assay provides a powerful means to quantify the gene expression profile of a defined set of target genes in large sample populations.
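The intra- and interplate variability figures quoted in this abstract are coefficients of variation (standard deviation divided by the mean). A minimal sketch of how such values might be computed follows; the plate layout and signal readings are hypothetical, and averaging the per-plate values to summarize intraplate variability is one common convention assumed here rather than the procedure reported by the authors.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation as a percentage: 100 * SD / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical signal readings: three replicate wells on each of three plates.
plates = {
    "plate_1": [980.0, 1000.0, 1020.0],
    "plate_2": [1040.0, 1050.0, 1060.0],
    "plate_3": [940.0, 950.0, 960.0],
}

# Intraplate CV: variability among replicates on one plate, averaged over plates.
intraplate_cv = mean(cv_percent(wells) for wells in plates.values())

# Interplate CV: variability of the plate means across plates.
interplate_cv = cv_percent([mean(wells) for wells in plates.values()])
```

With these toy readings the plate means are 1000, 1050 and 950, so the interplate value is 5%, while replicates within each plate vary by about 1 to 2%.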
A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells.
Ly, Tony; Ahmad, Yasmeen; Shlien, Adam; Soroka, Dominique; Mills, Allie; Emanuele, Michael J; Stratton, Michael R; Lamond, Angus I
2014-01-01
Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/). DOI: http://dx.doi.org/10.7554/eLife.01630.001.
Flores-Herrera, Patricio; Arredondo-Zelada, Oscar; Marshall, Sergio H; Gómez, Fernando A
2018-06-01
Piscirickettsia salmonis is a highly aggressive facultative intracellular bacterium that challenges the sustainability of Chilean salmon production. Due to the limited knowledge of its biology, there is a need to identify key molecular markers that could help define the pathogenic potential of this bacterium. We think a model system should be implemented that efficiently evaluates the expression of putative bacterial markers by using validated, stable, and highly specific housekeeping genes to properly select target genes, which could lead to identifying those responsible for infection and disease induction in naturally infected fish. Here, we selected a set of validated reference or housekeeping genes for RT-qPCR expression analyses of P. salmonis under different growth and stress conditions, including an in vitro infection kinetic. After a thorough screening, we selected sdhA as the most reliable housekeeping gene able to represent stable and highly specific host reference genes for RT-qPCR-driven P. salmonis analysis. Copyright © 2018. Published by Elsevier B.V.
A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells
Ly, Tony; Ahmad, Yasmeen; Shlien, Adam; Soroka, Dominique; Mills, Allie; Emanuele, Michael J; Stratton, Michael R; Lamond, Angus I
2014-01-01
Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over ∼6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/). DOI: http://dx.doi.org/10.7554/eLife.01630.001 PMID:24596151
Chemidlin Prévost-Bouré, Nicolas; Christen, Richard; Dequiedt, Samuel; Mougel, Christophe; Lelièvre, Mélanie; Jolivet, Claudy; Shahbazkia, Hamid Reza; Guillou, Laure; Arrouays, Dominique; Ranjard, Lionel
2011-01-01
Fungi constitute an important group in soil biological diversity and functioning. However, characterization and knowledge of fungal communities are hampered because few primer sets are available to quantify fungal abundance by real-time quantitative PCR (real-time Q-PCR). The aim in this study was to quantify fungal abundance in soils by incorporating, into a real-time Q-PCR using the SYBRGreen® method, a primer set already used to study the genetic structure of soil fungal communities. To satisfy the real-time Q-PCR requirements to enhance the accuracy and reproducibility of the detection technique, this study focused on the 18S rRNA gene conserved regions. These regions are little affected by length polymorphism and may provide sufficiently small targets, a crucial criterion for enhancing accuracy and reproducibility of the detection technique. An in silico analysis of 33 primer sets targeting the 18S rRNA gene was performed to select the primer set with the best potential for real-time Q-PCR: short amplicon length, and good fungal specificity and coverage. The best consensus between specificity, coverage and amplicon length among the 33 sets tested was the primer set FR1 / FF390. This in silico analysis of the specificity of FR1 / FF390 also provided additional information to the previously published analysis on this primer set. The specificity of the primer set FR1 / FF390 for Fungi was validated in vitro by cloning and sequencing the amplicons obtained from a real-time Q-PCR assay performed on five independent soil samples. This assay was also used to evaluate the sensitivity and reproducibility of the method. Finally, fungal abundance in samples from 24 soils with contrasting physico-chemical and environmental characteristics was examined and ranked to determine the importance of soil texture, organic carbon content, C∶N ratio and land use in determining fungal abundance in soils. PMID:21931659
The genome sequence of taurine cattle: a window to ruminant biology and evolution.
Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; 
Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; 
Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi
2009-04-24
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses (t-statistics, fold-change, B-statistics, and RankProd) and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Tyler, S D; Johnson, W M; Lior, H; Wang, G; Rozee, K R
1991-01-01
A set of synthetic oligonucleotide primers was designed for use in a polymerase chain reaction protocol to specifically detect the B subunit genes in vtx2ha and vtx2hb, which code for the production of the VT2 (Shiga-like toxin II) variant cytotoxins VT2v-a and VT2v-b, respectively. An additional set of primers amplified a fragment common to the B subunits of the VT2 and the VT2 variant genes. Subsequent restriction endonuclease digestion of this amplicon permitted prediction of specific VT2 and variant genotypes on the basis of predetermined restriction fragment length polymorphisms. Genotypes of 21 VT2-producing strains of Escherichia coli were determined using this polymerase chain reaction-restriction fragment length polymorphism procedure. Four strains contained B subunit target sequences only for VT2 genes, 9 strains contained sequences only for VT2v-a genes, and 3 strains contained sequences only for VT2v-b. For genes in combination, one strain contained B subunit genes for both VT2 and VT2v-a and two strains contained B subunit genes for VT2 and VT2v-b. Two strains of E. coli O91:H21 contained both VT2v-a and VT2v-b B subunit genes. The VT2 reference strain of E. coli, E32511, was found to contain the targeted sequences from both VT2 and VT2v-a genes, whereas the recombinant E. coli, pEB1, possessed only that of the VT2 gene. The specific activities of extracellular VT2 determined in HeLa cells ranged from 0.3 to 41.7 TCD50 per microgram of protein in strains carrying the VT2 gene target and from 0 to 50.0 TCD50 per microgram of protein in strains carrying only the VT2 variant target (TCD50 is the tissue culture dose by which 50% of the cells were affected), suggesting that phenotypic expression does not correlate with genotype. PMID:1679436
Genome-wide analysis of starch metabolism genes in potato (Solanum tuberosum L.).
Van Harsselaar, Jessica K; Lorenz, Julia; Senning, Melanie; Sonnewald, Uwe; Sonnewald, Sophia
2017-01-05
Starch is the principal constituent of potato tubers and is of considerable importance for food and non-food applications. Its metabolism has been the subject of extensive research over the past decades. Despite its importance, a description of the complete inventory of genes involved in starch metabolism and their genome organization in potato plants is still missing. Moreover, mechanisms regulating the expression of starch genes in leaves and tubers remain elusive with regard to differences between transitory and storage starch metabolism, respectively. This study aimed at identifying and mapping the complete set of potato starch genes, and at studying their expression pattern in leaves and tubers using different sets of transcriptome data. Moreover, we wanted to uncover transcription factors co-regulated with starch accumulation in tubers in order to gain insight into the regulation of starch metabolism. We identified 77 genomic loci encoding enzymes involved in starch metabolism. Novel isoforms of many enzymes were found. Their analysis will help to elucidate mechanisms of starch biosynthesis and degradation. Expression analysis of starch genes led to the identification of tissue-specific isoenzymes, suggesting differences in the transcriptional regulation of starch metabolism between potato leaf and tuber tissues. Selection of genes predominantly expressed in developing potato tubers and exhibiting an expression pattern indicative of a role in starch biosynthesis enabled the identification of possible transcriptional regulators of tuber starch biosynthesis by co-expression analysis. This study provides the annotation of the complete set of starch metabolic genes in potato plants and their genomic localizations. Novel, so far undescribed, enzyme isoforms were revealed. Comparative transcriptome analysis enabled the identification of tuber- and leaf-specific isoforms of starch genes. This finding suggests distinct regulatory mechanisms in transitory and storage starch metabolism. Putative regulatory proteins of starch biosynthesis in potato tubers have been identified by co-expression and their expression was verified by quantitative RT-PCR.
CEM-designer: design of custom expression microarrays in the post-ENCODE Era.
Arnold, Christian; Externbrink, Fabian; Hackermüller, Jörg; Reiche, Kristin
2014-11-10
Microarrays are widely used in gene expression studies, and custom expression microarrays are popular to monitor expression changes of a customer-defined set of genes. However, the complexity of transcriptomes uncovered recently makes custom expression microarray design a non-trivial task. Pervasive transcription and alternative processing of transcripts generate a wealth of interwoven transcripts that require well-considered probe design strategies and are largely neglected in existing approaches. We developed the web server CEM-Designer that facilitates microarray platform independent design of custom expression microarrays for complex transcriptomes. CEM-Designer covers (i) the collection and generation of a set of unique target sequences from different sources and (ii) the selection of a set of sensitive and specific probes that optimally represents the target sequences. Probe design itself is left to third party software to ensure that probes meet provider-specific constraints. CEM-Designer is available at http://designpipeline.bioinf.uni-leipzig.de. Copyright © 2014 Elsevier B.V. All rights reserved.
A Comprehensive Analysis of Nuclear-Encoded Mitochondrial Genes in Schizophrenia.
Gonçalves, Vanessa F; Cappi, Carolina; Hagen, Christian M; Sequeira, Adolfo; Vawter, Marquis P; Derkach, Andriy; Zai, Clement C; Hedley, Paula L; Bybjerg-Grauholm, Jonas; Pouget, Jennie G; Cuperfain, Ari B; Sullivan, Patrick F; Christiansen, Michael; Kennedy, James L; Sun, Lei
2018-05-01
The genetic risk factors of schizophrenia (SCZ), a severe psychiatric disorder, are not yet fully understood. Multiple lines of evidence suggest that mitochondrial dysfunction may play a role in SCZ, but comprehensive association studies are lacking. We hypothesized that variants in nuclear-encoded mitochondrial genes influence susceptibility to SCZ. We conducted gene-based and gene-set analyses using summary association results from the Psychiatric Genomics Consortium Schizophrenia Phase 2 (PGC-SCZ2) genome-wide association study comprising 35,476 cases and 46,839 control subjects. We applied the MAGMA method to three sets of nuclear-encoded mitochondrial genes: oxidative phosphorylation genes, other nuclear-encoded mitochondrial genes, and genes involved in nucleus-mitochondria crosstalk. Furthermore, we conducted a replication study using the iPSYCH SCZ sample of 2290 cases and 21,621 control subjects. In the PGC-SCZ2 sample, 1186 mitochondrial genes were analyzed, among which 159 had p values < .05 and 19 remained significant after multiple testing correction. A meta-analysis of 818 genes combining the PGC-SCZ2 and iPSYCH samples resulted in 104 nominally significant and nine significant genes, suggesting a polygenic model for the nuclear-encoded mitochondrial genes. Gene-set analysis, however, did not show significant results. In an in silico protein-protein interaction network analysis, 14 mitochondrial genes interacted directly with 158 SCZ risk genes identified in PGC-SCZ2 (permutation p = .02), and aldosterone signaling in epithelial cells and mitochondrial dysfunction pathways appeared to be overrepresented in this network of mitochondrial and SCZ risk genes. This study provides evidence that specific aspects of mitochondrial function may play a role in SCZ, but we did not observe its broad involvement even using a large sample. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
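The distinction this abstract draws between nominally significant genes (p < .05) and genes that remain significant after multiple testing correction can be sketched as follows. The abstract does not name the correction method, so Bonferroni is used here purely as an assumed illustration; the gene names and p values are placeholders, not results from the study.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return genes whose p value is below alpha divided by the number of tests.

    Bonferroni is an assumption made for this sketch; the source abstract
    does not state which correction was applied.
    """
    threshold = alpha / len(p_values)
    return sorted(gene for gene, p in p_values.items() if p < threshold)


# Hypothetical gene-level p values (placeholder names, not study results).
p_values = {"GENE_A": 1e-6, "GENE_B": 0.03, "GENE_C": 0.2, "GENE_D": 0.009}

# Nominal significance uses the uncorrected threshold.
nominal = sorted(gene for gene, p in p_values.items() if p < 0.05)

# Corrected significance uses alpha / number of tests (0.05 / 4 here).
corrected = bonferroni_hits(p_values)
```

Under this assumed correction, a test over the 1186 genes analyzed in the PGC-SCZ2 sample would use a per-gene threshold of 0.05 / 1186, roughly 4.2e-5.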
nGASP--the nematode genome annotation assessment project.
Coghlan, Avril; Fiedler, Tristan J; McKay, Sheldon J; Flicek, Paul; Harris, Todd W; Blasiar, Darin; Stein, Lincoln D
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
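The gene-level sensitivity and specificity figures quoted above can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction assessments, where sensitivity is the fraction of reference gene models predicted exactly and specificity is the fraction of predicted gene models that exactly match a reference model; the abstract does not spell out the formula, and the function name and toy coordinates below are hypothetical.

```python
def gene_level_accuracy(reference, predicted):
    """Return (sensitivity, specificity) at the gene level.

    Each gene model is a hashable description of its structure, here a
    tuple of (start, end) exon coordinates. A prediction counts as correct
    only if it matches a reference model exactly (assumed convention).
    """
    ref = set(reference)
    pred = set(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical toy data: 4 reference genes, 5 predictions, 3 exact matches.
reference_genes = [
    ((100, 200), (300, 400)),
    ((1000, 1500),),
    ((2000, 2100), (2200, 2300), (2400, 2500)),
    ((5000, 5200),),
]
predicted_genes = [
    ((100, 200), (300, 400)),                     # exact match
    ((1000, 1500),),                              # exact match
    ((2000, 2100), (2200, 2300), (2400, 2500)),   # exact match
    ((5000, 5150),),                              # wrong exon boundary
    ((7000, 7100),),                              # no reference gene here
]

sens, spec = gene_level_accuracy(reference_genes, predicted_genes)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under this convention a gene-finder that over-predicts keeps a high sensitivity but loses specificity, which is one way a 78% sensitivity can sit beside a 42% specificity.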
Yamashita, S; Nakagawa, H; Sakaguchi, T; Arima, T-H; Kikoku, Y
2018-01-01
Heat-resistant fungi occur sporadically and are a continuing problem for the food and beverage industry. The genus Talaromyces, as a typical fungus, is capable of producing the heat-resistant ascospores responsible for the spoilage of processed food products. Isocitrate lyase, a signature enzyme of the glyoxylate cycle, is required for the metabolism of non-fermentable carbon compounds, like acetate and ethanol. Here, species-specific primer sets for detection and identification of DNA derived from Talaromyces macrosporus and Talaromyces trachyspermus were designed based on the nucleotide sequences of their isocitrate lyase genes. Polymerase chain reaction (PCR) using a species-specific primer set amplified products specific to T. macrosporus and T. trachyspermus. Other fungal species, such as Byssochlamys fulva and Hamigera striata, which cause food spoilage, were not detected using the Talaromyces-specific primer sets. The detection limit for each species-specific primer set was determined as being 50 pg of template DNA, without using a nested PCR method. The specificity of each species-specific primer set was maintained in the presence of 1,000-fold amounts of genomic DNA from other fungi. The method also detected fungal DNA extracted from blueberry inoculated with T. macrosporus. This PCR method provides a quick, simple, powerful and reliable way to detect T. macrosporus and T. trachyspermus. Polymerase chain reaction (PCR)-based detection is rapid, convenient and sensitive compared with traditional methods of detecting heat-resistant fungi. In this study, a PCR-based method was developed for the detection and identification of amplification products from Talaromyces macrosporus and Talaromyces trachyspermus using primer sets that target the isocitrate lyase gene. This method could be used for the on-site detection of T. macrosporus and T. trachyspermus in the near future, and will be helpful in the safety control of raw materials and in food and beverage production. © 2017 The Authors. Letters in Applied Microbiology published by John Wiley & Sons Ltd on behalf of The Society for Applied Microbiology.
Genome-wide analysis of YY2 versus YY1 target genes
Chen, Li; Shioda, Toshi; Coser, Kathryn R.; Lynch, Mary C.; Yang, Chuanwei; Schmidt, Emmett V.
2010-01-01
Yin Yang 1 (YY1) is a critical transcription factor controlling cell proliferation, development and DNA damage responses. Retrotranspositions have independently generated additional YY family members in multiple species. Although Drosophila YY1 [pleiohomeotic (Pho)] and its homolog [pleiohomeotic-like (Phol)] redundantly control homeotic gene expression, the regulatory contributions of YY1-homologs have not yet been examined in other species. Indeed, targets for the mammalian YY1 homolog YY2 are completely unknown. Using gene set enrichment analysis, we found that lentiviral constructs containing short hairpin loop inhibitory RNAs for human YY1 (shYY1) and its homolog YY2 (shYY2) caused significant changes in both shared and distinguishable gene sets in human cells. Ribosomal protein genes were the most significant gene set upregulated by both shYY1 and shYY2, although combined shYY1/2 knock downs were not additive. In contrast, shYY2 reversed the anti-proliferative effects of shYY1, and shYY2 particularly altered UV damage response, platelet-specific and mitochondrial function genes. We found that decreases in YY1 or YY2 caused inverse changes in UV sensitivity, and that their combined loss reversed their respective individual effects. Our studies show that human YY2 is not redundant to YY1, and YY2 is a significant regulator of genes previously identified as uniquely responding to YY1. PMID:20215434
Transcriptional activation of Mina by Sp1/3 factors.
Lian, Shangli; Potula, Hari Hara S K; Pillai, Meenu R; Van Stry, Melanie; Koyanagi, Madoka; Chung, Linda; Watanabe, Makiko; Bix, Mark
2013-01-01
Mina is an epigenetic gene regulatory protein known to function in multiple physiological and pathological contexts, including pulmonary inflammation, cell proliferation, cancer and immunity. We showed previously that the level of Mina gene expression is subject to natural genetic variation linked to 21 SNPs occurring in the Mina 5' region. In order to explore the mechanisms regulating Mina gene expression, we set out to molecularly characterize the Mina promoter in the region encompassing these SNPs. We used three kinds of assays--reporter, gel shift and chromatin immunoprecipitation--to analyze a 2 kb genomic fragment spanning the upstream and intron 1 regions flanking exon 1. Here we discovered a pair of Mina promoters (P1 and P2) and a P1-specific enhancer element (E1). Pharmacologic inhibition and siRNA knockdown experiments suggested that Sp1/3 transcription factors trigger Mina expression through additive activity targeted to a cluster of four Sp1/3 binding sites forming the P1 promoter. These results set the stage for comprehensive analysis of Mina gene regulation from the context of tissue specificity, the impact of inherited genetic variation and the nature of upstream signaling pathways.
Transcriptional Activation of Mina by Sp1/3 Factors
Lian, Shangli; Potula, Hari Hara S. K.; Pillai, Meenu R.; Van Stry, Melanie; Koyanagi, Madoka; Chung, Linda; Watanabe, Makiko; Bix, Mark
2013-01-01
Mina is an epigenetic gene regulatory protein known to function in multiple physiological and pathological contexts, including pulmonary inflammation, cell proliferation, cancer and immunity. We showed previously that the level of Mina gene expression is subject to natural genetic variation linked to 21 SNPs occurring in the Mina 5′ region [1]. In order to explore the mechanisms regulating Mina gene expression, we set out to molecularly characterize the Mina promoter in the region encompassing these SNPs. We used three kinds of assays – reporter, gel shift and chromatin immunoprecipitation – to analyze a 2 kb genomic fragment spanning the upstream and intron 1 regions flanking exon 1. Here we discovered a pair of Mina promoters (P1 and P2) and a P1-specific enhancer element (E1). Pharmacologic inhibition and siRNA knockdown experiments suggested that Sp1/3 transcription factors trigger Mina expression through additive activity targeted to a cluster of four Sp1/3 binding sites forming the P1 promoter. These results set the stage for comprehensive analysis of Mina gene regulation from the context of tissue specificity, the impact of inherited genetic variation and the nature of upstream signaling pathways. PMID:24324617
Bosch, Linda J W; Oort, Frank A; Neerincx, Maarten; Khalid-de Bakker, Carolina A J; Terhaar sive Droste, Jochim S; Melotte, Veerle; Jonkers, Daisy M A E; Masclee, Ad A M; Mongera, Sandra; Grooteclaes, Madeleine; Louwagie, Joost; van Criekinge, Wim; Coupé, Veerle M H; Mulder, Chris J; van Engeland, Manon; Carvalho, Beatriz; Meijer, Gerrit A
2012-03-01
Using a bioinformatics-based strategy, we set out to identify hypermethylated genes that could serve as biomarkers for early detection of colorectal cancer (CRC) in stool. In addition, the complementary value to a Fecal Immunochemical Test (FIT) was evaluated. Candidate genes were selected by applying cluster alignment and computational analysis of promoter regions to microarray-expression data of colorectal adenomas and carcinomas. DNA methylation was measured by quantitative methylation-specific PCR on 34 normal colon mucosa, 71 advanced adenoma, and 64 CRC tissues. The performance as a biomarker was tested in whole stool samples from a total of 193 subjects, including 19 with advanced adenoma and 66 with CRC. For a large proportion of these series, methylation data for GATA4 and OSMR were available for comparison. The complementary value to FIT was measured in stool subsamples from 92 subjects, including 44 with advanced adenoma or CRC. Phosphatase and Actin Regulator 3 (PHACTR3) was identified as a novel hypermethylated gene showing more than 70-fold increased DNA methylation levels in advanced neoplasia compared with normal colon mucosa. In a stool training set, PHACTR3 methylation showed a sensitivity of 55% (95% CI: 33-75) for CRC and a specificity of 95% (95% CI: 87-98). In a stool validation set, sensitivity reached 66% (95% CI: 50-79) for CRC and 32% (95% CI: 14-57) for advanced adenomas at a specificity of 100% (95% CI: 86-100). Adding PHACTR3 methylation to FIT increased sensitivity for CRC up to 15%. PHACTR3 is a new hypermethylated gene in CRC with a good performance in stool DNA testing and has complementary value to FIT.
Czechowski, Tomasz; Stitt, Mark; Altmann, Thomas; Udvardi, Michael K.; Scheible, Wolf-Rüdiger
2005-01-01
Gene transcripts with invariant abundance during development and in the face of environmental stimuli are essential reference points for accurate gene expression analyses, such as RNA gel-blot analysis or quantitative reverse transcription-polymerase chain reaction (PCR). An exceptionally large set of data from Affymetrix ATH1 whole-genome GeneChip studies provided the means to identify a new generation of reference genes with very stable expression levels in the model plant species Arabidopsis (Arabidopsis thaliana). Hundreds of Arabidopsis genes were found that outperform traditional reference genes in terms of expression stability throughout development and under a range of environmental conditions. Most of these were expressed at much lower levels than traditional reference genes, making them very suitable for normalization of gene expression over a wide range of transcript levels. Specific and efficient primers were developed for 22 genes and tested on a diverse set of 20 cDNA samples. Quantitative reverse transcription-PCR confirmed superior expression stability and lower absolute expression levels for many of these genes, including genes encoding a protein phosphatase 2A subunit, a coatomer subunit, and an ubiquitin-conjugating enzyme. The developed PCR primers or hybridization probes for the novel reference genes will enable better normalization and quantification of transcript levels in Arabidopsis in the future. PMID:16166256
Real-time multiplex PCR assay for detection of Yersinia pestis and Yersinia pseudotuberculosis.
Matero, Pirjo; Pasanen, Tanja; Laukkanen, Riikka; Tissari, Päivi; Tarkka, Eveliina; Vaara, Martti; Skurnik, Mikael
2009-01-01
A multiplex real-time polymerase chain reaction (PCR) assay was developed for the detection of Yersinia pestis and Yersinia pseudotuberculosis. The assay includes four primer pairs, two of which are specific for Y. pestis, one for Y. pestis and Y. pseudotuberculosis and one for bacteriophage lambda; the latter was used as an internal amplification control. The Y. pestis-specific target genes in the assay were ypo2088, a gene coding for a putative methyltransferase, and the pla gene coding for the plasminogen activator. In addition, the wzz gene was used as a target to specifically identify both Y. pestis and the closely related Y. pseudotuberculosis group. The primer and probe sets described for the different genes can be used either in single or in multiplex PCR assays because the individual probes were designed with different fluorochromes. The assays were found to be both sensitive and specific; the lower limit of the detection was 10-100 fg of extracted Y. pestis or Y. pseudotuberculosis total DNA. The sensitivity of the tetraplex assay was determined to be 1 cfu for the ypo2088 and pla probe labelled with FAM and JOE fluorescent dyes, respectively.
A convex optimization approach for identification of human tissue-specific interactomes.
Mohammadi, Shahin; Grama, Ananth
2016-06-15
Analysis of organism-specific interactomes has yielded novel insights into cellular function and coordination, understanding of pathology, and identification of markers and drug targets. Genes, however, can exhibit varying levels of cell type specificity in their expression, and their coordinated expression manifests in tissue-specific function and pathology. Tissue-specific/tissue-selective interaction mechanisms have significant applications in drug discovery, as they are more likely to reveal drug targets. Furthermore, tissue-specific transcription factors (tsTFs) are significantly implicated in human disease, including cancers. Finally, disease genes and protein complexes have the tendency to be differentially expressed in tissues in which defects cause pathology. These observations motivate the construction of refined tissue-specific interactomes from organism-specific interactomes. We present a novel technique for constructing human tissue-specific interactomes. Using a variety of validation tests (Edge Set Enrichment Analysis, Gene Ontology Enrichment, Disease-Gene Subnetwork Compactness), we show that our proposed approach significantly outperforms state-of-the-art techniques. Finally, using case studies of Alzheimer's and Parkinson's diseases, we show that tissue-specific interactomes derived from our study can be used to construct pathways implicated in pathology and demonstrate the use of these pathways in identifying novel targets. http://www.cs.purdue.edu/homes/mohammas/projects/ActPro.html mohammadi@purdue.edu. © The Author 2016. Published by Oxford University Press.
Voigt, Oliver; Adamska, Maja; Adamski, Marcin; Kittelmann, André; Wencker, Lukardis; Wörheide, Gert
2017-01-01
The ability to form mineral structures under biological control is widespread among animals. In several species, specific proteins have been shown to be involved in biomineralization, but it is uncertain how they influence the shape of the growing biomineral and the resulting skeleton. Calcareous sponges are the only sponges that form calcitic spicules, which, based on the number of rays (actines) are distinguished in diactines, triactines and tetractines. Each actine is formed by only two cells, called sclerocytes. Little is known about biomineralization proteins in calcareous sponges, other than that specific carbonic anhydrases (CAs) have been identified, and that uncharacterized Asx-rich proteins have been isolated from calcitic spicules. By RNA-Seq and RNA in situ hybridization (ISH), we identified five additional biomineralization genes in Sycon ciliatum: two bicarbonate transporters (BCTs) and three Asx-rich extracellular matrix proteins (ARPs). We show that these biomineralization genes are expressed in a coordinated pattern during spicule formation. Furthermore, two of the ARPs are spicule-type specific for triactines and tetractines (ARP1 or SciTriactinin) or diactines (ARP2 or SciDiactinin). Our results suggest that spicule formation is controlled by defined temporal and spatial expression of spicule-type specific sets of biomineralization genes. PMID:28406140
Immunohistochemistry as a surrogate for molecular testing: a review.
Swanson, Paul E
2015-02-01
Despite the myriad of genetic and epigenetic alterations in human neoplasms that seem to demand specific molecular probes for their identification and practical application to diagnostic pathology, immunohistochemistry (IHC) remains a vital component of laboratory testing in the emerging molecular era. The development and proper application of sensitive and specific antibodies raised against cryptic proteins only expressed in quantity after gene translocation, translocation-specific chimeric fusion peptides, and gene products overexpressed because of gene amplification demonstrate that IHC is a legitimate surrogate for traditional cytogenetic and in situ hybridization-based identification of chromosomal abnormalities, if not a viable molecular technique in its own right. Similarly, the detection of mutational events, through the reliable demonstration of protein loss, the identification of proteins overexpressed because of activating mutations, the specific visualization of mutant gene products, and the localization of splice variant gene products emphasizes the potential value of IHC as a surrogate for mutational analyses of genes important to both diagnosis and prediction of therapeutic response. In the latter setting IHC also provides a means of approximating gene expression profiles in the molecular classification and risk stratification of human neoplasms. For the time being, the application of appropriately targeted sensitive and specific antibodies provides a cost-effective screening modality, if not replacement, for selected molecular techniques, but IHC will lose its value if the development of companion tests for emerging novel biomarkers does not keep pace with molecular techniques, particularly as the costs and time constraints of genomic sequencing diminish over time.
Oleksiak, Marjorie F; Karchner, Sibel I; Jenny, Matthew J; Franks, Diana G; Welch, David B Mark; Hahn, Mark E
2011-05-24
Populations of Atlantic killifish (Fundulus heteroclitus) have evolved resistance to the embryotoxic effects of polychlorinated biphenyls (PCBs) and other halogenated and nonhalogenated aromatic hydrocarbons that act through an aryl hydrocarbon receptor (AHR)-dependent signaling pathway. The resistance is accompanied by reduced sensitivity to induction of cytochrome P450 1A (CYP1A), a widely used biomarker of aromatic hydrocarbon exposure and effect, but whether the reduced sensitivity is specific to CYP1A or reflects a genome-wide reduction in responsiveness to all AHR-mediated changes in gene expression is unknown. We compared gene expression profiles and the response to 3,3',4,4',5-pentachlorobiphenyl (PCB-126) exposure in embryos (5 and 10 dpf) and larvae (15 dpf) from F. heteroclitus populations inhabiting the New Bedford Harbor, Massachusetts (NBH) Superfund site (PCB-resistant) and a reference site, Scorton Creek, Massachusetts (SC; PCB-sensitive). Analysis using a 7,000-gene cDNA array revealed striking differences in responsiveness to PCB-126 between the populations; the differences occur at all three stages examined. There was a sizeable set of PCB-responsive genes in the sensitive SC population, a much smaller set of PCB-responsive genes in NBH fish, and few similarities in PCB-responsive genes between the two populations. Most of the array results were confirmed, and additional PCB-regulated genes identified, by RNA-Seq (deep pyrosequencing). The results suggest that NBH fish possess a gene regulatory defect that is not specific to one target gene such as CYP1A but rather lies in a regulatory pathway that controls the transcriptional response of multiple genes to PCB exposure. The results are consistent with genome-wide disruption of AHR-dependent signaling in NBH fish.
Wolf, Yuri I; Makarova, Kira S; Yutin, Natalya; Koonin, Eugene V
2012-12-14
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer. 
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time. This article was reviewed by (for complete reviews see the Reviewers' Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
Comparative mRNA analysis of behavioral and genetic mouse models of aggression.
Malki, Karim; Tosto, Maria G; Pain, Oliver; Sluyter, Frans; Mineur, Yann S; Crusio, Wim E; de Boer, Sietse; Sandnabba, Kenneth N; Kesserwani, Jad; Robinson, Edward; Schalkwyk, Leonard C; Asherson, Philip
2016-04-01
Mouse models of aggression have traditionally compared strains, most notably BALB/cJ and C57BL/6. However, these strains were not designed to study aggression despite differences in aggression-related traits and distinct reactivity to stress. This study compared the expression of genes differentially regulated in a stress (behavioral) mouse model of aggression with those from a recent genetic mouse model of aggression. The study used a discovery-replication design using two independent mRNA studies from mouse brain tissue. The discovery study identified strain (BALB/cJ and C57BL/6J) × stress (chronic mild stress or control) interactions. Probe sets differentially regulated in the discovery set were intersected with those uncovered in the replication study, which evaluated differences between high and low aggressive animals from three strains specifically bred to study aggression. Network analysis was conducted on overlapping genes uncovered across both studies. A significant overlap was found, with the genetic mouse study sharing 1,916 probe sets with the stress model. Fifty-one probe sets were found to be strongly dysregulated across both studies, mapping to 50 known genes. Network analysis revealed two plausible pathways, including one centered on the UBC gene hub, which encodes ubiquitin, a protein well known for its role in protein degradation, and another on P38 MAPK. Findings from this study support the stress model of aggression, which showed remarkable molecular overlap with a genetic model. The study uncovered a set of candidate genes including the Erg2 gene, which has previously been implicated in different psychopathologies. The gene networks uncovered point to a redox pathway as potentially implicated in aggression-related behaviors. © 2016 Wiley Periodicals, Inc.
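Whether an intersection of two probe-set lists is larger than chance would predict is often judged with a hypergeometric tail probability. The sketch below shows that calculation using only the standard library; it is an assumption about how such an overlap could be tested, not the procedure reported in the abstract, and the function name and all numbers are hypothetical toy values.

```python
from math import comb


def overlap_p_value(universe, n_a, n_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: total number of probe sets on the array
    n_a, n_b: sizes of the two differentially regulated lists
    overlap:  number of probe sets observed in both lists
    """
    max_overlap = min(n_a, n_b)
    total_draws = comb(universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total_draws


# Hypothetical toy example: 10 probe sets in total, lists of size 4 and 3,
# with 2 probe sets shared between them.
p = overlap_p_value(universe=10, n_a=4, n_b=3, overlap=2)
print(f"P(overlap >= 2) = {p:.4f}")
```

Exact integer arithmetic keeps the sum accurate for array-sized universes; for very large inputs a statistics library's survival function would be the faster choice.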
The construction of an EST database for Bombyx mori and its application
Mita, Kazuei; Morimyo, Mitsuoki; Okano, Kazuhiro; Koike, Yoshiko; Nohata, Junko; Kawasaki, Hideki; Kadono-Okuda, Keiko; Yamamoto, Kimiko; Suzuki, Masataka G.; Shimada, Toru; Goldsmith, Marian R.; Maeda, Susumu
2003-01-01
To build a foundation for the complete genome analysis of Bombyx mori, we have constructed an EST database. Because gene expression patterns deeply depend on tissues as well as developmental stages, we analyzed many cDNA libraries prepared from various tissues and different developmental stages to cover the entire set of Bombyx genes. So far, the Bombyx EST database contains 35,000 ESTs from 36 cDNA libraries, which are grouped into ≈11,000 nonredundant ESTs with the average length of 1.25 kb. The comparison with FlyBase suggests that the present EST database, SilkBase, covers >55% of all genes of Bombyx. The fraction of library-specific ESTs in each cDNA library indicates that we have not yet reached saturation, showing the validity of our strategy for constructing an EST database to cover all genes. To tackle the coming saturation problem, we have checked two methods, subtraction and normalization, to increase coverage and decrease the number of housekeeping genes, resulting in a 5–11% increase of library-specific ESTs. The identification of a number of genes and comprehensive cloning of gene families have already emerged from the SilkBase search. Direct links of SilkBase with FlyBase and WormBase provide ready identification of candidate Lepidoptera-specific genes. PMID:14614147
2011-01-01
Background: Rhizomatousness is a key component of perenniality of many grasses that contributes to the competitiveness and invasiveness of many noxious grass weeds, but can potentially be used to develop perennial cereal crops for sustainable farmers in hilly areas of tropical Asia. Oryza longistaminata, a perennial wild rice with strong rhizomes, has been used as the model species for genetic and molecular dissection of rhizome development and in breeding efforts to transfer rhizome-related traits into annual rice species. In this study, an effort was made to gain insights into the genes and molecular mechanisms underlying the rhizomatous trait in O. longistaminata by comparative analysis of the genome-wide tissue-specific gene expression patterns of five different tissues of O. longistaminata using the Affymetrix GeneChip Rice Genome Array. Results: A total of 2,566 tissue-specific genes were identified in five different tissues of O. longistaminata, including 58 and 61 unique genes that were specifically expressed in the rhizome tips (RT) and internodes (RI), respectively. In addition, 162 genes were up-regulated and 261 genes were down-regulated in RT compared to the shoot tips. Six distinct cis-regulatory elements (CGACG, GCCGCC, GAGAC, AACGG, CATGCA, and TAAAG) were found to be significantly more abundant in the promoter regions of genes differentially expressed in RT than in the promoter regions of genes uniformly expressed in all other tissues. Many of the RT and/or RI specifically or differentially expressed genes were located in the QTL regions associated with rhizome expression, rhizome abundance and rhizome growth-related traits in O. longistaminata and thus are good candidate genes for these QTLs. Conclusion: The initiation and development of the rhizomatous trait in O. longistaminata are controlled by very complex gene networks involving several plant hormones and regulatory genes, different members of gene families showing tissue specificity and their regulated pathways. Auxin/IAA appears to act as a negative regulator in rhizome development, while GA acts as the activator in rhizome development. Co-localization of the genes specifically expressed in rhizome tips and rhizome internodes with the QTLs for rhizome traits identified a large set of candidate genes for rhizome initiation and development in rice for further confirmation. PMID:21261937
Benitez, Cecil M.; Qu, Kun; Sugiyama, Takuya; Pauerstein, Philip T.; Liu, Yinghua; Tsai, Jennifer; Gu, Xueying; Ghodasara, Amar; Arda, H. Efsun; Zhang, Jiajing; Dekker, Joseph D.; Tucker, Haley O.; Chang, Howard Y.; Kim, Seung K.
2014-01-01
The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus. PMID:25330008
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
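The idea of combining information across loci can be illustrated with a deliberately simplified stand-in: a conjugate beta-binomial estimate in which every locus shares one Beta prior, so allelic ratios from low-coverage loci are pulled toward the genome-wide expectation. This is not the authors' hierarchical model; the fixed prior parameters, the function name and the read counts are assumptions chosen for illustration.

```python
def shrunken_allelic_ratio(ref_reads, alt_reads, prior_a=10.0, prior_b=10.0):
    """Posterior mean of the reference-allele proportion at one locus.

    With a Beta(prior_a, prior_b) prior shared by all loci and binomial
    read counts, the posterior is Beta(prior_a + ref, prior_b + alt).
    In a full hierarchical model the prior would itself be estimated
    from the data; here it is fixed by assumption.
    """
    total = ref_reads + alt_reads
    return (prior_a + ref_reads) / (prior_a + prior_b + total)


# Hypothetical read counts: the same raw ratio (0.75) at two coverages.
low_coverage = shrunken_allelic_ratio(ref_reads=3, alt_reads=1)
high_coverage = shrunken_allelic_ratio(ref_reads=300, alt_reads=100)
print(f"low coverage:  {low_coverage:.3f}")
print(f"high coverage: {high_coverage:.3f}")
```

The low-coverage locus is shrunk most of the way back to 0.5 while the high-coverage locus stays near its raw ratio, which is the behavior that lets a pooled model control false discoveries at sparsely covered loci.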
Grote, Steffi; Prüfer, Kay; Kelso, Janet; Dannemann, Michael
2016-10-15
We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). Contact: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Kim, Ji-Young; Kim, Kee-Beom; Son, Hye-Ju; Chae, Yun-Cheol; Oh, Si-Taek; Kim, Dong-Wook; Pak, Jhang Ho; Seo, Sang-Beom
2012-09-21
Significant progress has been made in understanding the relationship between histone modifications and 'reader' molecules and their effects on transcriptional regulation. A previously identified INHAT complex subunit, SET/TAF-Iβ, binds to histones and inhibits histone acetylation. To investigate the binding specificities of SET/TAF-Iβ to various histone modifications, we employed modified histone tail peptide array analyses. SET/TAF-Iβ strongly recognized PRC2-mediated H3K27me1/2/3; however, the bindings were completely disrupted by H3S28 phosphorylation. We have demonstrated that SET/TAF-Iβ is sequentially recruited to the target gene promoter ATF3 after the PRC2 complex via H3K27me recognition and may offer additive effects in the repression of the target gene. Copyright © 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Multiplex real-time PCR assay for Legionella species.
Kim, Seung Min; Jeong, Yoojung; Sohn, Jang Wook; Kim, Min Ja
2015-12-01
Legionella pneumophila serogroup 1 (sg1) accounts for the majority of infections in humans, but other Legionella species are also associated with human disease. In this study, a new SYBR Green I-based multiplex real-time PCR assay in a single reaction was developed to allow the rapid detection and differentiation of Legionella species by targeting specific gene sequences. Candidate target genes were selected, and primer sets were designed by referring to comparative genomic hybridization data of Legionella species. The Legionella species-specific groES primer set successfully detected all 30 Legionella strains tested. The xcpX and rfbA primers specifically detected L. pneumophila sg1-15 and L. pneumophila sg1, respectively. In addition, this assay was validated by testing clinical samples and isolates. In conclusion, this novel multiplex real-time PCR assay might be a useful diagnostic tool for the rapid detection and differentiation of Legionella species in both clinical and epidemiological studies. Copyright © 2015 Elsevier Ltd. All rights reserved.
Diversification of Root Hair Development Genes in Vascular Plants.
Huang, Ling; Shi, Xinhui; Wang, Wenjia; Ryu, Kook Hui; Schiefelbein, John
2017-07-01
The molecular genetic program for root hair development has been studied intensively in Arabidopsis ( Arabidopsis thaliana ). To understand the extent to which this program might operate in other plants, we conducted a large-scale comparative analysis of root hair development genes from diverse vascular plants, including eudicots, monocots, and a lycophyte. Combining phylogenetics and transcriptomics, we discovered conservation of a core set of root hair genes across all vascular plants, which may derive from an ancient program for unidirectional cell growth coopted for root hair development during vascular plant evolution. Interestingly, we also discovered preferential diversification in the structure and expression of root hair development genes, relative to other root hair- and root-expressed genes, among these species. These differences enabled the definition of sets of genes and gene functions that were acquired or lost in specific lineages during vascular plant evolution. In particular, we found substantial divergence in the structure and expression of genes used for root hair patterning, suggesting that the Arabidopsis transcriptional regulatory mechanism is not shared by other species. To our knowledge, this study provides the first comprehensive view of gene expression in a single plant cell type across multiple species. © 2017 American Society of Plant Biologists. All Rights Reserved.
Diversification of Root Hair Development Genes in Vascular Plants
Shi, Xinhui; Wang, Wenjia; Ryu, Kook Hui
2017-01-01
The molecular genetic program for root hair development has been studied intensively in Arabidopsis (Arabidopsis thaliana). To understand the extent to which this program might operate in other plants, we conducted a large-scale comparative analysis of root hair development genes from diverse vascular plants, including eudicots, monocots, and a lycophyte. Combining phylogenetics and transcriptomics, we discovered conservation of a core set of root hair genes across all vascular plants, which may derive from an ancient program for unidirectional cell growth coopted for root hair development during vascular plant evolution. Interestingly, we also discovered preferential diversification in the structure and expression of root hair development genes, relative to other root hair- and root-expressed genes, among these species. These differences enabled the definition of sets of genes and gene functions that were acquired or lost in specific lineages during vascular plant evolution. In particular, we found substantial divergence in the structure and expression of genes used for root hair patterning, suggesting that the Arabidopsis transcriptional regulatory mechanism is not shared by other species. To our knowledge, this study provides the first comprehensive view of gene expression in a single plant cell type across multiple species. PMID:28487476
Wang, Hao; Sun, Xuming; Chou, Jeff; Lin, Marina; Ferrario, Carlos M; Zapata-Sudo, Gisele; Groban, Leanne
2017-08-01
Activation of G protein-coupled estrogen receptor (GPER) by its agonist, G1, protects the heart from stressors such as pressure-overload, ischemia, a high-salt diet, estrogen loss, and aging, in various male and female animal models. Due to nonspecific effects of G1, the exact functions of cardiac GPER cannot be concluded from studies using systemic G1 administration. Moreover, global knockdown of GPER affects glucose homeostasis, blood pressure, and many other cardiovascular-related systems, thereby confounding interpretation of its direct cardiac actions. We generated a cardiomyocyte-specific GPER knockout (KO) mouse model to specifically investigate the functions of GPER in cardiomyocytes. Compared to wild type mice, cardiomyocyte-specific GPER KO mice exhibited adverse alterations in cardiac structure and impaired systolic and diastolic function, as measured by echocardiography. Gene deletion effects on left ventricular dimensions were more profound in male KO mice compared to female KO mice. Analysis of DNA microarray data from isolated cardiomyocytes of wild type and KO mice revealed sex-based differences in gene expression profiles affecting multiple transcriptional networks. Gene Set Enrichment Analysis (GSEA) revealed that mitochondrial genes are enriched in GPER KO females, whereas inflammatory response genes are enriched in GPER KO males, compared to their wild type counterparts of the same sex. The cardiomyocyte-specific GPER KO mouse model provides us with a powerful tool to study the functions of GPER in cardiomyocytes. The gene expression profiles of the GPER KO mice provide foundational information for further study of the mechanisms underlying sex-specific cardioprotection by GPER. Copyright © 2016 Elsevier B.V. All rights reserved.
Convergence of the transcriptional responses to heat shock and singlet oxygen stresses.
Dufour, Yann S; Imam, Saheed; Koo, Byoung-Mo; Green, Heather A; Donohue, Timothy J
2012-09-01
Cells often mount transcriptional responses and activate specific sets of genes in response to stress-inducing signals such as heat or reactive oxygen species. Transcription factors in the RpoH family of bacterial alternative σ factors usually control gene expression during a heat shock response. Interestingly, several α-proteobacteria possess two or more paralogs of RpoH, suggesting some functional distinction. We investigated the target promoters of Rhodobacter sphaeroides RpoH(I) and RpoH(II) using genome-scale data derived from gene expression profiling and the direct interactions of each protein with DNA in vivo. We found that the RpoH(I) and RpoH(II) regulons have both distinct and overlapping gene sets. We predicted DNA sequence elements that dictate promoter recognition specificity by each RpoH paralog. We found that several bases in the highly conserved TTG in the -35 element are important for activity with both RpoH homologs; that the T-9 position, which is over-represented in the RpoH(I) promoter sequence logo, is critical for RpoH(I)-dependent transcription; and that several bases in the predicted -10 element were important for activity with either RpoH(II) or both RpoH homologs. Genes that are transcribed by both RpoH(I) and RpoH(II) are predicted to encode for functions involved in general cell maintenance. The functions specific to the RpoH(I) regulon are associated with a classic heat shock response, while those specific to RpoH(II) are associated with the response to the reactive oxygen species, singlet oxygen. We propose that a gene duplication event followed by changes in promoter recognition by RpoH(I) and RpoH(II) allowed convergence of the transcriptional responses to heat and singlet oxygen stress in R. sphaeroides and possibly other bacteria.
Drouin, Simon; Laramée, Louise; Jacques, Pierre-Étienne; Forest, Audrey; Bergeron, Maxime; Robert, François
2010-10-28
Histone deacetylase Rpd3 is part of two distinct complexes: the large (Rpd3L) and small (Rpd3S) complexes. While Rpd3L targets specific promoters for gene repression, Rpd3S is recruited to ORFs to deacetylate histones in the wake of RNA polymerase II, to prevent cryptic initiation within genes. Methylation of histone H3 at lysine 36 by the Set2 methyltransferase is thought to mediate the recruitment of Rpd3S. Here, we confirm by ChIP-Chip that Rpd3S binds active ORFs. Surprisingly, however, Rpd3S is not recruited to all active genes, and its recruitment is Set2-independent. However, Rpd3S complexes recruited in the absence of H3K36 methylation appear to be inactive. Finally, we present evidence implicating the yeast DSIF complex (Spt4/5) and RNA polymerase II phosphorylation by Kin28 and Ctk1 in the recruitment of Rpd3S to active genes. Taken together, our data support a model where Set2-dependent histone H3 methylation is required for the activation of Rpd3S following its recruitment to the RNA polymerase II C-terminal domain.
Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang
2017-08-23
Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their probabilities of occurrence in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity strongly influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were found to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can serve as a standalone tool owing to its united framework and controllable way of dealing with study heterogeneity.
Wear, Emma K; Wilbanks, Elizabeth G; Nelson, Craig E; Carlson, Craig A
2018-03-09
Primers targeting the 16S small subunit ribosomal RNA marker gene, used to characterize bacterial and archaeal communities, have recently been re-evaluated for marine planktonic habitats. To investigate whether primer selection affects the ecological interpretation of bacterioplankton populations and community dynamics, amplicon sequencing with four primer sets targeting several hypervariable regions of the 16S rRNA gene was conducted on both mock communities constructed from cloned 16S rRNA genes and a time-series of DNA samples from the temperate coastal Santa Barbara Channel. Ecological interpretations of community structure (delineation of depth and seasonality, correlations with environmental factors) were similar across primer sets, while population dynamics varied. We observed substantial differences in relative abundances of taxa known to be poorly resolved by some primer sets, such as Thaumarchaeota and SAR11, and unexpected taxa including Roseobacter clades. Though the magnitude of relative abundances of common OTUs differed between primer sets, the relative abundances of the OTUs were nonetheless strongly correlated. We do not endorse one primer set but rather enumerate strengths and weaknesses to facilitate selection appropriate to a system or experimental goal. While 16S rRNA gene primer bias suggests caution in assessing quantitative population dynamics, community dynamics appear robust across studies using different primers. © 2018 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.
2014-01-01
Background: Imprinted genes have been extensively documented in eutherian mammals and found to exhibit significant interspecific variation in the suites of genes that are imprinted and in their regulation between tissues and developmental stages. Much less is known about imprinted loci in metatherian (marsupial) mammals, wherein studies have been limited to a small number of genes previously known to be imprinted in eutherians. We describe the first ab initio search for imprinted marsupial genes, in fibroblasts from the opossum, Monodelphis domestica, based on a genome-wide ChIP-seq strategy to identify promoters that are simultaneously marked by mutually exclusive, transcriptionally opposing histone modifications. Results: We identified a novel imprinted gene (Meis1) and two additional monoallelically expressed genes, one of which (Cstb) showed allele-specific, but non-imprinted expression. Imprinted vs. allele-specific expression could not be resolved for the third monoallelically expressed gene (Rpl17). Transcriptionally opposing histone modifications H3K4me3, H3K9Ac, and H3K9me3 were found at the promoters of all three genes, but differential DNA methylation was not detected at CpG islands at any of these promoters. Conclusions: In generating the first genome-wide histone modification profiles for a marsupial, we identified the first gene that is imprinted in a marsupial but not in eutherian mammals. This outcome demonstrates the practicality of an ab initio discovery strategy and implicates histone modification, but not differential DNA methylation, as a conserved mechanism for marking imprinted genes in all therian mammals. Our findings suggest that marsupials use multiple epigenetic mechanisms for imprinting and support the concept that lineage-specific selective forces can produce sets of imprinted genes that differ between metatherian and eutherian lines. PMID:24484454
Co-expression networks reveal the tissue-specific regulation of transcription and splicing.
Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D H; Jo, Brian; Gao, Chuan; McDowell, Ian C; Engelhardt, Barbara E; Battle, Alexis
2017-11-01
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues. © 2017 Saha et al.; Published by Cold Spring Harbor Laboratory Press.
The Sorghum bicolor genome and the diversification of grasses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paterson, Andrew H.; Bowers, John E.; Bruggmann, Remy
2008-08-20
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
The Sorghum bicolor genome and the diversification of grasses.
Paterson, Andrew H; Bowers, John E; Bruggmann, Rémy; Dubchak, Inna; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hellsten, Uffe; Mitros, Therese; Poliakov, Alexander; Schmutz, Jeremy; Spannagl, Manuel; Tang, Haibao; Wang, Xiyin; Wicker, Thomas; Bharti, Arvind K; Chapman, Jarrod; Feltus, F Alex; Gowik, Udo; Grigoriev, Igor V; Lyons, Eric; Maher, Christopher A; Martis, Mihaela; Narechania, Apurva; Otillar, Robert P; Penning, Bryan W; Salamov, Asaf A; Wang, Yu; Zhang, Lifang; Carpita, Nicholas C; Freeling, Michael; Gingle, Alan R; Hash, C Thomas; Keller, Beat; Klein, Patricia; Kresovich, Stephen; McCann, Maureen C; Ming, Ray; Peterson, Daniel G; Mehboob-ur-Rahman; Ware, Doreen; Westhoff, Peter; Mayer, Klaus F X; Messing, Joachim; Rokhsar, Daniel S
2009-01-29
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liang, Ying; Gao, Yajun; Jones, Alan M.
2017-06-13
The three-member family of Arabidopsis extra-large G proteins (XLG1-3) defines the prototype of an atypical Gα subunit in the heterotrimeric G protein complex. Some recent evidence indicates that XLG subunits operate along with the Gβγ dimer in root morphology, stress responsiveness, and cytokinin-induced development; however, downstream targets of activated XLG proteins in the stress pathways are largely unknown. In order to assemble a set of candidate XLG-targeted proteins, a yeast two-hybrid complementation-based screen was performed using XLG protein baits to query interactions between XLG and partner proteins found in glucose-treated seedlings, roots, and Arabidopsis cells in culture. Seventy-two interactors were identified and >60% of a test set displayed in vivo interaction with XLG proteins. Gene co-expression analysis shows that >70% of the interactors are positively correlated with the corresponding XLG partners. Gene Ontology enrichment for all the candidates indicates stress responses and posits a molecular mechanism involving a specific set of transcription factor partners to XLG. Genes encoding two of these transcription factors, SZF1 and 2, require XLG proteins for full NaCl-induced expression. Furthermore, the subcellular localization of the XLG proteins in the nucleus, endosome, and plasma membrane is dependent on the specific interacting partner.
Abdominal-B and caudal inhibit the formation of specific neuroblasts in the Drosophila tail region
Birkholz, Oliver; Vef, Olaf; Rogulja-Ortmann, Ana; Berger, Christian; Technau, Gerhard M.
2013-01-01
The central nervous system of Drosophila melanogaster consists of fused segmental units (neuromeres), each generated by a characteristic number of neural stem cells (neuroblasts). In the embryo, thoracic and anterior abdominal neuromeres are almost equally sized and formed by repetitive sets of neuroblasts, whereas the terminal abdominal neuromeres are generated by significantly smaller populations of progenitor cells. Here we investigated the role of the Hox gene Abdominal-B in shaping the terminal neuromeres. We show that the regulatory isoform of Abdominal-B (Abd-B.r) not only confers abdominal fate to specific neuroblasts (e.g. NB6-4) and regulates programmed cell death of several progeny cells within certain neuroblast lineages (e.g. NB3-3) in parasegment 14, but also inhibits the formation of a specific set of neuroblasts in parasegment 15 (including NB7-3). We further show that Abd-B.r requires cooperation of the ParaHox gene caudal to unfold its full competence concerning neuroblast inhibition and specification. Thus, our findings demonstrate that combined action of Abdominal-B and caudal contributes to the size and composition of the terminal neuromeres by regulating both the number and lineages of specific neuroblasts. PMID:23903193
Genome sequence of the model medicinal mushroom Ganoderma lucidum
Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao
2012-01-01
Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi. PMID:22735441
The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution
Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.
2010-01-01
To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049
Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong
2015-10-19
Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from the single-gene to the gene-set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistical or bioinformatic expertise. This is all the more the case when attempting to define a gene set specific to one condition compared with many others. Thus, there is a crucial need for easy-to-use software for generating relevant home-made gene sets from complex datasets, using them in GSEA, and correcting the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is open-source software that automatically generates molecular signatures from complex expression datasets and directly assesses their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps in interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarity between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html.
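As an illustration of the GSEA step mentioned in the abstract above, the following is a minimal sketch of the commonly described weighted running-sum enrichment score. It is not BubbleGUM's code: the function name `enrichment_score`, its parameters, and the toy gene labels are assumptions made for this example, and permutation-based significance and multiple testing correction are deliberately left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: gene identifiers sorted by decreasing ranking score.
    scores: ranking scores in the same order as ranked_genes.
    gene_set: collection of gene identifiers to test.
    p: exponent weighting hits by |score| (p=0 gives an unweighted walk).
    """
    members = set(gene_set)
    in_set = [gene in members for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Normalising constant so that the hit increments sum to 1.
    hit_total = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all hit weights are zero; use p=0 or non-zero scores")

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_total   # step up on a set member
        else:
            running -= 1.0 / n_miss              # step down otherwise
        if abs(running) > abs(best):
            best = running                       # keep the largest deviation from 0
    return best


# Toy ranking (hypothetical genes): the set sits at the top of the list.
genes = ["a", "b", "c", "d"]
values = [4.0, 3.0, 2.0, 1.0]
top_es = enrichment_score(genes, values, {"a", "b"})
bottom_es = enrichment_score(genes, values, {"c", "d"})
```

A set concentrated at the top of the ranking yields a score near +1 and a set concentrated at the bottom yields a score near -1; real analyses compare the observed score with a null distribution before interpreting it.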
Nguyen, Quan; Lukowski, Samuel; Chiu, Han; Senabouth, Anne; Bruxner, Timothy; Christ, Angelika; Palpant, Nathan; Powell, Joseph
2018-05-11
Heterogeneity of cell states represented in pluripotent cultures has not been described at the transcriptional level. Since gene expression is highly heterogeneous between cells, single-cell RNA sequencing can be used to identify how individual pluripotent cells function. Here, we present results from the analysis of single-cell RNA sequencing data from 18,787 individual WTC CRISPRi human induced pluripotent stem cells. We developed an unsupervised clustering method, and through this identified four subpopulations distinguishable on the basis of their pluripotent state: a core pluripotent population (48.3%), proliferative (47.8%), early-primed for differentiation (2.8%) and late-primed for differentiation (1.1%). For each subpopulation we were able to identify the genes and pathways that define differences in pluripotent cell states. Our method identified four discrete predictor gene sets comprising 165 unique genes that denote the specific pluripotency states; and using these sets, we developed a multigenic machine learning prediction method to accurately classify single cells into each of the subpopulations. Compared against a set of established pluripotency markers, our method increases prediction accuracy by 10%, specificity by 20%, and explains a substantially larger proportion of deviance (up to 3-fold) from the prediction model. Finally, we developed an innovative method to predict cells transitioning between subpopulations, and support our conclusions with results from two orthogonal pseudotime trajectory methods. Published by Cold Spring Harbor Laboratory Press.
Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.
Jiang, Xue; Zhang, Han; Quan, Xiongwen
2016-01-01
Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes in interactions between genes at different cell states or the dynamics of gene expression levels during disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes in interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change in local topological structure between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
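The rank-product step mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not give the two DCGN metrics, so the score tables, gene names and function names below are hypothetical; the code only shows the generic rank-product idea (geometric mean of a gene's ranks across several score tables), not the authors' implementation.

```python
import math

def rank_descending(scores):
    """Map each gene to its 1-based rank; the largest score gets rank 1.
    Ties are broken by input order, which is a simplification."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}

def rank_product(score_tables):
    """Geometric mean of each gene's rank across all score tables.
    Only genes present in every table are kept."""
    ranks = [rank_descending(table) for table in score_tables]
    genes = set(ranks[0])
    for r in ranks[1:]:
        genes &= set(r)
    k = len(ranks)
    return {g: math.exp(sum(math.log(r[g]) for r in ranks) / k) for g in genes}

# Hypothetical differential-coexpression scores from two metrics.
metric_a = {"G1": 0.9, "G2": 0.4, "G3": 0.7, "G4": 0.2}
metric_b = {"G1": 0.5, "G2": 0.6, "G3": 0.9, "G4": 0.1}

rp = rank_product([metric_a, metric_b])
# A smaller rank product means the gene ranks consistently near the top.
ranking = sorted(rp, key=rp.get)
print(ranking)
```

A gene that ranks highly under both metrics gets a small rank product, so sorting in ascending order puts the most consistently high-scoring candidates first.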
A gene expression resource generated by genome-wide lacZ profiling in the mouse
Tuck, Elizabeth; Estabel, Jeanne; Oellrich, Anika; Maguire, Anna Karin; Adissu, Hibret A.; Souter, Luke; Siragher, Emma; Lillistone, Charlotte; Green, Angela L.; Wardle-Jones, Hannah; Carragher, Damian M.; Karp, Natasha A.; Smedley, Damian; Adams, Niels C.; Bussell, James N.; Adams, David J.; Ramírez-Solis, Ramiro; Steel, Karen P.; Galli, Antonella; White, Jacqueline K.
2015-01-01
ABSTRACT Knowledge of the expression profile of a gene is a critical piece of information required to build an understanding of the normal and essential functions of that gene and any role it may play in the development or progression of disease. High-throughput, large-scale efforts are on-going internationally to characterise reporter-tagged knockout mouse lines. As part of that effort, we report an open access adult mouse expression resource, in which the expression profile of 424 genes has been assessed in up to 47 different organs, tissues and sub-structures using a lacZ reporter gene. Many specific and informative expression patterns were noted. Expression was most commonly observed in the testis and brain and was most restricted in white adipose tissue and mammary gland. Over half of the assessed genes presented with an absent or localised expression pattern (categorised as 0-10 positive structures). A link between complexity of expression profile and viability of homozygous null animals was observed; inactivation of genes expressed in ≥21 structures was more likely to result in reduced viability by postnatal day 14 compared with more restricted expression profiles. For validation purposes, this mouse expression resource was compared with Bgee, a federated composite of RNA-based expression data sets. Strong agreement was observed, indicating a high degree of specificity in our data. Furthermore, there were 1207 observations of expression of a particular gene in an anatomical structure where Bgee had no data, indicating a large amount of novelty in our data set. Examples of expression data corroborating and extending genotype-phenotype associations and supporting disease gene candidacy are presented to demonstrate the potential of this powerful resource. PMID:26398943
Clare, Susan E; Gupta, Akash; Choi, MiRan; Ranjan, Manish; Lee, Oukseub; Wang, Jun; Ivancic, David Z; Kim, J Julie; Khan, Seema A
2016-05-23
The synthesis of specific, potent progesterone antagonists adds potential agents to the breast cancer prevention and treatment armamentarium. The identification of individuals who will benefit from these agents will be a critical factor for their clinical success. We utilized telapristone acetate (TPA; CDB-4124) to understand the effects of progesterone receptor (PR) blockade on proliferation, apoptosis, promoter binding, cell cycle progression, and gene expression. We then identified a set of genes that overlap with human breast luteal-phase expressed genes and signify progesterone activity in both normal breast cells and breast cancer cell lines. TPA administration to T47D cells results in a 30 % decrease in cell number at 24 h, which is maintained over 72 h only in the presence of estradiol. Blockade of progesterone signaling by TPA for 24 h results in fewer cells in G2/M, attributable to decreased expression of genes that facilitate the G2/M transition. Gene expression data suggest that TPA affects several mechanisms that progesterone utilizes to control gene expression, including specific post-translational modifications, and nucleosomal organization and higher order chromatin structure, which regulate access of PR to its DNA binding sites. By comparing genes induced by the progestin R5020 in T47D cells with those increased in the luteal-phase normal breast, we have identified a set of genes that predict functional progesterone signaling in tissue. These data will facilitate an understanding of the ways in which drugs such as TPA may be utilized for the prevention, and possibly the therapy, of human breast cancer.
Trescher, Saskia; Münchmeyer, Jannes; Leser, Ulf
2017-03-27
Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points.
Programmed DNA Elimination: Keeping Germline Genes in Their Place.
Smith, Jeramiah J
2018-05-21
Each of our cells contains a full set of instructions needed to make an entire human: the genome. But a few special species buck this trend. A new study now identifies the first germline-specific gene in zebra finch, one of a small number of vertebrates that are known to undergo developmentally programmed DNA elimination. Copyright © 2018 Elsevier Ltd. All rights reserved.
Rollins, Derrick K; Teh, Ailing
2010-12-17
Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of different experimental conditions called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the ranking of genes in microarray data sets that are expressed most differently between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.
Azad, Tej D; Donato, Michele; Heylen, Line; Liu, Andrew B; Shen-Orr, Shai S; Sweeney, Timothy E; Maltzman, Jonathan Scott; Naesens, Maarten; Khatri, Purvesh
2018-01-25
Late allograft failure is characterized by cumulative subclinical insults manifesting over many years. Although immunomodulatory therapies targeting host T cells have improved short-term survival rates, rates of chronic allograft loss remain high. We hypothesized that other immune cell types may drive subclinical injury, ultimately leading to graft failure. We collected whole-genome transcriptome profiles from 15 independent cohorts composed of 1,697 biopsy samples to assess the association of an inflammatory macrophage polarization-specific gene signature with subclinical injury. We applied penalized regression to a subset of the data sets and identified a 3-gene inflammatory macrophage-derived signature. We validated discriminatory power of the 3-gene signature in 3 independent renal transplant data sets with mean AUC of 0.91. In a longitudinal cohort, the 3-gene signature strongly correlated with extent of injury and accurately predicted progression of subclinical injury 18 months before clinical manifestation. The 3-gene signature also stratified patients at high risk of graft failure as soon as 15 days after biopsy. We found that the 3-gene signature also distinguished acute rejection (AR) accurately in 3 heart transplant data sets but not in lung transplant. Overall, we identified a parsimonious signature capable of diagnosing AR, recognizing subclinical injury, and risk-stratifying renal transplant patients. Our results strongly suggest that inflammatory macrophages may be a viable therapeutic target to improve long-term outcomes for organ transplantation patients.
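The AUC figures quoted in this abstract measure how well a signature score separates two groups of samples. As a rough illustration only, the sketch below computes AUC as the fraction of (case, control) pairs in which the case has the higher score, counting ties as half. The scores and group names are invented for the example and the abstract does not say how the 3-gene score was computed.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison
    (equivalent to the normalized Mann-Whitney U statistic)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0      # case correctly ranked above control
            elif c == n:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical signature scores (for example, a mean of three gene levels).
injured = [2.1, 1.8, 0.9]
stable = [0.5, 1.0, 0.7]

value = auc(injured, stable)
print(round(value, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the reported mean of 0.91.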
Mouse Genome Database: From sequence to phenotypes and disease models
Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.
2015-01-01
Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326
da Rocha, Edroaldo Lummertz; Ung, Choong Yong; McGehee, Cordelia D; Correia, Cristina; Li, Hu
2016-06-02
The sequential chain of interactions altering the binary state of a biomolecule represents the 'information flow' within a cellular network that determines phenotypic properties. Given the lack of computational tools to dissect context-dependent networks and gene activities, we developed NetDecoder, a network biology platform that models context-dependent information flows using pairwise phenotypic comparative analyses of protein-protein interactions. Using breast cancer, dyslipidemia and Alzheimer's disease as case studies, we demonstrate NetDecoder dissects subnetworks to identify key players significantly impacting cell behaviour specific to a given disease context. We further show genes residing in disease-specific subnetworks are enriched in disease-related signalling pathways and information flow profiles, which drive the resulting disease phenotypes. We also devise a novel scoring scheme to quantify key genes-network routers, which influence many genes, key targets, which are influenced by many genes, and high impact genes, which experience a significant change in regulation. We show the robustness of our results against parameter changes. Our network biology platform includes freely available source code (http://www.NetDecoder.org) for researchers to explore genome-wide context-dependent information flow profiles and key genes, given a set of genes of particular interest and transcriptome data. More importantly, NetDecoder will enable researchers to uncover context-dependent drug targets. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Dissecting gene expression at the blood-brain barrier
Huntley, Melanie A.; Bien-Ly, Nga; Daneman, Richard; Watts, Ryan J.
2014-01-01
The availability of genome-wide expression data for the blood-brain barrier is an invaluable resource that has recently enabled the discovery of several genes and pathways involved in the development and maintenance of the blood-brain barrier, particularly in rodent models. The broad distribution of published data sets represents a viable starting point for the molecular dissection of the blood-brain barrier and will further direct the discovery of novel mechanisms of blood-brain barrier formation and function. Technical advances in purifying brain endothelial cells, the key cell that forms the critical barrier, have allowed for greater specificity in gene expression comparisons with other central nervous system cell types, and more systematic characterizations of the molecular composition of the blood-brain barrier. Nevertheless, our understanding of how the blood-brain barrier changes during aging and disease is underrepresented. Blood-brain barrier data sets from a wider range of experimental paradigms and species, including invertebrates and primates, would be invaluable for investigating the function and evolution of the blood-brain barrier. Newer technologies in gene expression profiling, such as RNA-sequencing, now allow for finer resolution of transcriptomic changes, including isoform specificity and RNA-editing. As our field continues to utilize more advanced expression profiling in its ongoing efforts to elucidate the blood-brain barrier, including in disease and drug delivery, we will continue to see rapid advances in our understanding of the molecular mediators of barrier biology. We predict that the recently published data sets, combined with forthcoming genomic and proteomic blood-brain barrier data sets, will continue to fuel the molecular genetic revolution of blood-brain barrier biology. PMID:25414634
Alteration of gene expression by alcohol exposure at early neurulation.
Zhou, Feng C; Zhao, Qianqian; Liu, Yunlong; Goodlett, Charles R; Liang, Tiebing; McClintick, Jeanette N; Edenberg, Howard J; Li, Lang
2011-02-21
We have previously demonstrated that alcohol exposure at early neurulation induces growth retardation, neural tube abnormalities, and alteration of DNA methylation. To explore the global gene expression changes which may underlie these developmental defects, microarray analyses were performed in a whole embryo mouse culture model that allows control over alcohol and embryonic variables. Alcohol caused teratogenesis in brain, heart, forelimb, and optic vesicle; a subset of the embryos also showed cranial neural tube defects. In microarray analysis (accession number GSM9545), adopting hypothesis-driven Gene Set Enrichment Analysis (GSEA) informatics and intersection analysis of two independent experiments, we found that there was a collective reduction in expression of neural specification genes (neurogenin, Sox5, Bhlhe22), neural growth factor genes [Igf1, Efemp1, Klf10 (Tieg), and Edil3], and alteration of genes involved in cell growth, apoptosis, histone variants, eye and heart development. There was also a reduction of retinol binding protein 1 (Rbp1), and de novo expression of aldehyde dehydrogenase 1B1 (Aldh1B1). Remarkably, four key hematopoiesis genes (glycophorin A, adducin 2, beta-2 microglobulin, and ceruloplasmin) were absent after alcohol treatment, and histone variant genes were reduced. The down-regulation of the neurospecification and the neurotrophic genes was further confirmed by quantitative RT-PCR. Furthermore, the gene expression profile demonstrated distinct subgroups which corresponded with two distinct alcohol-related neural tube phenotypes: an open (ALC-NTO) and a closed neural tube (ALC-NTC). Further, the epidermal growth factor signaling pathway and histone variants were specifically altered in ALC-NTO, and a greater number of neurotrophic/growth factor genes were down-regulated in the ALC-NTO than in the ALC-NTC embryos. This study revealed a set of genes vulnerable to alcohol exposure and genes that were associated with neural tube defects during early neurulation.
Fauteux, François; Strömvik, Martina V
2009-01-01
Background: Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results: We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion: Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
Xu, Aishi; Li, Guang; Yang, Dong; Wu, Songfeng; Ouyang, Hongsheng; Xu, Ping; He, Fuchu
2015-12-04
Although the "missing protein" is a temporary concept in C-HPP, the biological reasons for these proteins being "missing" could be an important clue in evolutionary studies. Here we classified missing-protein-encoding genes into two groups: the genes encoding PE2 proteins (with transcript evidence) and the genes encoding PE3/4 proteins (with no transcript evidence). These missing-protein-encoding genes are distributed unevenly among different chromosomes, chromosomal regions, and gene clusters. In terms of evolutionary features, PE3/4 genes tend to be young, located in nonhomologous chromosomal regions, and evolving at higher rates. Interestingly, the proportion of singletons is higher among PE3/4 genes than among all genes (background) and OTCSGs (organ, tissue, cell type-specific genes). More importantly, most of the paralogous PE3/4 genes are newly duplicated members of their paralogous gene groups, which mainly contribute to special biological functions, such as "smell perception". These functions are heavily restricted to specific types of cells, tissues, or developmental stages, acting as the new functional requirements that facilitated the emergence of the missing-protein-encoding genes during evolution. In addition, criteria for proteins with extreme physicochemical properties were first set up based on the properties of PE2 proteins, and the evolutionary characteristics of those proteins were explored. Overall, the evolutionary analyses of missing-protein-encoding genes are expected to be highly instructive for proteomics and functional studies in the future.
NASA Astrophysics Data System (ADS)
UŻarowska, E.; Czajkowski, Rafał; Konopka, W.
2014-11-01
We aim to create a set of genetic tools in which permanent opsin expression (ChR or NpHR) is precisely limited to the population of neurons that express the immediate early gene c-fos during a specific temporal window of behavioral training. Since the c-fos gene is only expressed in neurons that form an experience-dependent ensemble, this approach will result in specific labeling of the small subset of cells that create the memory trace for the learned behavior. To this end we employ two alternative inducible gene expression systems: the Tet Expression System and the Cre/lox System. In both cases, the temporal window for opsin induction is controlled pharmacologically, by doxycycline or tamoxifen, respectively. Both systems will be used for creating lines of transgenic animals.
Tissue Gene Expression Analysis Using Arrayed Normalized cDNA Libraries
Eickhoff, Holger; Schuchhardt, Johannes; Ivanov, Igor; Meier-Ewert, Sebastian; O'Brien, John; Malik, Arif; Tandon, Neeraj; Wolski, Eryk-Witold; Rohlfs, Elke; Nyarsik, Lajos; Reinhardt, Richard; Nietfeld, Wilfried; Lehrach, Hans
2000-01-01
We have used oligonucleotide-fingerprinting data on 60,000 cDNA clones from two different mouse embryonic stages to establish a normalized cDNA clone set. The normalized set of 5,376 clones represents different clusters and therefore, in almost all cases, different genes. The inserts of the cDNA clones were amplified by PCR and spotted on glass slides. The resulting arrays were hybridized with mRNA probes prepared from six different adult mouse tissues. Expression profiles were analyzed by hierarchical clustering techniques. We have chosen radioactive detection because it combines robustness with sensitivity and allows the comparison of multiple normalized experiments. Sensitive detection combined with highly effective clustering algorithms allowed the identification of tissue-specific expression profiles and the detection of genes specifically expressed in the tissues investigated. The obtained results are publicly available (http://www.rzpd.de) and can be used by other researchers as a digital expression reference. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AL360374–AL36537.] PMID:10958641
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-02-14
Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (PGSEA<0.001; FDR=0.21), 7/615 in the replication (PGSEA=0.004; FDR=0.37), and 1/615 in the combined (PGSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (PGSEA=0.025) and IGROV1 (PGSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.
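For readers unfamiliar with the GSEA statistic behind the rankings above, here is a minimal sketch of the classic unweighted running-sum enrichment score. It is a generic illustration under stated assumptions, not the GWAS-adapted procedure used in the study: the gene names are placeholders, and the weighting, permutation-based P values and FDR estimation are all omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) GSEA running-sum statistic.

    Walk down the ranked list, stepping up by 1/n_hit at each gene in the
    set and down by 1/n_miss otherwise; return the largest deviation
    from zero, keeping its sign.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and outside the set")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Placeholder genes ordered from strongest to weakest association.
ranked = ["A", "B", "C", "D", "E", "F"]

es_top = enrichment_score(ranked, {"A", "B"})      # set concentrated at the top
es_bottom = enrichment_score(ranked, {"E", "F"})   # set concentrated at the bottom
print(es_top, es_bottom)
```

A set whose members cluster at the top of the ranking yields a score near +1, and one clustered at the bottom yields a score near -1. In practice significance is judged by comparing the observed score with scores from permuted data.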
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-01-01
Background: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. Methods: All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). Results: The PAX8-target gene set was ranked 1/615 in the discovery (PGSEA<0.001; FDR=0.21), 7/615 in the replication (PGSEA=0.004; FDR=0.37), and 1/615 in the combined (PGSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (PGSEA=0.025) and IGROV1 (PGSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Conclusions: Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC. PMID:28103614
Ficklin, Stephen P; Dunwoodie, Leland J; Poehlman, William L; Watson, Christopher; Roche, Kimberly E; Feltus, F Alex
2017-08-17
A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
Li, Chi-Ming; Guo, Meirong; Borczuk, Alain; Powell, Charles A.; Wei, Michelle; Thaker, Harshwardhan M.; Friedman, Richard; Klein, Ulf; Tycko, Benjamin
2002-01-01
Wilms’ tumor (WT) has been considered a prototype for arrested cellular differentiation in cancer, but previous studies have relied on selected markers. We have now performed an unbiased survey of gene expression in WTs using oligonucleotide microarrays. Statistical criteria identified 357 genes as differentially expressed between WTs and fetal kidneys. This set contained 124 matches to genes on a microarray used by Stuart and colleagues (Stuart RO, Bush KT, Nigam SK: Changes in global gene expression patterns during development and maturation of the rat kidney. Proc Natl Acad Sci USA 2001, 98:5649–5654) to establish genes with stage-specific expression in the developing rat kidney. Mapping between the two data sets showed that WTs systematically overexpressed genes corresponding to the earliest stage of metanephric development, and underexpressed genes corresponding to later stages. Automated clustering identified a smaller group of 27 genes that were highly expressed in WTs compared to fetal kidney and heterologous tumor and normal tissues. This signature set was enriched in genes encoding transcription factors. Four of these, PAX2, EYA1, HBF2, and HOXA11, are essential for cell survival and proliferation in early metanephric development, whereas others, including SIX1, MOX1, and SALL2, are predicted to act at this stage. SIX1 and SALL2 proteins were expressed in the condensing mesenchyme in normal human fetal kidneys, but were absent (SIX1) or reduced (SALL2) in cells at other developmental stages. These data imply that the blastema in WTs has progressed to the committed stage in the mesenchymal-epithelial transition, where it is partially arrested in differentiation. The WT-signature set also contained the Wnt receptor FZD7, the tumor antigen PRAME, the imprinted gene NNAT and the metastasis-associated transcription factor E1AF. PMID:12057921
Chen, Shuonan; Mar, Jessica C
2018-06-19
A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data, or simulated single cell data, which demonstrates their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell methods. 
Also, single cell methods, which usually depend on more elaborate algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance of developing more accurate, optimized network modeling methods that are compatible with single cell data. Newly developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when we interpret these results.
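The evaluation described in this abstract, scoring inferred edges against a literature-derived reference set, can be illustrated with a small sketch. The code below is not the authors' pipeline: the function name `precision_recall_curve`, the gene labels, and the edge scores are all hypothetical, and it assumes each method outputs a confidence score per directed edge.

```python
def precision_recall_curve(scored_edges, reference_edges):
    """Return (precision, recall) after each prefix of the score-ranked edge list.

    scored_edges: dict mapping (regulator, target) -> confidence score
    reference_edges: non-empty iterable of (regulator, target) pairs treated as true
    """
    reference = set(reference_edges)
    # Rank predicted edges from most to least confident.
    ranked = sorted(scored_edges.items(), key=lambda item: item[1], reverse=True)
    curve = []
    true_positives = 0
    for rank, (edge, _score) in enumerate(ranked, start=1):
        if edge in reference:
            true_positives += 1
        curve.append((true_positives / rank, true_positives / len(reference)))
    return curve


# Hypothetical toy data: three scored edges, two of which are in the reference.
scored = {("G1", "G2"): 0.9, ("G1", "G4"): 0.7, ("G2", "G3"): 0.4}
reference = {("G1", "G2"), ("G2", "G3")}
curve = precision_recall_curve(scored, reference)
for precision, recall in curve:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A curve that stays close to the fraction of reference edges among all candidate edges would correspond to the poor performance the abstract reports.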
Kentzoglanakis, Kyriakos; Poole, Matthew
2012-01-01
In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.
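The continuous half of the search described above, PSO over model parameters, can be sketched as follows. This is a generic global-best particle swarm optimizer, not the authors' implementation: the toy objective stands in for the RNN fitting error, and the function names, swarm size, and coefficient values are assumptions chosen only for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_val[i])
    global_best = personal_best[best_idx][:]
    global_val = personal_val[best_idx]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_val[i] = value
                personal_best[i] = positions[i][:]
                if value < global_val:
                    global_val = value
                    global_best = positions[i][:]
    return global_best, global_val


def toy_error(params):
    """Hypothetical stand-in for a model-fitting error with optimum at (1, -2)."""
    target = [1.0, -2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_error = pso_minimize(toy_error, dim=2, bounds=(-5.0, 5.0))
print(best_params, best_error)
```

In the framework the abstract describes, an optimizer of this kind would be run once per candidate architecture proposed by ACO, with the fitted error feeding back into the architecture search.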
Provenzano, Paolo P; Inman, David R; Eliceiri, Kevin W; Beggs, Hilary E; Keely, Patricia J
2008-11-01
Focal adhesion kinase (FAK) is a central regulator of the focal adhesion, influencing cell proliferation, survival, and migration. Despite evidence demonstrating FAK overexpression in human cancer, its role in tumor initiation and progression is not well understood. Using Cre/LoxP technology to specifically knockout FAK in the mammary epithelium, we showed that FAK is not required for tumor initiation but is required for tumor progression. The mechanistic underpinnings of these results suggested that FAK regulates clinically relevant gene signatures and multiple signaling complexes associated with tumor progression and metastasis, such as Src, ERK, and p130Cas. Furthermore, a systems-level analysis identified FAK as a major regulator of the tumor transcriptome, influencing genes associated with adhesion and growth factor signaling pathways, and their cross talk. Additionally, FAK was shown to down-regulate the expression of clinically relevant proliferation- and metastasis-associated gene signatures, as well as an enriched group of genes associated with the G(2) and G(2)/M phases of the cell cycle. Computational analysis of transcription factor-binding sites within ontology-enriched or clustered gene sets suggested that the differentially expressed proliferation- and metastasis-associated genes in FAK-null cells were regulated through a common set of transcription factors, including p53. Therefore, FAK acts as a primary node in the activated signaling network in transformed motile cells and is a prime candidate for novel therapeutic interventions to treat aggressive human breast cancers.
Nasser, Waleed; Santhanam, Balaji; Miranda, Edward Roshan; Parikh, Anup; Juneja, Kavina; Rot, Gregor; Dinh, Chris; Chen, Rui; Zupan, Blaz; Shaulsky, Gad; Kuspa, Adam
2014-01-01
Background: Amoebae and bacteria interact within predator/prey and host/pathogen relationships, but the general response of amoeba to bacteria is not well understood. The amoeba Dictyostelium discoideum feeds on, and is colonized by diverse bacterial species including Gram-positive [Gram(+)] and Gram-negative [Gram(−)] bacteria, two major groups of bacteria that differ in structure and macromolecular composition. Results: Transcriptional profiling of D. discoideum revealed sets of genes whose expression is enriched in amoebae interacting with different species of bacteria, including sets that appear specific to amoebae interacting with Gram(+), or with Gram(−) bacteria. In a genetic screen utilizing the growth of mutant amoebae on a variety of bacteria as a phenotypic readout, we identified amoebal genes that are only required for growth on Gram(+) bacteria, including one that encodes the cell surface protein gp130, as well as several genes that are only required for growth on Gram(−) bacteria including one that encodes a putative lysozyme, AlyL. These genes are required for parts of the transcriptional response of wild-type amoebae, and this allowed their classification into potential response pathways. Conclusions: We have defined genes that are critical for amoebal survival during feeding on Gram(+), or Gram(−), bacteria which we propose form part of a regulatory network that allows D. discoideum to elicit specific cellular responses to different species of bacteria in order to optimize survival. PMID:23664307
Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach
Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo; Siwo, Geoffrey; Rider, Andrew K.; Tan, Asako; Pinapati, Richard S.; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael T.; Tung, Yi-An; Chen, Yong-Syuan; Chen, Mei-Ju May; Chen, Chien-Yu; Knight, Jason M.; Sahraeian, Sayed Mohammad Ebrahim; Esfahani, Mohammad Shahrokh; Dreos, Rene; Bucher, Philipp; Maier, Ezekiel; Saeys, Yvan; Szczurek, Ewa; Myšičková, Alena; Vingron, Martin; Klein, Holger; Kiełbasa, Szymon M.; Knisley, Jeff; Bonnell, Jeff; Knisley, Debra; Kursa, Miron B.; Rudnicki, Witold R.; Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J.; Yeung, James; Meysman, Pieter; Rodríguez, Aminael Sánchez; Engelen, Kristof; Marchal, Kathleen; Huang, Yezhou; Mordelet, Fantine; Hartemink, Alexander; Pinello, Luca; Yuan, Guo-Cheng
2013-01-01
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescence protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites. PMID:23950146
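The aggregation of participant predictions mentioned above can be pictured with a short sketch. The abstract does not state how predictions were combined, so simple per-promoter averaging is an assumption here, and the team names, promoter labels, and values are hypothetical.

```python
from statistics import mean


def aggregate_predictions(team_predictions):
    """Average predicted activity per promoter across teams.

    team_predictions: dict mapping team name -> {promoter: predicted activity}
    Only promoters predicted by every team are aggregated.
    """
    shared = set.intersection(*(set(p) for p in team_predictions.values()))
    return {
        promoter: mean(preds[promoter] for preds in team_predictions.values())
        for promoter in sorted(shared)
    }


# Hypothetical predictions from three teams for two promoters.
teams = {
    "team_a": {"p1": 1.0, "p2": 3.0},
    "team_b": {"p1": 2.0, "p2": 5.0},
    "team_c": {"p1": 3.0, "p2": 4.0},
}
consensus = aggregate_predictions(teams)
print(consensus)
```

Averaging tends to cancel uncorrelated errors between teams, which is one plausible reason an aggregate can be robust without beating the best individual algorithms.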
ISL1 and BRN3B co-regulate the differentiation of murine retinal ganglion cells
Pan, Ling; Deng, Min; Xie, Xiaoling; Gan, Lin
2009-01-01
Summary: LIM-homeodomain (HD) and POU-HD transcription factors play critical roles in neurogenesis. However, it remains largely unknown how they cooperate in this process and what downstream target genes they regulate. Here we show that ISL1, a LIM-HD protein, is co-expressed with BRN3B, a POU-HD factor, in nascent, post-mitotic retinal ganglion cells (RGCs). Similar to the Brn3b-null retinas, retina-specific deletion of Isl1 results in the apoptosis of a majority of RGCs and in RGC axon guidance defects. The Isl1 and Brn3b double null mice display more severe retinal abnormalities with a near complete loss of RGCs, indicating the synergistic functions of these two factors. Furthermore, we show that both Isl1 and Brn3b function downstream of Math5 to regulate the expression of a common set of RGC-specific genes. Whole retina chromatin immunoprecipitation and in vitro transactivation assays reveal that ISL1 and BRN3B concurrently bind to and synergistically regulate the expression of a common set of RGC-specific genes. Thus, our results uncover a novel regulatory mechanism of BRN3B and ISL1 in RGC differentiation. PMID:18434421
Logical analysis of diffuse large B-cell lymphomas.
Alexe, G; Alexe, S; Axelrod, D E; Hammer, P L; Weissmann, D
2005-07-01
The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al., which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics, optimisation, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify different informative genes than those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that if considered jointly are capable of accurately distinguishing different types of lymphoma. The central concept of LAD is a pattern or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all those patterns which satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns in order to extract subsets of variables, which collectively are able to distinguish between different types of disease. For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set consisting also of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. 
The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study also provides a ranking by importance of the genes in the selected significant subsets as well as a library of dozens of combinatorial biomarkers (i.e. pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.
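The notion of a pattern as a conjunction of conditions on expression levels, and the sensitivity and specificity figures quoted above, can be made concrete with a small sketch. It does not reproduce LAD's pattern generation or set covering; the gene names, cut-points, and samples are hypothetical, and only a single hand-written pattern is evaluated.

```python
def matches(pattern, sample):
    """True if the sample satisfies every condition of the pattern.

    pattern: list of (gene, op, cutpoint) with op either '>' or '<='
    sample: dict mapping gene -> expression level
    """
    for gene, op, cutpoint in pattern:
        value = sample[gene]
        if op == ">" and not value > cutpoint:
            return False
        if op == "<=" and not value <= cutpoint:
            return False
    return True


def sensitivity_specificity(pattern, samples, labels, positive):
    """Treat a pattern match as a call for the positive class and score it."""
    tp = fn = tn = fp = 0
    for sample, label in zip(samples, labels):
        called_positive = matches(pattern, sample)
        if label == positive:
            if called_positive:
                tp += 1
            else:
                fn += 1
        else:
            if called_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pattern: geneA high AND geneB low suggests DLBCL.
pattern = [("geneA", ">", 5.0), ("geneB", "<=", 2.0)]
samples = [
    {"geneA": 8.0, "geneB": 1.0},
    {"geneA": 6.0, "geneB": 3.0},
    {"geneA": 2.0, "geneB": 1.0},
    {"geneA": 1.0, "geneB": 4.0},
]
labels = ["DLBCL", "DLBCL", "FL", "FL"]
sens, spec = sensitivity_specificity(pattern, samples, labels, positive="DLBCL")
print(sens, spec)
```

A single pattern usually covers only part of a class, which is why the approach described above aggregates many patterns into one classification model.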
Nodavirus infections in Israeli mariculture.
Ucko, M; Colorni, A; Diamant, A
2004-08-01
Viral encephalopathy and retinopathy (VER) infections were diagnosed in five fish species: Epinephelus aeneus, Dicentrarchus labrax, Sciaenops ocellatus, Lates calcarifer and Mugil cephalus cultured on both the Red Sea and Mediterranean coasts of Israel during 1998-2002. Spongiform vacuolation of nervous tissue was observed in histological sections of all examined species. With transmission electron microscopy, paracrystalline arrays and pieces of membrane-associated non-enveloped virions measuring approximately 30 nm in diameter were observed in the brain and retina of all species. At the molecular level, the nodavirus was detected by using a primer set that amplified the T4 region of the coat protein gene. When the same set of primers was used to search for VER in an additional fish species, Sparus aurata, it was found to produce non-specific amplicons, giving rise to false-positive results. This problem was overcome by using a different primer set (F1/VR3), designed on a highly conserved region of the virus gene, which amplified a fragment of 254 bp, and confirmed that S. aurata was nodavirus-free. This set was validated on all five species of infected fish, as well as clinically healthy fish. Comparison of the coat protein genes from the Israeli isolated sequences indicated that more than one viral strain was involved. No strict host-specificity was evident. Red Sea and Mediterranean isolated sequences grouped in distinct clusters, together with several foreign isolates from the Mediterranean area and the Far East, as phylogenetically close to the Epinephelus akaara RGNNV type.
2011-01-01
Background: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. Results: TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. Conclusions: TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes. PMID:21333005
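The 'Cluster' mode search for consecutive over-expressed genes can be pictured with a short sketch. This is not TRAM's algorithm, which runs inside a FileMaker Pro database and also computes statistical significance; the function name `find_clusters`, the threshold, the minimum run length, and the expression ratios below are assumptions for illustration.

```python
def find_clusters(ordered_genes, threshold, min_genes=3):
    """Return runs of at least min_genes consecutive genes at or above threshold.

    ordered_genes: list of (gene, expression_ratio) sorted by chromosomal position
    """
    clusters = []
    current = []
    for gene, ratio in ordered_genes:
        if ratio >= threshold:
            current.append(gene)
        else:
            # A gene below threshold ends the current run.
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


# Hypothetical expression ratios (condition A / condition B) along one chromosome.
chromosome = [
    ("g1", 0.9), ("g2", 2.5), ("g3", 3.0), ("g4", 2.1),
    ("g5", 1.0), ("g6", 2.2), ("g7", 2.4),
]
clusters = find_clusters(chromosome, threshold=2.0, min_genes=3)
print(clusters)
```

Searching for under-expressed runs would use the same loop with the comparison reversed.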
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Sulaiman, Irshad M.; Tang, Kevin; Osborne, John; Sammons, Scott; Wohlhueter, Robert M.
2007-01-01
We developed a set of seven resequencing GeneChips, based on the complete genome sequences of 24 strains of smallpox virus (variola virus), for rapid characterization of this human-pathogenic virus. Each GeneChip was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. This study includes the hybridization results of 14 smallpox virus strains. Of the 14 smallpox virus strains hybridized, only 7 had sequence information included in the design of the smallpox virus resequencing GeneChips; similar information for the remaining strains was not tiled as a reference in these GeneChips. By use of variola virus-specific primers and long-range PCR, 22 overlapping amplicons were amplified to cover nearly the complete genome and hybridized with the smallpox virus resequencing GeneChip set. These GeneChips were successful in generating nucleotide sequences for all 14 of the smallpox virus strains hybridized. Analysis of the data indicated that the GeneChip resequencing by hybridization was fast and reproducible and that the smallpox virus resequencing GeneChips could differentiate the 14 smallpox virus strains characterized. This study also suggests that high-density resequencing GeneChips have potential biodefense applications and may be used as an alternate tool for rapid identification of smallpox virus in the future. PMID:17182757
An intersectional gene regulatory strategy defines subclass diversity of C. elegans motor neurons.
Kratsios, Paschalis; Kerk, Sze Yen; Catela, Catarina; Liang, Joseph; Vidal, Berta; Bayer, Emily A; Feng, Weidong; De La Cruz, Estanisla Daniel; Croci, Laura; Consalez, G Giacomo; Mizumoto, Kota; Hobert, Oliver
2017-07-05
A core principle of nervous system organization is the diversification of neuron classes into subclasses that share large sets of features but differ in select traits. We describe here a molecular mechanism necessary for motor neurons to acquire subclass-specific traits in the nematode Caenorhabditis elegans. Cholinergic motor neuron classes of the ventral nerve cord can be subdivided into subclasses along the anterior-posterior (A-P) axis based on synaptic connectivity patterns and molecular features. The conserved COE-type terminal selector UNC-3 not only controls the expression of traits shared by all members of a neuron class, but is also required for subclass-specific traits expressed along the A-P axis. UNC-3, which is not regionally restricted, requires region-specific cofactors in the form of Hox proteins to co-activate subclass-specific effector genes in post-mitotic motor neurons. This intersectional gene regulatory principle for neuronal subclass diversification may be conserved from nematodes to mice.
Development of mRNA-specific RT-PCR for the detection of koi herpesvirus (KHV) replication stage.
Yuasa, Kei; Kurita, Jun; Kawana, Morihiko; Kiryu, Ikunari; Oseko, Norihisa; Sano, Motohiko
2012-08-13
An mRNA-specific reverse transcription (RT)-PCR primer set spanning the exon junction of a spliced putative terminase gene in the koi herpesvirus (KHV) was developed to detect the replicating stage of the virus. The proposed RT-PCR amplified a target gene from the RNA template, but not from a DNA template extracted from common carp brain (CCB) cells infected with KHV. In addition, the RT-PCR did not amplify the target gene of templates extracted from specific cell lines infected with either CyHV-1 or CyHV-2. RT-PCR detected mRNA from the scales of koi experimentally infected with KHV at 24 h post exposure (hpe). However, unlike conventional PCR, RT-PCR could not detect KHV DNA in fish at 0 hpe. The results indicate that the RT-PCR developed in this study is mRNA-specific and that the assay can detect the replicating stage of KHV from both fish and cultured cells infected with the virus.
Chandrasekaran, Sriram; Ament, Seth A.; Eddy, James A.; Rodriguez-Zas, Sandra L.; Schatz, Bruce R.; Price, Nathan D.; Robinson, Gene E.
2011-01-01
Using brain transcriptomic profiles from 853 individual honey bees exhibiting 48 distinct behavioral phenotypes in naturalistic contexts, we report that behavior-specific neurogenomic states can be inferred from the coordinated action of transcription factors (TFs) and their predicted target genes. Unsupervised hierarchical clustering of these transcriptomic profiles showed three clusters that correspond to three ecologically important behavioral categories: aggression, maturation, and foraging. To explore the genetic influences potentially regulating these behavior-specific neurogenomic states, we reconstructed a brain transcriptional regulatory network (TRN) model. This brain TRN quantitatively predicts with high accuracy gene expression changes of more than 2,000 genes involved in behavior, even for behavioral phenotypes on which it was not trained, suggesting that there is a core set of TFs that regulates behavior-specific gene expression in the bee brain, and other TFs more specific to particular categories. TFs playing key roles in the TRN include well-known regulators of neural and behavioral plasticity, e.g., Creb, as well as TFs better known in other biological contexts, e.g., NF-κB (immunity). Our results reveal three insights concerning the relationship between genes and behavior. First, distinct behaviors are subserved by distinct neurogenomic states in the brain. Second, the neurogenomic states underlying different behaviors rely upon both shared and distinct transcriptional modules. Third, despite the complexity of the brain, simple linear relationships between TFs and their putative target genes are a surprisingly prominent feature of the networks underlying behavior. PMID:21960440
Ohno, Satoshi; Yoshikawa, Katsunori; Shimizu, Hiroshi; Tamura, Tomohiro
2014-01-01
We describe here the construction of a series of 71 vectors to silence central carbon metabolism genes in Escherichia coli. The vectors inducibly express antisense RNAs called paired-terminus antisense RNAs, which have a higher silencing efficacy than ordinary antisense RNAs. By measuring mRNA amounts, measuring activities of target proteins, or observing specific phenotypes, it was confirmed that all the vectors were able to silence the expression of target genes efficiently. Using this vector set, each of the central carbon metabolism genes was silenced individually, and the accumulation of metabolites was investigated. We were able to obtain accurate information on ways to increase the production of pyruvate, an industrially valuable compound, from the silencing results. Furthermore, the experimental results of pyruvate accumulation were compared to in silico predictions, and both sets of results were consistent. Compared to the gene disruption approach, the silencing approach has an advantage in that any E. coli strain can be used and multiple gene silencing is easily possible in any combination. PMID:24212579
Priyadarshini, P; Tiwari, K; Das, A; Kumar, D; Mishra, M N; Desikan, P; Nath, G
2017-02-01
To evaluate the sensitivity and specificity of a new nested set of primers designed for the detection of Mycobacterium tuberculosis complex targeting a highly conserved heat shock protein gene (hsp65). The nested primers were designed using multiple sequence alignment assuming the nucleotide sequence of the M. tuberculosis H37Rv hsp65 genome as base. Multidrug-resistant Mycobacterium species along with other non-mycobacterial and fungal species were included to evaluate the specificity of M. tuberculosis hsp65 gene-specific primers. The sensitivity of the primers was determined using serial 10-fold dilutions, and was 100% as shown by the bands in the case of M. tuberculosis complex. None of the other non M. tuberculosis complex bacterial and fungal species yielded any band on nested polymerase chain reaction (PCR). The first round of amplification could amplify 0.3 ng of the template DNA, while nested PCR could detect 0.3 pg. The present hsp65-specific primers have been observed to be sensitive, specific and cost-effective, without requiring interpretation of biochemical tests, real-time PCR, sequencing or high-performance liquid chromatography. These primer sets do not have the drawbacks associated with those protocols that target insertion sequence 6110, 16S rDNA, rpoB, recA and MPT 64.
Xu, Yan; Chen, Yan; Li, Daliang; Liu, Qing; Xuan, Zhenyu; Li, Wen-Hong
2017-02-01
MicroRNAs are small non-coding RNAs acting as posttranscriptional repressors of gene expression. Identifying mRNA targets of a given miRNA remains an outstanding challenge in the field. We have developed a new experimental approach, TargetLink, that uses locked nucleic acid (LNA) as the affinity probe to enrich target genes of a specific microRNA in intact cells. TargetLink also includes a rigorous and systematic data analysis pipeline to identify target genes by comparing LNA-enriched sequences between experimental and control samples. Using miR-21 as a test microRNA, we identified 12 target genes of miR-21 in a human colorectal cancer cell by this approach. The majority of the identified targets interacted with miR-21 via imperfect seed pairing. Target validation confirmed that miR-21 repressed the expression of the identified targets. The cellular abundance of the identified miR-21 target transcripts varied over a wide range, with some targets expressed at a rather low level, confirming that both abundant and rare transcripts are susceptible to regulation by microRNAs, and that TargetLink is an efficient approach for identifying the target set of a specific microRNA in intact cells. C20orf111, one of the novel targets identified by TargetLink, was found to reside in the nuclear speckle and to be reliably repressed by miR-21 through the interaction at its coding sequence.
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori-defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients.
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
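The empirical p-value described in the abstract above can be illustrated with a short permutation sketch. This is only an assumption-laden illustration, not the authors' implementation: the resampling scheme is taken to be a simple label permutation, and a leave-one-out nearest-centroid error (`loo_centroid_error`, a hypothetical helper) stands in for the Random Forest OOB error so that the example needs nothing beyond the Python standard library. All function names and the toy data are invented for the example.

```python
import random

def empirical_p_value(observed_error, null_errors):
    """Share of permuted-label errors that are at least as small as the
    observed error, with the usual +1 correction."""
    as_good = sum(1 for e in null_errors if e <= observed_error)
    return (1 + as_good) / (1 + len(null_errors))

def loo_centroid_error(samples, labels):
    """Stand-in for an RF OOB error (assumption): leave-one-out
    misclassification rate of a nearest-centroid classifier."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        centroids = {}
        for cls in sorted(set(labels)):
            members = [s for j, (s, l) in enumerate(zip(samples, labels))
                       if l == cls and j != i]
            if members:
                centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        errors += pred != y
    return errors / len(samples)

def permutation_test(samples, labels, error_fn, n_perm=200, seed=0):
    """Compare the observed error with errors obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = error_fn(samples, labels)
    null_errors = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null_errors.append(error_fn(samples, shuffled))
    return observed, empirical_p_value(observed, null_errors)

# Toy gene set with two genes measured in 12 samples (hypothetical values).
samples = [[1.0, 1.2], [0.9, 1.1], [1.1, 0.8], [1.2, 1.0], [0.8, 0.9], [1.0, 1.0],
           [3.0, 3.1], [2.9, 3.2], [3.1, 2.8], [3.2, 3.0], [2.8, 2.9], [3.0, 3.0]]
labels = [0] * 6 + [1] * 6
observed, p = permutation_test(samples, labels, loo_centroid_error)
```

In a real analysis, `error_fn` would be replaced by a function that fits a random forest on the gene set's expression columns and returns its OOB error rate.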
Bao, Weier; Greenwold, Matthew J; Sawyer, Roger H
2017-11-01
Gene co-expression network analysis has been a research method widely used in systematically exploring gene function and interaction. Using the Weighted Gene Co-expression Network Analysis (WGCNA) approach to construct a gene co-expression network using data from a customized 44K microarray transcriptome of chicken epidermal embryogenesis, we have identified two distinct modules that are highly correlated with scale or feather development traits. Signaling pathways related to feather development were enriched in the traditional KEGG pathway analysis and functional terms relating specifically to embryonic epidermal development were also enriched in the Gene Ontology analysis. Significant enrichment annotations were discovered from customized enrichment tools such as Modular Single-Set Enrichment Test (MSET) and Medical Subject Headings (MeSH). Hub genes in both trait-correlated modules showed strong specific functional enrichment toward epidermal development. Also, regulatory elements, such as transcription factors and miRNAs, were targeted in the significant enrichment result. This work highlights the advantage of this methodology for functional prediction of genes not previously associated with scale- and feather trait-related modules.
Statistical algorithms improve accuracy of gene fusion detection
Hsieh, Gillian; Bierman, Rob; Szabo, Linda; Lee, Alex Gia; Freeman, Donald E.; Watson, Nathaniel; Sweet-Cordero, E. Alejandro
2017-01-01
Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors. PMID:28541529
Gene Drive for Mosquito Control: Where Did It Come from and Where Are We Headed?
Macias, Vanessa M.; Ohm, Johanna R.; Rasgon, Jason L.
2017-01-01
Mosquito-borne pathogens place an enormous burden on human health. The existing toolkit is insufficient to support ongoing vector-control efforts towards meeting disease elimination and eradication goals. The perspective that genetic approaches can potentially add a significant set of tools toward mosquito control is not new, but the recent improvements in site-specific gene editing with CRISPR/Cas9 systems have enhanced our ability to both study mosquito biology using reverse genetics and produce genetics-based tools. Cas9-mediated gene-editing is an efficient and adaptable platform for gene drive strategies, which have advantages over inundative release strategies for introgressing desirable suppression and pathogen-blocking genotypes into wild mosquito populations; until recently, an effective gene drive has been largely out of reach. Many considerations will inform the effective use of new genetic tools, including gene drives. Here we review the lengthy history of genetic advances in mosquito biology and discuss both the impact of efficient site-specific gene editing on vector biology and the resulting potential to deploy new genetic tools for the abatement of mosquito-borne disease. PMID:28869513
Høgslund, Niels; Radutoiu, Simona; Krusell, Lene; Voroshilova, Vera; Hannah, Matthew A.; Goffard, Nicolas; Sanchez, Diego H.; Lippold, Felix; Ott, Thomas; Sato, Shusei; Tabata, Satoshi; Liboriussen, Poul; Lohmann, Gitte V.; Schauser, Leif; Weiller, Georg F.; Udvardi, Michael K.; Stougaard, Jens
2009-01-01
Genetic analyses of plant symbiotic mutants have led to the identification of key genes involved in Rhizobium-legume communication as well as in development and function of nitrogen fixing root nodules. However, the impact of these genes in coordinating the transcriptional programs of nodule development has only been studied in limited and isolated studies. Here, we present an integrated genome-wide analysis of transcriptome landscapes in Lotus japonicus wild-type and symbiotic mutant plants. Encompassing five different organs, five stages of the sequentially developed determinate Lotus root nodules, and eight mutants impaired at different stages of the symbiotic interaction, our data set integrates an unprecedented combination of organ- or tissue-specific profiles with mutant transcript profiles. In total, 38 different conditions sampled under the same well-defined growth regimes were included. This comprehensive analysis unravelled new and unexpected patterns of transcriptional regulation during symbiosis and organ development. Contrary to expectations, none of the previously characterized nodulins were among the 37 genes specifically expressed in nodules. Another surprise was the extensive transcriptional response in whole root compared to the susceptible root zone where the cellular response is most pronounced. A large number of transcripts predicted to encode transcriptional regulators, receptors and proteins involved in signal transduction, as well as many genes with unknown function, were found to be regulated during nodule organogenesis and rhizobial infection. Combining wild type and mutant profiles of these transcripts demonstrates the activation of a complex genetic program that delineates symbiotic nitrogen fixation.
The complete data set was organized into an indexed expression directory that is accessible from a resource database, and here we present selected examples of biological questions that can be addressed with this comprehensive and powerful gene expression data set. PMID:19662091
Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia
2012-01-01
Objectives: To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods: We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results: Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single-study-based differential analysis.
Conclusion: Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
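The step in which a random effects model integrates per-experiment effect sizes can be sketched as follows. The abstract does not say which estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, and the function name and inputs are hypothetical.

```python
import math

def random_effects(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random effects
    model (assumed estimator). Returns (pooled effect, standard error, tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# One gene measured in three hypothetical experiments:
# effect sizes and their sampling variances.
pooled, se, tau2 = random_effects([0.8, 1.1, 0.3], [0.04, 0.09, 0.05])
```

Running this once per gene yields a pooled effect and standard error per gene, which could then be ranked to build a signature.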
Kamenova, Ivanka; Warfield, Linda
2014-01-01
Most RNA polymerase (Pol) II promoters lack a TATA element, yet nearly all Pol II transcription requires TATA binding protein (TBP). While the TBP-TATA interaction is critical for transcription at TATA-containing promoters, it has been unclear whether TBP sequence-specific DNA contacts are required for transcription at TATA-less genes. Transcription factor IID (TFIID), the TBP-containing coactivator that functions at most TATA-less genes, recognizes short sequence-specific promoter elements in metazoans, but analogous promoter elements have not been identified in Saccharomyces cerevisiae. We generated a set of mutations in the yeast TBP DNA binding surface and found that most support growth of yeast. Both in vivo and in vitro, many of these mutations are specifically defective for transcription of two TATA-containing genes with only minor defects in transcription of two TATA-less, TFIID-dependent genes. TBP binds several TATA-less promoters with apparent high affinity, but our results suggest that this binding is not important for transcription activity. Our results are consistent with the model that sequence-specific TBP-DNA contacts are not important at yeast TATA-less genes and suggest that other general transcription factors or coactivator subunits are responsible for recognition of TATA-less promoters. Our results also explain why yeast TBP derivatives defective for TATA binding appear defective in activated transcription. PMID:24865972
Kamenova, Ivanka; Warfield, Linda; Hahn, Steven
2014-08-01
Most RNA polymerase (Pol) II promoters lack a TATA element, yet nearly all Pol II transcription requires TATA binding protein (TBP). While the TBP-TATA interaction is critical for transcription at TATA-containing promoters, it has been unclear whether TBP sequence-specific DNA contacts are required for transcription at TATA-less genes. Transcription factor IID (TFIID), the TBP-containing coactivator that functions at most TATA-less genes, recognizes short sequence-specific promoter elements in metazoans, but analogous promoter elements have not been identified in Saccharomyces cerevisiae. We generated a set of mutations in the yeast TBP DNA binding surface and found that most support growth of yeast. Both in vivo and in vitro, many of these mutations are specifically defective for transcription of two TATA-containing genes with only minor defects in transcription of two TATA-less, TFIID-dependent genes. TBP binds several TATA-less promoters with apparent high affinity, but our results suggest that this binding is not important for transcription activity. Our results are consistent with the model that sequence-specific TBP-DNA contacts are not important at yeast TATA-less genes and suggest that other general transcription factors or coactivator subunits are responsible for recognition of TATA-less promoters. Our results also explain why yeast TBP derivatives defective for TATA binding appear defective in activated transcription. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Cao, Bihao; Huang, Zhiyin; Chen, Guoju; Lei, Jianjun
2010-04-01
This study was designed to control plant fertility by expressing the cell-lethal gene Barnase at a specific developmental stage and in a specific tissue of the male organ under the control of the Cre/loxP system, for heterosis breeding and hybrid seed production in eggplant. The Barnase-coding region was flanked by loxP recognition sites for Cre-recombinase. The eggplant inbred/pure line ('E-38') was transformed with the Cre gene and the inbred/pure line ('E-8') was transformed with the Barnase gene situated between loxP sites. The experiments were done separately, by means of Agrobacterium co-culture. Four T(0) plants with the Barnase gene were obtained; all proved to be male-sterile and incapable of producing viable pollen. Flower stamens were shorter, but the vegetative phenotype was similar to wild-type. Five T(0) plants with the Cre gene developed well, blossomed and set fruit normally. The crossing of male-sterile Barnase plants with Cre-expressing transgenic eggplants resulted in site-specific excision, with the male-sterile plants producing normal fruits. With Barnase excised, pollen fertility was fully restored in the hybrids. The phenotype of these restored plants was the same as that of the wild-type. Thus, the Barnase and Cre genes were capable of stable inheritance and expression in progenies of transgenic plants.
Wieczorek, D F; Smith, C W; Nadal-Ginard, B
1988-01-01
Tropomyosin (TM), a ubiquitous protein, is a component of the contractile apparatus of all cells. In nonmuscle cells, it is found in stress fibers, while in sarcomeric and nonsarcomeric muscle, it is a component of the thin filament. Several different TM isoforms specific for nonmuscle cells and different types of muscle cell have been described. As for other contractile proteins, it was assumed that smooth, striated, and nonmuscle isoforms were each encoded by different sets of genes. Through the use of S1 nuclease mapping, RNA blots, and 5' extension analyses, we showed that the rat alpha-TM gene, whose expression was until now considered to be restricted to muscle cells, generates many different tissue-specific isoforms. The promoter of the gene appears to be very similar to other housekeeping promoters in both its pattern of utilization, being active in most cell types, and its lack of any canonical sequence elements. The rat alpha-TM gene is split into at least 13 exons, 7 of which are alternatively spliced in a tissue-specific manner. This gene arrangement, which also includes two different 3' ends, generates a minimum of six different mRNAs each with the capacity to code for a different protein. These distinct TM isoforms are expressed specifically in nonmuscle and smooth and striated (cardiac and skeletal) muscle cells. The tissue-specific expression and developmental regulation of these isoforms is, therefore, produced by alternative mRNA processing. Moreover, structural and sequence comparisons among TM genes from different phyla suggest that alternative splicing is evolutionarily a very old event that played an important role in gene evolution and might have appeared concomitantly with or even before constitutive splicing. PMID:3352602
Lukianova-Hleb, Ekaterina Y.; Mutonga, Martin B. G.; Lapotko, Dmitri O.
2012-01-01
Current methods of cell processing for gene and cell therapies use several separate procedures for gene transfer and cell separation or elimination, because no current technology can offer simultaneous multi-functional processing of specific cell sub-sets in highly heterogeneous cell systems. Using the cell-specific generation of plasmonic nanobubbles of different sizes around cell-targeted gold nanoshells and nanospheres, we achieved simultaneous multifunctional cell-specific processing in a rapid single 70 ps laser pulse bulk treatment of heterogeneous cell suspension. This method supported the detection of cells, delivery of external molecular cargo to one type of cells and the concomitant destruction of another type of cells without damaging other cells in suspension, and real-time guidance of the two above cellular effects. PMID:23167546
Campos, Bruno; Fletcher, Danielle; Piña, Benjamín; Tauler, Romà; Barata, Carlos
2018-05-18
Unravelling the link between genes and environment across the life cycle is a challenging goal that requires model organisms with well-characterized life-cycles, ecological interactions in nature, tractability in the laboratory, and available genomic tools. Very few well-studied invertebrate model species meet these requirements, the waterflea Daphnia magna being one of them. Here we report a full genome transcription profiling of D. magna during its life-cycle. The study was performed using a new microarray platform designed from the complete set of gene models representing the whole transcribed genome of D. magna. Up to 93% of the existing 41,317 D. magna gene models showed differential transcription patterns across the developmental stages of D. magna, 59% of which were functionally annotated. Embryos showed the highest number of unique transcribed genes, mainly related to DNA, RNA, and ribosome biogenesis, likely related to cellular proliferation and morphogenesis of the several body organs. Adult females showed an enrichment of transcripts for genes involved in reproductive processes. These female-specific transcripts were essentially absent in males, whose transcriptome was enriched in genes specific to male sexual differentiation, such as doublesex. Our results define major characteristics of transcriptional programs involved in the life-cycle, differentiate males and females, and show that large scale gene-transcription data collected in whole animals can be used to identify genes involved in specific biological and biochemical processes.
Co-Option and De Novo Gene Evolution Underlie Molluscan Shell Diversity
Aguilera, Felipe; McDougall, Carmel
2017-01-01
Molluscs fabricate shells of incredible diversity and complexity by localized secretions from the dorsal epithelium of the mantle. Although distantly related molluscs express remarkably different secreted gene products, it remains unclear if the evolution of shell structure and pattern is underpinned by the differential co-option of conserved genes or the integration of lineage-specific genes into the mantle regulatory program. To address this, we compare the mantle transcriptomes of 11 bivalves and gastropods of varying relatedness. We find that each species, including four Pinctada (pearl oyster) species that diverged within the last 20 Ma, expresses a unique mantle secretome. Lineage- or species-specific genes comprise a large proportion of each species’ mantle secretome. A majority of these secreted proteins have unique domain architectures that include repetitive, low complexity domains (RLCDs), which evolve rapidly, and have a proclivity to expand, contract and rearrange in the genome. There are also a large number of secretome genes expressed in the mantle that arose before the origin of gastropods and bivalves. Each species expresses a unique set of these more ancient genes consistent with their independent co-option into these mantle gene regulatory networks. From this analysis, we infer lineage-specific secretomes underlie shell diversity, and include both rapidly evolving RLCD-containing proteins, and the continual recruitment and loss of both ancient and recently evolved genes into the periphery of the regulatory network controlling gene expression in the mantle epithelium. PMID:28053006
Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C
2009-02-01
Abscisic acid (ABA), the popular plant stress hormone, plays a key role in the regulation of a subset of stress-responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the CGMCACGTGB motif, which contains the ABA Responsive Element (ABRE) core (ACGT), as an over-represented motif among the promoters of ABA-responsive co-expressed genes in rice. A targeted gene prediction strategy using this motif led to the identification of 402 protein-coding genes potentially regulated by an ABA-dependent molecular genetic network. RT-PCR analysis of 45 arbitrarily chosen genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA-responsive genes showed enrichment of signal transduction and stress-related genes among diverse functional categories.
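A motif-based promoter scan of the kind described above can be sketched by translating the IUPAC motif CGMCACGTGB into a regular expression (M = A or C, B = C, G or T). This is only an illustration under stated assumptions: it scans the given strand only, the promoter sequences and gene names are invented, and the abstract does not describe the authors' actual prediction pipeline.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]", "Y": "[CT]", "K": "[GT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead allows overlapping matches."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(promoter, motif="CGMCACGTGB"):
    """Return 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(promoter.upper())]

# Hypothetical promoter fragments keyed by made-up gene names.
promoters = {
    "geneA": "TTTCGCCACGTGTAAA",   # contains CGCCACGTGT, a full motif instance
    "geneB": "TTTACGTAAATTT",      # has the ACGT core but not the full motif
}
candidates = [gene for gene, seq in promoters.items() if find_motif(seq)]
```

A fuller scan would also search the reverse complement of each promoter, since the motif is not palindromic.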
Nielsen, Ronni; Grøntved, Lars; Stunnenberg, Hendrik G.; Mandrup, Susanne
2006-01-01
Investigations of the molecular events involved in activation of genomic target genes by peroxisome proliferator-activated receptors (PPARs) have been hampered by the inability to establish a clean on/off state of the receptor in living cells. Here we show that the combination of adenoviral delivery and chromatin immunoprecipitation (ChIP) is ideal for dissecting these mechanisms. Adenoviral delivery of PPARs leads to a rapid and synchronous expression of the PPAR subtypes, establishment of transcriptionally active complexes at genomic loci, and immediate activation of even silent target genes. We demonstrate that PPARγ2 possesses considerable ligand-dependent as well as independent transactivation potential and that agonists increase the occupancy of PPARγ2/retinoid X receptor at PPAR response elements. Intriguingly, by direct comparison of the PPARs (α, γ, and β/δ), we show that the subtypes have very different abilities to gain access to target sites and that in general the genomic occupancy correlates with the ability to activate the corresponding target gene. In addition, the specificity and potency of activation by PPAR subtypes are highly dependent on the cell type. Thus, PPAR subtype-specific activation of genomic target genes involves an intricate interplay between the properties of the subtype- and cell-type-specific settings at the individual target loci. PMID:16847324
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
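The power argument in the abstract above (fewer tested gene sets means a milder multiple-testing correction) can be illustrated with a minimal Python sketch. This is not the SGSF method itself: the filter p-values below are hypothetical numbers standing in for the spectral filter statistic, the gene set names and the 0.05 cutoff are assumptions, and Benjamini-Hochberg adjustment is used only as a familiar correction.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # keeps the adjusted values monotone in the original ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(test_p, filter_p, filter_cutoff):
    """Drop gene sets failing the filter, then adjust only the survivors."""
    kept = [name for name in test_p if filter_p[name] <= filter_cutoff]
    adjusted = bh_adjust([test_p[name] for name in kept])
    return dict(zip(kept, adjusted))


# Hypothetical p-values for five gene sets (illustrative numbers only).
test_p = {"set_a": 0.012, "set_b": 0.30, "set_c": 0.50, "set_d": 0.70, "set_e": 0.90}
filter_p = {"set_a": 0.001, "set_b": 0.02, "set_c": 0.40, "set_d": 0.60, "set_e": 0.80}

unfiltered = dict(zip(test_p, bh_adjust(list(test_p.values()))))
filtered = filter_then_test(test_p, filter_p, filter_cutoff=0.05)
```

In this toy input, set_a has an adjusted p-value of about 0.06 when all five sets are tested and about 0.024 when only the two sets passing the filter are tested. The gain is only valid when the filter statistic is independent of the test statistic under the null hypothesis, which is the property the abstract claims for SGSF.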
Ebot, Ericka M; Gerke, Travis; Labbé, David P; Sinnott, Jennifer A; Zadra, Giorgia; Rider, Jennifer R; Tyekucheva, Svitlana; Wilson, Kathryn M; Kelly, Rachel S; Shui, Irene M; Loda, Massimo; Kantoff, Philip W; Finn, Stephen; Vander Heiden, Matthew G; Brown, Myles; Giovannucci, Edward L; Mucci, Lorelei A
2017-11-01
Obese men are at higher risk of advanced prostate cancer and cancer-specific mortality; however, the biology underlying this association remains unclear. This study examined gene expression profiles of prostate tissue to identify biological processes differentially expressed by obesity status and lethal prostate cancer. Gene expression profiling was performed on tumor (n = 402) and adjacent normal (n = 200) prostate tissue from participants in 2 prospective cohorts who had been diagnosed with prostate cancer from 1982 to 2005. Body mass index (BMI) was calculated from the questionnaire immediately preceding cancer diagnosis. Men were followed for metastases or prostate cancer-specific death (lethal disease) through 2011. Gene Ontology biological processes differentially expressed by BMI were identified using gene set enrichment analysis. Pathway scores were computed by averaging the signal intensities of member genes. Odds ratios (ORs) for lethal prostate cancer were estimated with logistic regression. Among 402 men, 48% were healthy weight, 31% were overweight, and 21% were very overweight/obese. Fifteen gene sets were enriched in tumor tissue, but not normal tissue, of very overweight/obese men versus healthy-weight men; 5 of these were related to chromatin modification and remodeling (false-discovery rate < 0.25). Patients with high tumor expression of chromatin-related genes had worse clinical characteristics (Gleason grade > 7, 41% vs 17%; P = 2 × 10⁻⁴) and an increased risk of lethal disease that was independent of grade and stage (OR, 5.26; 95% confidence interval, 2.37-12.25). This study improves our understanding of the biology of aggressive prostate cancer and identifies a potential mechanistic link between obesity and prostate cancer death that warrants further study. Cancer 2017;123:4130-4138. © 2017 American Cancer Society.
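The pathway score described in the abstract above (the average signal intensity of a gene set's member genes) might be computed along these lines. The gene names, intensity values, and the choice to skip unmeasured members are all assumptions made for illustration; the abstract does not say how missing genes were handled.

```python
from statistics import mean


def pathway_score(expression, gene_set):
    """Average the signal intensities of the gene-set members measured in a sample."""
    values = [expression[gene] for gene in gene_set if gene in expression]
    if not values:
        raise ValueError("no gene-set member was measured in this sample")
    return mean(values)


# Hypothetical log-scale intensities for one tumor sample; names are placeholders.
sample = {"GENE1": 8.0, "GENE2": 10.0, "GENE3": 6.0, "GENE4": 12.0}
chromatin_set = ["GENE1", "GENE2", "GENE5"]  # GENE5 is not measured, so it is skipped

score = pathway_score(sample, chromatin_set)
```

Here the score is the mean of the two measured members, GENE1 and GENE2. Scores like this, one per patient, would then serve as the predictor in the logistic regression the abstract mentions.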
Johnson, Michelle D; Dopierala, Justyna
2018-01-01
DNA methylation is an important regulator of gene function. Fetal sex is associated with the risk of several specific pregnancy complications related to placental function. However, the association between fetal sex and placental DNA methylation remains poorly understood. We carried out whole-genome oxidative bisulfite sequencing in the placentas of two healthy female and two healthy male pregnancies generating an average genome depth of coverage of 25x. Most highly ranked differentially methylated regions (DMRs) were located on the X chromosome but we identified a 225 kb sex-specific DMR in the body of the CUB and Sushi Multiple Domains 1 (CSMD1) gene on chromosome 8. The sex-specific differential methylation pattern observed in this region was validated in additional placentas using in-solution target capture. In a new RNA-seq data set from 64 female and 67 male placentas, CSMD1 mRNA was 1.8-fold higher in male than in female placentas (P value = 8.5 × 10⁻⁷, Mann-Whitney test). Exon-level quantification of CSMD1 mRNA from these 131 placentas suggested a likely placenta-specific CSMD1 isoform not detected in the 21 somatic tissues analyzed. We show that the gene body of an autosomal gene, CSMD1, is differentially methylated in a sex- and placental-specific manner, displaying sex-specific differences in placental transcript abundance. PMID:29376485
Walker, Amy K; Shi, Yang; Blackwell, T Keith
2004-04-09
The general transcription factor TFIID sets the mRNA start site and consists of TATA-binding protein and associated factors (TAF(II)s), some of which are also present in SPT-ADA-GCN5 (SAGA)-related complexes. In yeast, results of multiple studies indicate that TFIID-specific TAF(II)s are not required for the transcription of most genes, implying that intact TFIID may have a surprisingly specialized role in transcription. Relatively little is known about how TAF(II)s contribute to metazoan transcription in vivo, especially at developmental and tissue-specific genes. Previously, we investigated functions of four shared TFIID/SAGA TAF(II)s in Caenorhabditis elegans. Whereas TAF-4 was required for essentially all embryonic transcription, TAF-5, TAF-9, and TAF-10 were dispensable at multiple developmental and other metazoan-specific promoters. Here we show evidence that in C. elegans embryos transcription of most genes requires TFIID-specific TAF-1. TAF-1 is not as universally required as TAF-4, but it is essential for a greater proportion of transcription than TAF-5, -9, or -10 and is important for transcription of many developmental and other metazoan-specific genes. TAF-2, which binds core promoters with TAF-1, appears to be required for a similarly substantial proportion of transcription. C. elegans TAF-1 overlaps functionally with the coactivator p300/CBP (CBP-1), and at some genes it is required along with the TBP-like protein TLF(TRF2). We conclude that during C. elegans embryogenesis TAF-1 and TFIID have broad roles in transcription and development and that TFIID and TLF may act together at certain promoters. Our findings imply that in metazoans TFIID may be of widespread importance for transcription and for expression of tissue-specific genes.
Novel gene sets improve set-level classification of prokaryotic gene expression data.
Holec, Matěj; Kuželka, Ondřej; Železný, Filip
2015-10-28
Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
Yoo, Eung Jae; Cajiao, Isabela; Kim, Jeong-Seon; Kimura, Atsushi P.; Zhang, Aiwen; Cooke, Nancy E.; Liebhaber, Stephen A.
2006-01-01
Random assortment within mammalian genomes juxtaposes genes with distinct expression profiles. This organization, along with the prevalence of long-range regulatory controls, generates a potential for aberrant transcriptional interactions. The human CD79b/GH locus contains six tightly linked genes with three mutually exclusive tissue specificities and interdigitated control elements. One consequence of this compact organization is that the pituitary cell-specific transcriptional events that activate hGH-N also trigger ectopic activation of CD79b. However, the B-cell-specific events that activate CD79b do not trigger reciprocal activation of hGH-N. Here we utilized DNase I hypersensitive site mapping, chromatin immunoprecipitation, and transgenic models to explore the basis for this asymmetric relationship. The results reveal tissue-specific patterns of chromatin structures and transcriptional controls at the CD79b/GH locus in B cells distinct from those in the pituitary gland and placenta. These three unique transcriptional environments suggest a set of corresponding gene expression pathways and transcriptional interactions that are likely to be found juxtaposed at multiple sites within the eukaryotic genome. PMID:16847312
Pena, S D; Barreto, G; Vago, A R; De Marco, L; Reinach, F C; Dias Neto, E; Simpson, A J
1994-01-01
Low-stringency single specific primer PCR (LSSP-PCR) is an extremely simple PCR-based technique that detects single or multiple mutations in gene-sized DNA fragments. A purified DNA fragment is subjected to PCR using high concentrations of a single specific oligonucleotide primer, large amounts of Taq polymerase, and a very low annealing temperature. Under these conditions the primer hybridizes specifically to its complementary region and nonspecifically to multiple sites within the fragment, in a sequence-dependent manner, producing a heterogeneous set of reaction products resolvable by electrophoresis. The complex banding pattern obtained is significantly altered by even a single-base change and thus constitutes a unique "gene signature." Therefore LSSP-PCR will have almost unlimited application in all fields of genetics and molecular medicine where rapid and sensitive detection of mutations and sequence variations is important. The usefulness of LSSP-PCR is illustrated by applications in the study of mutants of smooth muscle myosin light chain, analysis of a family with X-linked nephrogenic diabetes insipidus, and identity testing using human mitochondrial DNA. PMID:8127912
Joint mapping of genes and conditions via multidimensional unfolding analysis
Van Deun, Katrijn; Marchal, Kathleen; Heiser, Willem J; Engelen, Kristof; Van Mechelen, Iven
2007-01-01
Background: Microarray compendia profile the expression of genes in a number of experimental conditions. Such data compendia are useful not only to group genes and conditions based on their similarity in overall expression over profiles but also to gain information on more subtle relations between genes and conditions. Getting a clear visual overview of all these patterns in a single easy-to-grasp representation is a useful preliminary analysis step: We propose to use for this purpose an advanced exploratory method, called multidimensional unfolding. Results: We present a novel algorithm for multidimensional unfolding that overcomes both general problems and problems that are specific for the analysis of gene expression data sets. Applying the algorithm to two publicly available microarray compendia illustrates its power as a tool for exploratory data analysis: The unfolding analysis of a first data set resulted in a two-dimensional representation which clearly reveals temporal regulation patterns for the genes and a meaningful structure for the time points, while the analysis of a second data set showed the algorithm's ability to go beyond a mere identification of those genes that discriminate between different patient or tissue types. Conclusion: Multidimensional unfolding offers a useful tool for preliminary explorations of microarray data: By relying on an easy-to-grasp low-dimensional geometric framework, relations among genes, among conditions and between genes and conditions are simultaneously represented in an accessible way which may reveal interesting patterns in the data. An additional advantage of the method is that it can be applied to the raw data without necessitating the choice of suitable genewise transformations of the data. PMID:17550582
Sadee, Wolfgang
2013-09-01
Pharmacogenetic biomarker tests include mostly specific single gene-drug pairs, capable of accounting for a portion of interindividual variability in drug response and toxicity. However, multiple genes are likely to contribute, either acting independently or epistatically, with the CYP2C9-VKORC1-warfarin test panel being an example of a clinically used gene-gene-drug interaction. I discuss here further instances of gene-gene-drug interactions, including a proposed dynamic effect on statin therapy by genetic variants in both a transporter (SLCO1B1) and a metabolizing enzyme (CYP3A4) in liver cells, the main target site where statins block cholesterol synthesis. These examples set a conceptual framework for developing diagnostic panels involving multiple gene-drug combinations. Copyright © 2013 Wiley Periodicals, Inc.
Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie
2011-09-12
Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
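The idea of counting motifs shared by a pair of gene promoters can be sketched in Python under a strong simplification: motifs are taken to be exact substrings of a fixed length k, whereas the abstract defines SSM by length and similarity within conserved promoter sequences. The function names and the two toy sequences are hypothetical and are not taken from the authors' algorithm.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_motif_count(seq_a, seq_b, k):
    """Count distinct exact k-mers that occur in both sequences."""
    return len(kmers(seq_a, k) & kmers(seq_b, k))


# Toy promoter fragments (illustrative only).
promoter_a = "ACGTGGCA"
promoter_b = "TTACGTGA"

count = shared_motif_count(promoter_a, promoter_b, k=4)
```

With k = 4 the two fragments share ACGT and CGTG. Deciding whether such a count is "exceptional" would need a background distribution over many gene pairs, which this sketch does not attempt.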
Getzenberg, R H; Coffey, D S
1990-09-01
The DNA of interphase nuclei has a very specific three-dimensional organization that differs among cell types, and it is possible that this varying DNA organization is responsible for the tissue specificity of gene expression. The nuclear matrix organizes the three-dimensional structure of the DNA and is believed to be involved in the control of gene expression. This study compares the nuclear structural proteins between two sex accessory tissues in the same animal responding to the same androgen stimulation by the differential expression of major tissue-specific secretory proteins. We demonstrate here that the nuclear matrix is tissue specific in the rat ventral prostate and seminal vesicle, and undergoes characteristic alterations in its protein composition upon androgen withdrawal. Three types of nuclear matrix proteins were observed: 1) nuclear matrix proteins that are different and tissue specific in the rat ventral prostate and seminal vesicle, 2) a set of nuclear matrix proteins that either appear or disappear upon androgen withdrawal, and 3) a set of proteins that are common to both the ventral prostate and seminal vesicle and do not change with the hormonal state of the animal. Since the nuclear matrix is known to bind androgen receptors in a tissue- and steroid-specific manner, we propose that the tissue specificity of the nuclear matrix arranges the DNA in a unique conformation, which may be involved in the specific interaction of transcription factors with DNA sequences, resulting in tissue-specific patterns of secretory protein expression.
Microarray-based cancer prediction using soft computing approach.
Wang, Xiaosheng; Gotoh, Osamu
2009-05-26
One of the difficulties in using gene expression profiles to predict cancer is how to effectively select a few informative genes to construct accurate prediction models from thousands or tens of thousands of genes. We screen highly discriminative genes and gene pairs to create simple prediction models involving single genes or gene pairs on the basis of a soft computing approach and rough set theory. Accurate cancer prediction is obtained when we apply the simple prediction models to four cancer gene expression datasets: CNS tumor, colon tumor, lung cancer and DLBCL. Some genes closely correlated with the pathogenesis of specific or general cancers are identified. In contrast with other models, our models are simple, effective and robust. Moreover, our models are interpretable because they are based on decision rules. Our results demonstrate that very simple models may perform well on molecular cancer prediction and that important gene markers of cancer can be detected if the gene selection approach is chosen reasonably.
Lee, Wan Sin; Gudimella, Ranganath; Wong, Gwo Rong; Tammi, Martti Tapani; Khalid, Norzulaani; Harikrishna, Jennifer Ann
2015-01-01
Physiological responses to stress are controlled by expression of a large number of genes, many of which are regulated by microRNAs. Since most banana cultivars are salt-sensitive, improved understanding of genetic regulation of salt-induced stress responses in banana can support future crop management and improvement in the face of increasing soil salinity related to irrigation and climate change. In this study we focused on determining miRNA and their targets that respond to NaCl exposure and used transcriptome sequencing of RNA and small RNA from control and NaCl-treated banana roots to assemble a cultivar-specific reference transcriptome and identify orthologous and Musa-specific miRNA responding to salinity. We observed that banana roots responded to salinity stress with changes in expression for a large number of genes (9.5% of 31,390 expressed unigenes) and reduction in levels of many miRNA, including several novel miRNA and banana-specific miRNA-target pairs. Banana roots expressed a unique set of orthologous and Musa-specific miRNAs of which 59 respond to salt stress in a dose-dependent manner. Gene expression patterns of miRNA compared with those of their predicted mRNA targets indicated that a majority of the differentially expressed miRNAs were down-regulated in response to increased salinity, allowing increased expression of targets involved in diverse biological processes including stress signaling, stress defence, transport, cellular homeostasis, metabolism and other stress-related functions. This study may contribute to the understanding of gene regulation and abiotic stress response of roots and the high-throughput sequencing data sets generated may serve as important resources related to salt tolerance traits for functional genomic studies and genetic improvement in banana. PMID:25993649
Bhindi, Ravinay; Fahmy, Roger G.; Lowe, Harry C.; Chesterman, Colin N.; Dass, Crispin R.; Cairns, Murray J.; Saravolac, Edward G.; Sun, Lun-Quan; Khachigian, Levon M.
2007-01-01
The past decade has seen the rapid evolution of small-molecule gene-silencing strategies, driven largely by enhanced understanding of gene function in the pathogenesis of disease. Over this time, many genes have been targeted by specifically engineered agents from different classes of nucleic acid-based drugs in experimental models of disease to probe, dissect, and characterize further the complex processes that underpin molecular signaling. Arising from this, a number of molecules have been examined in the setting of clinical trials, and several have recently made the successful transition from the bench to the clinic, heralding an exciting era of gene-specific treatments. This is particularly important because clear inadequacies in present therapies account for significant morbidity, mortality, and cost. The broad umbrella of gene-silencing therapeutics encompasses a range of agents that include DNA enzymes, short interfering RNA, antisense oligonucleotides, decoys, ribozymes, and aptamers. This review tracks current movements in these technologies, focusing mainly on DNA enzymes and short interfering RNA, because these are poised to play an integral role in antigene therapies in the future. PMID:17717148
Disease modeling in genetic kidney diseases: zebrafish.
Schenk, Heiko; Müller-Deile, Janina; Kinast, Mark; Schiffer, Mario
2017-07-01
Growing numbers of translational genomics studies are based on the highly efficient and versatile zebrafish (Danio rerio) vertebrate model. The increasing types of zebrafish models have improved our understanding of inherited kidney diseases, since they not only display pathophysiological changes but also give us the opportunity to develop and test novel treatment options in a high-throughput manner. New paradigms in inherited kidney diseases have been developed on the basis of the distinct genome conservation of approximately 70% between zebrafish and humans in terms of existing gene orthologs. Several options are available to determine the functional role of a specific gene or gene sets. Permanent genome editing can be induced via complete gene knockout by using the CRISPR/Cas-system, among others, or via transient modification by using various morpholino techniques. Cross-species rescues succeeding knockdown techniques are employed to determine the functional significance of a target gene or a specific mutation. This article summarizes the current techniques and discusses their perspectives.
Tamplin, Owen J; Cox, Brian J; Rossant, Janet
2011-12-15
The node and notochord are key tissues required for patterning of the vertebrate body plan. Understanding the gene regulatory network that drives their formation and function is therefore important. Foxa2 is a key transcription factor at the top of this genetic hierarchy and finding its targets will help us to better understand node and notochord development. We performed an extensive microarray-based gene expression screen using sorted embryonic notochord cells to identify early notochord-enriched genes. We validated their specificity to the node and notochord by whole mount in situ hybridization. This provides the largest available resource of notochord-expressed genes, and therefore candidate Foxa2 target genes in the notochord. Using existing Foxa2 ChIP-seq data from adult liver, we were able to identify a set of genes expressed in the notochord that had associated regions of Foxa2-bound chromatin. Given that Foxa2 is a pioneer transcription factor, we reasoned that these sites might represent notochord-specific enhancers. Candidate Foxa2-bound regions were tested for notochord specific enhancer function in a zebrafish reporter assay and 7 novel notochord enhancers were identified. Importantly, sequence conservation or predictive models could not have readily identified these regions. Mutation of putative Foxa2 binding elements in two of these novel enhancers abrogated reporter expression and confirmed their Foxa2 dependence. The combination of highly specific gene expression profiling and genome-wide ChIP analysis is a powerful means of understanding developmental pathways, even for small cell populations such as the notochord. Copyright © 2011 Elsevier Inc. All rights reserved.
van Oostrom, Conny T.; Jonker, Martijs J.; de Jong, Mark; Dekker, Rob J.; Rauwerda, Han; Ensink, Wim A.; de Vries, Annemieke; Breit, Timo M.
2014-01-01
In transcriptomics research, design for experimentation by carefully considering biological, technological, practical and statistical aspects is very important, because the experimental design space is essentially limitless. Usually, the ranges of variable biological parameters of the design space are based on common practices and in turn on phenotypic endpoints. However, specific sub-cellular processes might only be partially reflected by phenotypic endpoints or outside the associated parameter range. Here, we provide a generic protocol for range finding in design for transcriptomics experimentation based on small-scale gene-expression experiments to help in the search for the right location in the design space by analyzing the activity of already known genes of relevant molecular mechanisms. Two examples illustrate the applicability: in-vitro UV-C exposure of mouse embryonic fibroblasts and in-vivo UV-B exposure of mouse skin. Our pragmatic approach is based on: framing a specific biological question and associated gene-set, performing a wide-ranged experiment without replication, eliminating potentially non-relevant genes, and determining the experimental ‘sweet spot’ by gene-set enrichment plus dose-response correlation analysis. Examination of many cellular processes that are related to UV response, such as DNA repair and cell-cycle arrest, revealed that basically each cellular (sub-) process is active at its own specific spot(s) in the experimental design space. Hence, the use of range finding, based on an affordable protocol like this, enables researchers to conveniently identify the ‘sweet spot’ for their cellular process of interest in an experimental design space and might have far-reaching implications for experimental standardization. PMID:24823911
Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses
Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto
2012-01-01
The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875
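The central claim of the abstract above, that two functionally related genes may look correlated only within condition-specific experiments, can be illustrated with a small Python sketch. The expression values and the choice of "relevant" experiment indices are invented for illustration; the hard-coded selection stands in for the paper's semi-supervised algorithm, which is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select(profile, columns):
    """Keep only the chosen experiments (columns) of an expression profile."""
    return [profile[i] for i in columns]


# Hypothetical profiles over six experiments; the first three are assumed
# to be the condition-specific ones in which both genes are active.
gene_x = [1.0, 2.0, 3.0, 5.0, 1.0, 4.0]
gene_y = [2.0, 4.0, 6.0, 1.0, 5.0, 2.0]
condition_specific = [0, 1, 2]

r_all = pearson(gene_x, gene_y)
r_selected = pearson(select(gene_x, condition_specific),
                     select(gene_y, condition_specific))
```

In this toy data the correlation over all six experiments is negative, while the correlation over the three selected experiments is essentially 1, so a GBA analysis on the full collection would miss the association.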
DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes.
Wang, Dapeng
2016-01-15
Lineage-specific conservation of gene organization in the genome is an invaluable resource for deciphering the potential functions of genes under diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal unit of gene clusters that tend to reside in preferred locations, and they represent distinctive genomic characteristics of a single species or a given lineage. Although gene families have been investigated widely, the definition of gene pair families in various taxa has not received adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/), which stores pre-calculated lineage-based gene pairs for the 134 currently available animal and plant genomes and inspects them under the same analytical framework, offering a set of novel features. First, the taxonomy, or lineage, is classified into four levels: Kingdom, Phylum, Class and Order. DLGP adopts an all-to-all comparison strategy to identify, for each gene pair in a given species, the possible conserved gene pairs in all other species, and regards those that are conserved in a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as lineage-conserved gene pairs. Furthermore, it predicts lineage-specific gene pairs by retaining those lineage-conserved gene pairs that are not conserved in any other lineage. Second, it carries out pairwise comparison of the gene pairs between two species and creates a table of all the conserved gene pairs and an image showing the degree of gene pair conservation at the chromosomal level. Third, it supplies a gene order browser that extends gene pairs to gene clusters, allowing users to view the evolutionary dynamics of the gene context in an intuitive manner. 
This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and between monocots and eudicots, accounting for the significant contribution of gene pairs to speciation and diversification in specific lineages. Copyright © 2015 Elsevier Inc. All rights reserved.
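The two definitions in the abstract above (lineage-conserved pairs, then lineage-specific pairs as those conserved in no other lineage) reduce to simple set operations. The sketch below is only an illustration under stated assumptions: gene identifiers are taken to be already mapped to shared ortholog labels, the 0.8 threshold stands in for the unspecified "significant proportion", and the species data are invented. It is not DLGP's actual pipeline.

```python
def lineage_conserved(pairs_by_species, lineage_species, min_fraction=0.8):
    """Gene pairs present in at least min_fraction of the lineage's species.

    min_fraction is a hypothetical threshold; the abstract does not state one.
    """
    counts = {}
    for species in lineage_species:
        for pair in pairs_by_species[species]:
            counts[pair] = counts.get(pair, 0) + 1
    needed = min_fraction * len(lineage_species)
    return {pair for pair, count in counts.items() if count >= needed}


def lineage_specific(conserved_by_lineage, lineage):
    """Conserved pairs of `lineage` that are not conserved in any other lineage."""
    others = set()
    for name, pairs in conserved_by_lineage.items():
        if name != lineage:
            others |= pairs
    return conserved_by_lineage[lineage] - others


# Invented toy data: adjacent gene pairs per species, as ortholog-label tuples.
pairs_by_species = {
    "human": {("A", "B"), ("C", "D")},
    "chimp": {("A", "B"), ("C", "D")},
    "fly": {("C", "D"), ("E", "F")},
    "mosquito": {("C", "D"), ("E", "F")},
}
lineages = {"Primates": ["human", "chimp"], "Diptera": ["fly", "mosquito"]}

conserved = {
    name: lineage_conserved(pairs_by_species, species)
    for name, species in lineages.items()
}
primate_specific = lineage_specific(conserved, "Primates")
diptera_specific = lineage_specific(conserved, "Diptera")
```

Here the pair ("C", "D") is conserved in both lineages and is therefore excluded from both lineage-specific sets.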
Using RNA-seq data to select reference genes for normalizing gene expression in apple roots.
Zhou, Zhe; Cong, Peihua; Tian, Yi; Zhu, Yanmin
2017-01-01
Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, a set of 15 apple genes was evaluated for potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues in a recent RNA-seq data set, together with a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor-like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that, for carefully validated reference genes, a single reference gene is sufficient for reliable normalization of quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization. PMID:28934340
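The first step described above, picking candidate reference genes by their low expression variance in RNA-seq data, can be sketched as a simple ranking. The coefficient of variation used here is an assumption (the abstract says only "low variance"), the gene names and values are invented, and the sketch does not reproduce Delta Ct, geNorm, NormFinder or BestKeeper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (lower = more stable)."""
    return pstdev(values) / mean(values)


def most_stable(expression, n=2):
    """Return the n genes with the lowest coefficient of variation across samples."""
    ranked = sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))
    return ranked[:n]


# Invented expression values (e.g. normalized read counts) across four root samples.
expression = {
    "geneA": [100, 102, 98, 101],
    "geneB": [50, 150, 20, 90],
    "geneC": [10, 11, 10, 9],
}

candidates = most_stable(expression, n=2)
```

Dividing by the mean keeps highly expressed genes from being penalized for having larger absolute variance, which is one reason a relative measure may be preferred for this kind of screening.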
DOE Office of Scientific and Technical Information (OSTI.GOV)
Varner, J.E.
1993-06-01
Since xylem tissue includes the main cell types which are lignified, we are interested in gene expression of glycine-rich proteins and proline-rich proteins, and other proteins which are involved in secondary cell wall thickening during xylogenesis. Since the main feature of xylogenesis is the deposition of additional wall components, study of the mechanism of xylogenesis will greatly advance our knowledge of the synthesis and assembly of wall macromolecules. We are using the in vitro xylogenesis system from isolated Zinnia mesophyll cells to isolate genes which are specifically expressed during xylogenesis. We have used subtractive hybridization methods to isolate a number of cDNA clones for differentially regulated genes from the cells after hormonal induction. So far, we have partially characterized 18 different cDNA clones from 239 positive clones. These differentially regulated genes can be divided into three sets according to the characteristics of gene expression in the induction medium and the control medium. The first set is induced in both the induction medium and the control medium without hormones. The second set is induced mainly in the induction medium and in the control medium with the addition of NAA alone. Two of these genes are exclusively induced by auxin. The third set of genes is induced mainly in the induction medium. Since these genes are not induced by either auxin or cytokinin alone, they may be directly involved in the process of xylogenesis. Our experiments on the localization of H₂O₂ production reinforce the earlier ideas of others that H₂O₂ is involved in normal lignification.
Hydroxyproline: Rich glycoproteins of the plant and cell wall
DOE Office of Scientific and Technical Information (OSTI.GOV)
Varner, J.E.
1993-01-01
Since xylem tissue includes the main cell types which are lignified, we are interested in gene expression of glycine-rich proteins and proline-rich proteins, and other proteins which are involved in secondary cell wall thickening during xylogenesis. Since the main feature of xylogenesis is the deposition of additional wall components, study of the mechanism of xylogenesis will greatly advance our knowledge of the synthesis and assembly of wall macromolecules. We are using the in vitro xylogenesis system from isolated Zinnia mesophyll cells to isolate genes which are specifically expressed during xylogenesis. We have used subtractive hybridization methods to isolate a number of cDNA clones for differentially regulated genes from the cells after hormonal induction. So far, we have partially characterized 18 different cDNA clones from 239 positive clones. These differentially regulated genes can be divided into three sets according to the characteristics of gene expression in the induction medium and the control medium. The first set is induced in both the induction medium and the control medium without hormones. The second set is induced mainly in the induction medium and in the control medium with the addition of NAA alone. Two of these genes are exclusively induced by auxin. The third set of genes is induced mainly in the induction medium. Since these genes are not induced by either auxin or cytokinin alone, they may be directly involved in the process of xylogenesis. Our experiments on the localization of H₂O₂ production reinforce the earlier ideas of others that H₂O₂ is involved in normal lignification.
Galkiewicz, Julia P; Kellogg, Christina A
2008-12-01
PCR amplification of pure bacterial DNA is vital to the study of bacterial interactions with corals. Commonly used Bacteria-specific primers 8F and 27F paired with the universal primer 1492R amplify both eukaryotic and prokaryotic rRNA genes. An alternative primer set, 63F/1542R, is suggested to resolve this problem. PMID:18931299
MultiSite Gateway-Compatible Cell Type-Specific Gene-Inducible System for Plants
Siligato, Riccardo; Wang, Xin; Yadav, Shri Ram; Lehesranta, Satu; Ma, Guojie; Ursache, Robertas; Sevilem, Iris; Zhang, Jing; Gorte, Maartje; Prasad, Kalika; Heidstra, Renze
2016-01-01
A powerful method to study gene function is expression or overexpression in an inducible, cell type-specific system followed by observation of consequent phenotypic changes and visualization of linked reporters in the target tissue. Multiple inducible gene overexpression systems have been developed for plants, but very few of these combine plant selection markers, control of expression domains, access to multiple promoters and protein fusion reporters, chemical induction, and high-throughput cloning capabilities. Here, we introduce a MultiSite Gateway-compatible inducible system for Arabidopsis (Arabidopsis thaliana) plants that provides the capability to generate such constructs in a single cloning step. The system is based on the tightly controlled, estrogen-inducible XVE system. We demonstrate that the transformants generated with this system exhibit the expected cell type-specific expression, similar to what is observed with constitutively expressed native promoters. With this new system, cloning of inducible constructs is no longer limited to a few special cases but can be used as a standard approach when gene function is studied. In addition, we present a set of entry clones consisting of histochemical and fluorescent reporter variants designed for gene and promoter expression studies. PMID:26644504
Modrzynska, Katarzyna; Pfander, Claudia; Chappell, Lia; Yu, Lu; Suarez, Catherine; Dundas, Kirsten; Gomes, Ana Rita; Goulding, David; Rayner, Julian C; Choudhary, Jyoti; Billker, Oliver
2017-01-11
A family of apicomplexa-specific proteins containing AP2 DNA-binding domains (ApiAP2s) was identified in malaria parasites. This family includes sequence-specific transcription factors that are key regulators of development. However, functions for the majority of ApiAP2 genes remain unknown. Here, a systematic knockout screen in Plasmodium berghei identified ten ApiAP2 genes that were essential for mosquito transmission: four were critical for the formation of infectious ookinetes, and three were required for sporogony. We describe non-essential functions for AP2-O and AP2-SP proteins in blood stages, and identify AP2-G2 as a repressor active in both asexual and sexual stages. Comparative transcriptomics across mutants and developmental stages revealed clusters of co-regulated genes with shared cis promoter elements, whose expression can be controlled positively or negatively by different ApiAP2 factors. We propose that stage-specific interactions between ApiAP2 proteins on partly overlapping sets of target genes generate the complex transcriptional network that controls the Plasmodium life cycle. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Meslin, Camille; Plakke, Melissa S.; Deutsch, Aaron B.; Small, Brandon S.; Morehouse, Nathan I.; Clark, Nathan L.
2015-01-01
Persistent adaptive challenges are often met with the evolution of novel physiological traits. Although there are specific examples of single genes providing new physiological functions, studies on the origin of complex organ functions are lacking. One such derived set of complex functions is found in the Lepidopteran bursa copulatrix, an organ within the female reproductive tract that digests nutrients from the male ejaculate or spermatophore. Here, we characterized bursa physiology and the evolutionary mechanisms by which it was equipped with digestive and absorptive functionality. By studying the transcriptome of the bursa and eight other tissues, we revealed a suite of highly expressed and secreted gene products providing the bursa with a combination of stomach-like traits for mechanical and enzymatic digestion of the male spermatophore. By subsequently placing these bursa genes in an evolutionary framework, we found that the vast majority of their novel digestive functions were co-opted by borrowing genes that continue to be expressed in nonreproductive tissues. However, a number of bursa-specific genes have also arisen, some of which represent unique gene families restricted to Lepidoptera and may provide novel bursa-specific functions. This pattern of promiscuous gene borrowing and relatively infrequent evolution of tissue-specific duplicates stands in contrast to studies of the evolution of novelty via single gene co-option. Our results suggest that the evolution of complex organ-level phenotypes may often be enabled (and subsequently constrained) by changes in tissue specificity that allow expression of existing genes in novel contexts, such as reproduction. The extent to which the selective pressures encountered in these novel roles require resolution via duplication and sub/neofunctionalization is likely to be determined by the need for specialized reproductive functionality. 
Thus, complex physiological phenotypes such as that found in the bursa offer important opportunities for understanding the relative role of pleiotropy and specialization in adaptive evolution. PMID:25725432
Sahoo, Pravas Ranjan; Sethy, Kamadev; Mohapatra, Swagat; Panda, Debasis
2016-05-01
India, as a developing country, depends heavily on the livestock sector for its economy. However, transboundary animal diseases are now emerging and re-emerging more often. The existing diagnostic techniques are slow and lack specificity. To reduce economic losses, a rapid, reliable and robust diagnostic technique with a high degree of sensitivity and specificity is needed. The loop-mediated isothermal amplification (LAMP) assay is a rapid gene amplification technique that amplifies nucleic acid under isothermal conditions with a set of designed primers spanning eight distinct sequences of the target. This assay can be used as a powerful, innovative gene amplification diagnostic tool against various pathogens of livestock diseases. This review highlights the basic concept and methodology of the assay as applied to livestock disease.
Cho, Min Seok; Lee, Jang Ha; Her, Nam Han; Kim, Changkug; Seol, Young-Joo; Hahn, Jang Ho; Baeg, Ji Hyoun; Kim, Hong Gi; Park, Dong Suk
2012-06-01
The Gram-positive bacterium Clavibacter michiganensis subsp. michiganensis is the causal agent of canker disease in tomato. Because it is very important to control newly introduced inoculum sources from commercial materials, the specific detection of this pathogen in seeds and seedlings is essential for effective disease control. In this study, a novel and efficient assay for the detection and quantitation of C. michiganensis subsp. michiganensis in symptomless tomato and red pepper seeds was developed. A pair of polymerase chain reaction (PCR) primers (Cmm141F/R) was designed to amplify a specific 141 bp fragment on the basis of a ferredoxin reductase gene of C. michiganensis subsp. michiganensis NCPPB 382. The specificity of the primer set was evaluated using purified DNA from 16 isolates of five C. michiganensis subspecies, one other Clavibacter species, and 17 other reference bacteria. The primer set amplified a single band of expected size from the genomic DNA obtained from the C. michiganensis subsp. michiganensis strains but not from the other C. michiganensis subspecies or from other Clavibacter species. The detection limit was a single cloned copy of the ferredoxin reductase gene of C. michiganensis subsp. michiganensis. In conclusion, this quantitative direct PCR assay can be applied as a practical diagnostic method for epidemiological research and the sanitary management of seeds and seedlings with a low level or latent infection of C. michiganensis subsp. michiganensis.
DiRE: identifying distant regulatory elements of co-expressed genes
Gotea, Valer; Ovcharenko, Ivan
2008-01-01
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org. PMID:18487623
Graeber, Kai; Linkies, Ada; Wood, Andrew T.A.; Leubner-Metzger, Gerhard
2011-01-01
Comparative biology includes the comparison of transcriptome and quantitative real-time RT-PCR (qRT-PCR) data sets in a range of species to detect evolutionarily conserved and divergent processes. Transcript abundance analysis of target genes by qRT-PCR requires a highly accurate and robust workflow. This includes reference genes with high expression stability (i.e., low intersample transcript abundance variation) for correct target gene normalization. Cross-species qRT-PCR for proper comparative transcript quantification requires reference genes suitable for different species. We addressed this issue using tissue-specific transcriptome data sets of germinating Lepidium sativum seeds to identify new candidate reference genes. We investigated their expression stability in germinating seeds of L. sativum and Arabidopsis thaliana by qRT-PCR, combined with in silico analysis of Arabidopsis and Brassica napus microarray data sets. This revealed that reference gene expression stability is higher for a given developmental process between distinct species than for distinct developmental processes within a given single species. The identified superior cross-species reference genes may be used for family-wide comparative qRT-PCR analysis of Brassicaceae seed germination. Furthermore, using germinating seeds, we exemplify optimization of the qRT-PCR workflow for challenging tissues regarding RNA quality, transcript stability, and tissue abundance. Our work therefore can serve as a guideline for moving beyond Arabidopsis by establishing high-quality cross-species qRT-PCR. PMID:21666000
Yamazaki, Hiroshi; Sekiguchi, Mariko; Takamatsu, Masako; Tanabe, Yasuto; Nakanishi, Shigetada
2004-10-05
Cajal-Retzius (CR) cells are early-generated transient neurons and are important in the regulation of cortical neuronal migration and cortical laminar formation. Molecular entities characterizing the CR cell identity, however, remain largely elusive. We purified mouse cortical CR cells expressing GFP to homogeneity by fluorescence-activated cell sorting and examined a genome-wide expression profile of cortical CR cells at embryonic and postnatal periods. We identified 49 genes that exceeded hybridization signals by >10-fold in CR cells compared with non-CR cells at embryonic day 13.5, postnatal day 2, or both. Among these CR cell-specific genes, 25 genes, including the CR cell marker genes such as the reelin and calretinin genes, are selectively and highly expressed in both embryonic and postnatal CR cells. These genes, which encode generic properties of CR cell specificity, are eminently characterized as modulatory composites of voltage-dependent calcium channels and sets of functionally related cellular components involved in cell migration, adhesion, and neurite extension. Five genes are highly expressed in CR cells at the early embryonic period and are rapidly down-regulated thereafter. Furthermore, some of these genes have been shown to mark two distinctly different focal regions corresponding to the CR cell origins. At the late prenatal and postnatal periods, 19 genes are selectively up-regulated in CR cells. These genes include functional molecules implicated in synaptic transmission and modulation. CR cells thus strikingly change their cellular phenotypes during cortical development and play a pivotal role in both corticogenesis and cortical circuit maturation.
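The selection rule in the abstract above (signal more than 10-fold higher in CR cells than in non-CR cells at embryonic day 13.5, postnatal day 2, or both) and the resulting three groups can be expressed with a fold-change filter and set operations. The gene labels and signal values below are invented placeholders, not data from the study.

```python
def enriched_genes(cr_signal, non_cr_signal, fold=10.0):
    """Genes whose CR-cell signal exceeds the non-CR signal by more than `fold`."""
    return {
        gene
        for gene, signal in cr_signal.items()
        if non_cr_signal.get(gene, 0) > 0 and signal / non_cr_signal[gene] > fold
    }


# Invented hybridization signals for three placeholder genes at two time points.
e13 = enriched_genes(
    {"g1": 1200, "g2": 90, "g3": 500},
    {"g1": 100, "g2": 80, "g3": 40},
)
p2 = enriched_genes(
    {"g1": 1100, "g2": 2000, "g3": 60},
    {"g1": 100, "g2": 100, "g3": 50},
)

cr_specific = e13 | p2       # enriched at either time point
both_stages = e13 & p2       # enriched at both stages
embryonic_only = e13 - p2    # enriched early, not later
postnatal_only = p2 - e13    # enriched only later
```

The three disjoint groups partition the full set, which matches the counts reported in the abstract: 25 genes at both stages, 5 early and 19 late, for 49 in total.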
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.
Merelli, Ivan; Calabria, Andrea; Cozzi, Paolo; Viti, Federica; Mosca, Ettore; Milanesi, Luciano
2013-01-01
The capability of correlating specific genotypes with human diseases is a complex issue in spite of all advantages arisen from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphisms (SNPs) prioritization are actually needed. Given a list of the most relevant SNPs statistically associated to a specific pathology as result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated to pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows the input of a list of genes, SNPs or a biological process, and to customize the features set with relative weights. As result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores. 
Different databases and resources are already available for SNPs annotation, but they do not prioritize or re-score SNPs relying on a-priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation able to support SNPs analysis. Possible scenarios are GWAS data re-scoring, SNPs selection for custom genotyping arrays and SNPs/diseases association studies.
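A scoring function that combines per-SNP features with user-set weights, as the abstract above describes, could take the form of a weighted sum. The abstract does not give the actual formula, so the weighted sum, the feature names, the weights and the SNP identifiers below are all assumptions made for illustration; this is not SNPranker's implementation.

```python
def snp_score(features, weights):
    """Weighted sum of feature values; features absent for a SNP count as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank_snps(snps, weights):
    """SNP identifiers ordered from highest to lowest score."""
    return sorted(snps, key=lambda snp: snp_score(snps[snp], weights), reverse=True)


# Hypothetical user-set weights for three hypothetical annotation features.
weights = {"in_coding_region": 2.0, "conservation": 1.0, "regulatory_site": 1.5}

# Hypothetical SNPs with their feature values.
snps = {
    "rs_a": {"in_coding_region": 1, "conservation": 0.5},
    "rs_b": {"conservation": 0.9, "regulatory_site": 1},
    "rs_c": {"conservation": 0.2},
}

ranking = rank_snps(snps, weights)
```

Keeping the weights in a plain mapping mirrors the idea that users can customize the feature set and its relative weights; changing a weight changes the ranking without touching the features themselves.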
Zhang, Yijing; Yao, Yi; Du, Weixing; Wu, Kai; Xu, Wenyue; Lin, Min; Tan, Huabing; Li, Jian
2017-07-01
To achieve better outcomes in the treatment and prophylaxis of malaria, it is imperative to develop a sensitive, specific, and accurate assay for early diagnosis of Plasmodium falciparum infection, which is the major cause of malaria. In this study, we aimed to develop a loop-mediated isothermal amplification (LAMP) assay based on P. falciparum unique genes for sensitive, specific, and accurate detection of P. falciparum infection. The unique genes of P. falciparum were randomly selected from PlasmoDB. The LAMP primers for the unique genes were designed using PrimerExplorer V4. LAMP assays with primers from unique genes of P. falciparum and from the conserved 18S rRNA gene were developed and their sensitivity was assessed. The specificity of the most sensitive LAMP assay was further examined using genomic DNA from Plasmodium vivax, Plasmodium yoelii and Toxoplasma gondii. Finally, the unique gene-based LAMP assay was validated using clinical samples from P. falciparum infection cases. A total of 31 sets of top-scored LAMP primers from nine unique genes were selected from the pools of designed primers. The LAMP assay with PF3D7_1253300-5 was the most sensitive, with a detection limit of 5 parasites/μl, and it gave negative results with genomic DNA samples of P. vivax, P. yoelii, and T. gondii. The LAMP assay with PF3D7_0112300 (18S rRNA) was less sensitive, with a detection limit of 50 parasites/μl; it gave negative results with genomic DNA samples of P. yoelii and T. gondii, but a positive result with P. vivax. In clinical samples from P. falciparum infection cases, the positive detection rate of the LAMP assay with PF3D7_1253300-5 was 90% (27/30), higher than that of PF3D7_0112300 (18S rRNA) (80%, 24/30). The LAMP assay with the primer set PF3D7_1253300-5 was more sensitive, specific, and accurate than that with PF3D7_0112300 (18S rRNA) in detecting P. falciparum infection, and it is therefore a promising tool for diagnosis of P. falciparum infection.
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
Selection of Phototransduction Genes in Homo sapiens.
Christopher, Mark; Scheetz, Todd E; Mullins, Robert F; Abràmoff, Michael D
2013-08-13
We investigated the evidence of recent positive selection in the human phototransduction system at single nucleotide polymorphism (SNP) and gene level. SNP genotyping data from the International HapMap Project for European, Eastern Asian, and African populations was used to discover differences in haplotype length and allele frequency between these populations. Numeric selection metrics were computed for each SNP and aggregated into gene-level metrics to measure evidence of recent positive selection. The level of recent positive selection in phototransduction genes was evaluated and compared to a set of genes shown previously to be under recent selection, and a set of highly conserved genes as positive and negative controls, respectively. Six of 20 phototransduction genes evaluated had gene-level selection metrics above the 90th percentile: RGS9, GNB1, RHO, PDE6G, GNAT1, and SLC24A1. The selection signal across these genes was found to be of similar magnitude to the positive control genes and much greater than the negative control genes. There is evidence for selective pressure in the genes involved in retinal phototransduction, and traces of this selective pressure can be demonstrated using SNP-level and gene-level metrics of allelic variation. We hypothesize that the selective pressure on these genes was related to their role in low light vision and retinal adaptation to ambient light changes. Uncovering the underlying genetics of evolutionary adaptations in phototransduction not only allows greater understanding of vision and visual diseases, but also the development of patient-specific diagnostic and intervention strategies.
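The aggregation step described in this abstract (per-SNP selection metrics rolled up into gene-level metrics, then compared against a percentile cutoff) can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so taking the maximum absolute SNP score per gene is an assumption made here for illustration only; the function names, SNP identifiers, gene names and scores are all hypothetical.

```python
from collections import defaultdict


def gene_level_scores(snp_scores, snp_to_gene):
    """Aggregate per-SNP scores into one score per gene.

    ASSUMPTION: the gene score is the largest absolute SNP score in the gene.
    The abstract does not specify the rule the authors used.
    """
    by_gene = defaultdict(list)
    for snp, score in snp_scores.items():
        gene = snp_to_gene.get(snp)
        if gene is not None:  # skip SNPs that map to no gene
            by_gene[gene].append(abs(score))
    return {gene: max(values) for gene, values in by_gene.items()}


def percentile_rank(value, background):
    """Percentage of background scores that are less than or equal to value."""
    if not background:
        raise ValueError("background must not be empty")
    return 100.0 * sum(1 for b in background if b <= value) / len(background)


def genes_above_percentile(gene_scores, background, cutoff=90.0):
    """Genes whose score ranks above the cutoff percentile of the background."""
    return sorted(
        gene
        for gene, score in gene_scores.items()
        if percentile_rank(score, background) > cutoff
    )


if __name__ == "__main__":
    # Toy, made-up inputs: not data from the study.
    snp_scores = {"rs1": 12.0, "rs2": -95.0, "rs3": 40.0, "rs4": 7.0}
    snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B", "rs4": "GENE_B"}
    background = list(range(1, 101))  # stand-in for genome-wide gene scores

    scores = gene_level_scores(snp_scores, snp_to_gene)
    print(genes_above_percentile(scores, background))
```

Using the absolute value keeps strongly negative and strongly positive SNP metrics on the same footing; a mean or a count of extreme SNPs per gene would be equally plausible aggregation rules.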
Fackler, Mary Jo; Bujanda, Zoila Lopez; Umbricht, Christopher; Teo, Wei Wen; Cho, Soonweng; Zhang, Zhe; Visvanathan, Kala; Jeter, Stacie; Argani, Pedram; Wang, Chenguang; Lyman, Jaclyn P.; de Brot, Marina; Ingle, James N.; Boughey, Judy; McGuire, Kandace; King, Tari A.; Carey, Lisa A.; Cope, Leslie; Wolff, Antonio C.; Sukumar, Saraswati
2015-01-01
The ability to consistently detect cell-free tumor-specific DNA in peripheral blood of patients with metastatic breast cancer provides the opportunity to detect changes in tumor burden and to monitor response to treatment. We developed cMethDNA, a quantitative multiplexed methylation-specific PCR assay for a panel of ten genes, consisting of novel and known breast cancer hypermethylated markers identified by mining our previously reported study of DNA methylation patterns in breast tissue (103 cancer, 21 normal on the Illumina HumanMethylation27 Beadchip) and then validating the 10-gene panel in a TCGA breast cancer methylome database. For cMethDNA, a fixed physiological level (50 copies) of artificially constructed, standard non-human reference DNA specific for each gene is introduced into a constant volume of serum (300 μl) prior to purification of the DNA, facilitating a sensitive, specific, robust and quantitative assay of tumor DNA, with broad dynamic range. Cancer-specific methylated DNA was detected in Training (28 normal, 24 cancer) and Test (27 normal, 33 cancer) sets of recurrent Stage 4 patient sera with a sensitivity of 91% and a specificity of 96% in the test set. In a pilot study, the cMethDNA assay faithfully reflected patient response to chemotherapy (N = 29). A core methylation signature present in the primary breast cancer was retained in serum and metastatic tissues collected at autopsy 2–11 years after diagnosis of the disease. Together, our data suggest that the cMethDNA assay can detect advanced breast cancer, and monitor tumor burden and treatment response in women with metastatic breast cancer. PMID:24737128
Rivera-Torres, Natalia; Strouse, Bryan; Bialk, Pawel; Niamat, Rohina A; Kmiec, Eric B
2014-01-01
With recent technological advances that enable DNA cleavage at specific sites in the human genome, it may now be possible to reverse inborn errors, thereby correcting a mutation, at levels that could have an impact in a clinical setting. We have been developing gene editing, using single-stranded DNA oligonucleotides (ssODNs), as a tool to direct site specific single base changes. Successful application of this technique has been demonstrated in many systems ranging from bacteria to human (ES and somatic) cells. While the frequency of gene editing can vary widely, it is often at a level that does not enable clinical application. As such, a number of stimulatory factors such as double-stranded breaks are known to elevate the frequency significantly. The majority of these results have been discovered using a validated HCT116 mammalian cell model system where credible genetic and biochemical readouts are available. Here, we couple TAL-Effector Nucleases (TALENs) that execute specific ds DNA breaks with ssODNs, designed specifically to repair a missense mutation, in an integrated single copy eGFP gene. We find that proximal cleavage, relative to the mutant base, is key for enabling high frequencies of editing. A directionality of correction is also observed with TALEN activity upstream from the target base being more effective in promoting gene editing than activity downstream. We also find that cells progressing through S phase are more amenable to combinatorial gene editing activity. Thus, we identify novel aspects of gene editing that will help in the design of more effective protocols for genome modification and gene therapy in natural genes.
Yang, Yajie; Boss, Isaac W; McIntyre, Lauren M; Renne, Rolf
2014-08-08
Kaposi's sarcoma associated herpes virus (KSHV) is associated with tumors of endothelial and lymphoid origin. During latent infection, KSHV expresses miR-K12-11, an ortholog of the human tumor gene hsa-miR-155. Both gene products are microRNAs (miRNAs), which are important post-transcriptional regulators that contribute to tissue specific gene expression. Advances in target identification technologies and molecular interaction databases have allowed a systems biology approach to unravel the gene regulatory networks (GRNs) triggered by miR-K12-11 in endothelial and lymphoid cells. Understanding the tissue specific function of miR-K12-11 will help to elucidate underlying mechanisms of KSHV pathogenesis. Ectopic expression of miR-K12-11 differentially affected gene expression in BJAB cells of lymphoid origin and TIVE cells of endothelial origin. Direct miRNA targeting accounted for a small fraction of the observed transcriptome changes: only 29 genes were identified as putative direct targets of miR-K12-11 in both cell types. However, a number of commonly affected biological pathways, such as carbohydrate metabolism and interferon response related signaling, were revealed by gene ontology analysis. Integration of transcriptome profiling, bioinformatic algorithms, and databases of protein-protein interactome from the ENCODE project identified different nodes of GRNs utilized by miR-K12-11 in a tissue-specific fashion. These effector genes, including cancer associated transcription factors and signaling proteins, amplified the regulatory potential of a single miRNA, from a small set of putative direct targets to a larger set of genes. This is the first comparative analysis of miRNA-K12-11's effects in endothelial and B cells, from tissues infected with KSHV in vivo. MiR-K12-11 was able to broadly modulate gene expression in both cell types. 
Using a systems biology approach, we inferred that miR-K12-11 establishes its GRN by both repressing master TFs and influencing signaling pathways, to counter the host anti-viral response and to promote proliferation and survival of infected cells. The targeted GRNs are more reproducible and informative than target gene identification, and our approach can be applied to other regulatory factors of interest.
Kavak, Erşen; Ünlü, Mustafa; Nistér, Monica; Koman, Ahmet
2010-01-01
Cancer is among the major causes of human death and its mechanism(s) are not fully understood. We applied a novel meta-analysis approach to multiple sets of merged serial analysis of gene expression and microarray cancer data in order to analyze transcriptome alterations in human cancer. Our methodology, which we denote ‘COgnate Gene Expression patterNing in tumours’ (COGENT), unmasked numerous genes that were differentially expressed in multiple cancers. COGENT detected well-known tumor-associated (TA) genes such as TP53, EGFR and VEGF, as well as many multi-cancer, but not-yet-tumor-associated genes. In addition, we identified 81 co-regulated regions on the human genome (RIDGEs) by using expression data from all cancers. Some RIDGEs (28%) consist of paralog genes while another subset (30%) are specifically dysregulated in tumors but not in normal tissues. Furthermore, a significant number of RIDGEs are associated with GC-rich regions on the genome. All assembled data is freely available online (www.oncoreveal.org) as a tool implementing COGENT analysis of multi-cancer genes and RIDGEs. These findings engender a deeper understanding of cancer biology by demonstrating the existence of a pool of under-studied multi-cancer genes and by highlighting the cancer-specificity of some TA-RIDGEs. PMID:20621981
Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M
2012-01-01
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.
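The point that positive and negative gene effects should not cancel can be made concrete with a small sketch. It contrasts a signed sum of gene-level statistics with a sum of squares. The sum of squares is only one common way to avoid cancellation and is an assumption here, not necessarily the score statistic the authors use; the gene names, z-scores and function names are hypothetical.

```python
def signed_sum(z_scores):
    """Naive gene-set statistic: opposite-sign gene effects cancel."""
    return sum(z_scores)


def sum_of_squares(z_scores):
    """Direction-free gene-set statistic: each gene contributes z**2 >= 0."""
    return sum(z * z for z in z_scores)


def gene_set_statistic(gene_z, gene_set, combine=sum_of_squares):
    """Combine the statistics of the genes that belong to gene_set.

    Genes in the set without a statistic are ignored.
    """
    members = [gene_z[g] for g in gene_set if g in gene_z]
    return combine(members)


if __name__ == "__main__":
    # Toy, made-up gene-level z-scores: not results from the study.
    gene_z = {"G1": 2.5, "G2": -2.5, "G3": 2.0, "G4": -2.0, "G5": 0.1}
    gene_set = ["G1", "G2", "G3", "G4"]

    print(gene_set_statistic(gene_z, gene_set, combine=signed_sum))  # effects cancel
    print(gene_set_statistic(gene_z, gene_set))                      # effects accumulate
```

In practice the significance of such a statistic would still need to account for LD and for gene-set size, for example by permutation, which this sketch leaves out.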
Computational annotation of genes differentially expressed along olive fruit development
Galla, Giulio; Barcaccia, Gianni; Ramina, Angelo; Collani, Silvio; Alagna, Fiammetta; Baldoni, Luciana; Cultrera, Nicolò GM; Martinelli, Federico; Sebastiani, Luca; Tonutti, Pietro
2009-01-01
Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high economic impact worldwide. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertoires were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. 
The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening. PMID:19852839
Regulation of human genome expression and RNA splicing by human papillomavirus 16 E2 protein.
Gauson, Elaine J; Windle, Brad; Donaldson, Mary M; Caffarel, Maria M; Dornan, Edward S; Coleman, Nicholas; Herzyk, Pawel; Henderson, Scott C; Wang, Xu; Morgan, Iain M
2014-11-01
Human papillomavirus 16 (HPV16) is causative in human cancer. The E2 protein regulates transcription from and replication of the viral genome; the role of E2 in regulating the host genome has been less well studied. We have expressed HPV16 E2 (E2) stably in U2OS cells; these cells tolerate E2 expression well and gene expression analysis identified 74 genes showing differential expression specific to E2. Analysis of published gene expression data sets during cervical cancer progression identified 20 of the genes as being altered in a similar direction as the E2 specific genes. In addition, E2 altered the splicing of many genes implicated in cancer and cell motility. The E2 expressing cells showed no alteration in cell growth but were altered in cell motility, consistent with the E2 induced altered splicing predicted to affect this cellular function. The results present a model system for investigating E2 regulation of the host genome. Copyright © 2014 Elsevier Inc. All rights reserved.
Ortega-Molina, Ana; Boss, Isaac W.; Canela, Andres; Pan, Heng; Jiang, Yanwen; Zhao, Chunying; Jiang, Man; Hu, Deqing; Agirre, Xabier; Niesvizky, Itamar; Lee, Ji-Eun; Chen, Hua-Tang; Ennishi, Daisuke; Scott, David W.; Mottok, Anja; Hother, Christoffer; Liu, Shichong; Cao, Xing-Jun; Tam, Wayne; Shaknovich, Rita; Garcia, Benjamin A.; Gascoyne, Randy D.; Ge, Kai; Shilatifard, Ali; Elemento, Olivier; Nussenzweig, Andre; Melnick, Ari M.; Wendel, Hans-Guido
2015-01-01
The lysine-specific histone methyltransferase KMT2D has emerged as one of the most frequently mutated genes in follicular lymphoma (FL) and diffuse large B cell lymphoma (DLBCL). However, the biological consequences of KMT2D mutations on lymphoma development are not known. Here we show that KMT2D functions as a bona fide tumor suppressor and that its genetic ablation in B cells promotes lymphoma development in mice. KMT2D deficiency also delays germinal center (GC) involution, impedes B cell differentiation and class switch recombination (CSR). Integrative genomic analyses indicate that KMT2D affects H3K4 methylation and expression of a specific set of genes including those in the CD40, JAK-STAT, Toll-like receptor, and B cell receptor pathways. Notably, other KMT2D target genes include frequently mutated tumor suppressor genes such as TNFAIP3, SOCS3, and TNFRSF14. Therefore, KMT2D mutations may promote malignant outgrowth by perturbing the expression of tumor suppressor genes that control B cell activating pathways. PMID:26366710
Martins, Tiago M; Hartmann, Diego O; Planchon, Sébastien; Martins, Isabel; Renaut, Jenny; Silva Pereira, Cristina
2015-01-01
Aspergilli play major roles in the natural turnover of elements, especially through the decomposition of plant litter, but the end catabolism of lignin aromatic hydrocarbons remains largely unresolved. The 3-oxoadipate pathway of their degradation combines the catechol and the protocatechuate branches, each using a set of specific genes. However, annotation for most of these genes is lacking or attributed to poorly- or un-characterised families. Aspergillus nidulans can utilise as sole carbon/energy source either benzoate or salicylate (upstream aromatic metabolites of the protocatechuate and the catechol branches, respectively). Using this cultivation strategy and combined analyses of comparative proteomics, gene mining, gene expression and characterisation of particular gene-replacement mutants, we precisely assigned most of the steps of the 3-oxoadipate pathway to specific genes in this fungus. Our findings disclose the genetically encoded potential of saprophytic Ascomycota fungi to utilise this pathway and provide means to untie associated regulatory networks, which are vital to heightening their ecological significance. Copyright © 2014 Elsevier Inc. All rights reserved.
Xu, Chen; Zhang, Nan; Huo, Qianyu; Chen, Minghui; Wang, Rengfeng; Liu, Zhili; Li, Xue; Liu, Yunde; Bao, Huijing
2016-04-15
In this article, we discuss the polymerase chain reaction (PCR)-hybridization assay that we developed for high-throughput simultaneous detection and differentiation of Ureaplasma urealyticum and Ureaplasma parvum using one set of primers and two specific DNA probes based on urease gene nucleotide sequence differences. First, U. urealyticum and U. parvum DNA samples were specifically amplified using one set of biotin-labeled primers. Furthermore, amine-modified DNA probes, which can specifically react with U. urealyticum or U. parvum DNA, were covalently immobilized to a DNA-BIND plate surface. The plate was then incubated with the PCR products to facilitate sequence-specific DNA binding. Horseradish peroxidase-streptavidin conjugation and a colorimetric assay were used. Based on the results, the PCR-hybridization assay we developed can specifically differentiate U. urealyticum and U. parvum with high sensitivity (95%) compared with cultivation (72.5%). Hence, this study demonstrates a new method for high-throughput simultaneous differentiation and detection of U. urealyticum and U. parvum with high sensitivity. Based on these observations, the PCR-hybridization assay developed in this study is ideal for detecting and discriminating U. urealyticum and U. parvum in clinical applications. Copyright © 2016 Elsevier Inc. All rights reserved.
Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N
2013-03-15
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
Wang, Guo-Ming; Yin, Hao; Qiao, Xin; Tan, Xu; Gu, Chao; Wang, Bao-Hua; Cheng, Rui; Wang, Ying-Zhen; Zhang, Shao-Ling
2016-12-01
The F-box gene family, one of the largest gene families in plants, plays crucial roles in regulating plant development, reproduction, cellular protein degradation and responses to biotic and abiotic stresses. However, comprehensive analysis of the F-box gene family in pear (Pyrus bretschneideri Rehd.) and other Rosaceae species has not been reported yet. Herein, we identified a total of 226 full-length F-box genes in pear for the first time. These genes were further divided into various subgroups based on specific domains and phylogenetic analysis. Intriguingly, we observed that whole-genome duplication and dispersed duplication have contributed substantially to F-box family expansion. Furthermore, the dynamic evolution for different modes of gene duplication was dissected. Interestingly, we found that dispersed and tandem duplicates have been evolving at a high rate. In addition, we found that F-box genes exhibited functional specificity based on GO analysis, and most of the F-box genes were significantly enriched in the protein binding (GO: 0005515) term, supporting that F-box genes might play a critical role for gene regulation in pear. Transcriptome and digital expression profiles revealed that F-box genes are involved in the development of multiple pear tissues. Overall, these results will set the stage for elaborating the biological role of F-box genes in pear and other plants. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
2011-01-01
Background Populations of Atlantic killifish (Fundulus heteroclitus) have evolved resistance to the embryotoxic effects of polychlorinated biphenyls (PCBs) and other halogenated and nonhalogenated aromatic hydrocarbons that act through an aryl hydrocarbon receptor (AHR)-dependent signaling pathway. The resistance is accompanied by reduced sensitivity to induction of cytochrome P450 1A (CYP1A), a widely used biomarker of aromatic hydrocarbon exposure and effect, but whether the reduced sensitivity is specific to CYP1A or reflects a genome-wide reduction in responsiveness to all AHR-mediated changes in gene expression is unknown. We compared gene expression profiles and the response to 3,3',4,4',5-pentachlorobiphenyl (PCB-126) exposure in embryos (5 and 10 dpf) and larvae (15 dpf) from F. heteroclitus populations inhabiting the New Bedford Harbor, Massachusetts (NBH) Superfund site (PCB-resistant) and a reference site, Scorton Creek, Massachusetts (SC; PCB-sensitive). Results Analysis using a 7,000-gene cDNA array revealed striking differences in responsiveness to PCB-126 between the populations; the differences occur at all three stages examined. There was a sizeable set of PCB-responsive genes in the sensitive SC population, a much smaller set of PCB-responsive genes in NBH fish, and few similarities in PCB-responsive genes between the two populations. Most of the array results were confirmed, and additional PCB-regulated genes identified, by RNA-Seq (deep pyrosequencing). Conclusions The results suggest that NBH fish possess a gene regulatory defect that is not specific to one target gene such as CYP1A but rather lies in a regulatory pathway that controls the transcriptional response of multiple genes to PCB exposure. The results are consistent with genome-wide disruption of AHR-dependent signaling in NBH fish. PMID:21609454
Evolution and development of the vertebrate ear
NASA Technical Reports Server (NTRS)
Fritzsch, B.; Beisel, K. W.
2001-01-01
This review outlines major aspects of development and evolution of the ear, specifically addressing issues of cell fate commitment and the emerging molecular governance of these decisions. Available data support the notion of homology of subsets of mechanosensors across phyla (proprioreceptive mechanosensory neurons in insects, hair cells in vertebrates). It is argued that this conservation is primarily related to the specific transducing environment needed to achieve mechanosensation. Achieving this requires highly conserved transcription factors that regulate the expression of the relevant structural genes for mechanosensory transduction. While conserved at the level of some cell fate assignment genes (atonal and its mammalian homologue), the ear has also radically reorganized its development by implementing genes used for cell fate assignment in other parts of the developing nervous systems (e.g., neurogenin 1) and by evolving novel sets of genes specifically associated with the novel formation of sensory neurons that contact hair cells (neurotrophins and their receptors). Numerous genes have been identified that regulate morphogenesis, but there is only one common feature that emerges at the moment: the ear appears to have co-opted genes from a large variety of other parts of the developing body (forebrain, limbs, kidneys) and establishes, in combination with existing transcription factors, an environment in which those genes govern novel, ear-related morphogenetic aspects. The ear thus represents a unique mix of highly conserved developmental elements combined with co-opted and newly evolved developmental elements.
Microbiota diversity and gene expression dynamics in human oral biofilms
2014-01-01
Background Micro-organisms inhabiting teeth surfaces grow on biofilms where a specific and complex succession of bacteria has been described by co-aggregation tests and DNA-based studies. Although the composition of oral biofilms is well established, the active portion of the bacterial community and the patterns of gene expression in vivo have not been studied. Results Using RNA-sequencing technologies, we present the first metatranscriptomic study of human dental plaque, performed by two different approaches: (1) A short-reads, high-coverage approach by Illumina sequencing to characterize the gene activity repertoire of the microbial community during biofilm development; (2) A long-reads, lower-coverage approach by pyrosequencing to determine the taxonomic identity of the active microbiome before and after a meal ingestion. The high-coverage approach allowed us to analyze over 398 million reads, revealing that microbial communities are individual-specific and no bacterial species was detected as key player at any time during biofilm formation. We could identify some gene expression patterns characteristic for early and mature oral biofilms. The transcriptomic profile of several adhesion genes was confirmed through qPCR by measuring expression of fimbriae-associated genes. In addition to the specific set of gene functions overexpressed in early and mature oral biofilms, as detected through the short-reads dataset, the long-reads approach detected specific changes when comparing the metatranscriptome of the same individual before and after a meal, which can narrow down the list of organisms responsible for acid production and therefore potentially involved in dental caries. Conclusions The bacteria changing activity during biofilm formation and after meal ingestion were person-specific. 
Interestingly, some individuals showed extreme homeostasis with virtually no changes in the active bacterial population after food ingestion, suggesting the presence of a microbial community which could be associated with dental health. PMID:24767457
Microbiota diversity and gene expression dynamics in human oral biofilms.
Benítez-Páez, Alfonso; Belda-Ferre, Pedro; Simón-Soro, Aurea; Mira, Alex
2014-04-27
Micro-organisms inhabiting teeth surfaces grow on biofilms where a specific and complex succession of bacteria has been described by co-aggregation tests and DNA-based studies. Although the composition of oral biofilms is well established, the active portion of the bacterial community and the patterns of gene expression in vivo have not been studied. Using RNA-sequencing technologies, we present the first metatranscriptomic study of human dental plaque, performed by two different approaches: (1) A short-reads, high-coverage approach by Illumina sequencing to characterize the gene activity repertoire of the microbial community during biofilm development; (2) A long-reads, lower-coverage approach by pyrosequencing to determine the taxonomic identity of the active microbiome before and after a meal ingestion. The high-coverage approach allowed us to analyze over 398 million reads, revealing that microbial communities are individual-specific and no bacterial species was detected as key player at any time during biofilm formation. We could identify some gene expression patterns characteristic for early and mature oral biofilms. The transcriptomic profile of several adhesion genes was confirmed through qPCR by measuring expression of fimbriae-associated genes. In addition to the specific set of gene functions overexpressed in early and mature oral biofilms, as detected through the short-reads dataset, the long-reads approach detected specific changes when comparing the metatranscriptome of the same individual before and after a meal, which can narrow down the list of organisms responsible for acid production and therefore potentially involved in dental caries. The bacteria changing activity during biofilm formation and after meal ingestion were person-specific. 
Interestingly, some individuals showed extreme homeostasis with virtually no changes in the active bacterial population after food ingestion, suggesting the presence of a microbial community which could be associated with dental health.
Guo, Bing; Greenwood, Paul L; Cafe, Linda M; Zhou, Guanghong; Zhang, Wangang; Dalrymple, Brian P
2015-03-13
This study aimed to identify markers for muscle growth rate and the different cellular contributors to cattle muscle and to link the muscle growth rate markers to specific cell types. The expression of two groups of genes in the longissimus muscle (LM) of 48 Brahman steers of similar age, significantly enriched for "cell cycle" and "ECM (extracellular matrix) organization" Gene Ontology (GO) terms was correlated with average daily gain/kg liveweight (ADG/kg) of the animals. However, expression of the same genes was only partly related to growth rate across a time course of postnatal LM development in two cattle genotypes, Piedmontese x Hereford (high muscling) and Wagyu x Hereford (high marbling). The deposition of intramuscular fat (IMF) altered the relationship between the expression of these genes and growth rate. K-means clustering across the development time course with a large set of genes (5,596) with similar expression profiles to the ECM genes was undertaken. The locations in the clusters of published markers of different cell types in muscle were identified and used to link clusters of genes to the cell type most likely to be expressing them. Overall correspondence between published cell type expression of markers and predicted major cell types of expression in cattle LM was high. However, some exceptions were identified: expression of SOX8 previously attributed to muscle satellite cells was correlated with angiogenesis. Analysis of the clusters and cell types suggested that the "cell cycle" and "ECM" signals were from the fibro/adipogenic lineage. Significant contributions to these signals from the muscle satellite cells, angiogenic cells and adipocytes themselves were not as strongly supported. Based on the clusters and cell type markers, sets of five genes predicted to be representative of fibro/adipogenic precursors (FAPs) and endothelial cells, and/or ECM remodelling and angiogenesis were identified. 
Gene sets and gene markers for the analysis of many of the major processes/cell populations contributing to muscle composition and growth have been proposed, enabling a consistent interpretation of gene expression datasets from cattle LM. The same gene sets are likely to be applicable in other cattle muscles and in other species.
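The K-means step described in this abstract (grouping genes by the similarity of their expression profiles across a developmental time course) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function name `kmeans`, the choice of the first k profiles as starting centroids, the squared Euclidean distance and the toy profiles are all assumptions introduced here.

```python
def kmeans(profiles, k, n_iter=50):
    """Cluster expression profiles (one list of values per gene) into k groups.

    Hypothetical helper for illustration; centroids start at the first k
    profiles so that the result is deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene goes to the nearest centroid
        # (squared Euclidean distance across time points).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in profiles
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy time courses (made-up values): two flat-low genes and two flat-high genes.
toy_profiles = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 5.0, 4.9]]
toy_labels, toy_centroids = kmeans(toy_profiles, k=2)
```

In practice, profiles are often standardized per gene before clustering so that clusters reflect the shape of the time course rather than absolute expression level. Whether that was done in this study is not stated in the abstract.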
Comparative genomics reveals candidate carotenoid pathway regulators of ripening watermelon fruit.
Grassi, Stefania; Piro, Gabriella; Lee, Je Min; Zheng, Yi; Fei, Zhangjun; Dalessandro, Giuseppe; Giovannoni, James J; Lenucci, Marcello S
2013-11-12
Many fruits, including watermelon, are proficient in carotenoid accumulation during ripening. While most genes encoding steps in the carotenoid biosynthetic pathway have been cloned, few transcriptional regulators of these genes have been defined to date. Here we describe the identification of a set of putative carotenoid-related transcription factors resulting from fresh watermelon carotenoid and transcriptome analysis during fruit development and ripening. Our goal is to both clarify the expression profiles of carotenoid pathway genes and to identify candidate regulators and molecular targets for crop improvement. Total carotenoids progressively increased during fruit ripening up to ~55 μg g(-1) fw in red-ripe fruits. Trans-lycopene was the carotenoid that contributed most to this increase. Many of the genes related to carotenoid metabolism displayed changing expression levels during fruit ripening generating a metabolic flux toward carotenoid synthesis. Constitutive low expression of lycopene cyclase genes resulted in lycopene accumulation. RNA-seq expression profiling of watermelon fruit development yielded a set of transcription factors whose expression was correlated with ripening and carotenoid accumulation. Nineteen putative transcription factor genes from watermelon and homologous to tomato carotenoid-associated genes were identified. Among these, six were differentially expressed in the flesh of both species during fruit development and ripening. Taken together the data suggest that, while the regulation of a common set of metabolic genes likely influences carotenoid synthesis and accumulation in watermelon and tomato fruits during development and ripening, specific and limiting regulators may differ between climacteric and non-climacteric fruits, possibly related to their differential susceptibility to and use of ethylene during ripening.
Comparative genomics reveals candidate carotenoid pathway regulators of ripening watermelon fruit
2013-01-01
Background Many fruits, including watermelon, are proficient in carotenoid accumulation during ripening. While most genes encoding steps in the carotenoid biosynthetic pathway have been cloned, few transcriptional regulators of these genes have been defined to date. Here we describe the identification of a set of putative carotenoid-related transcription factors resulting from fresh watermelon carotenoid and transcriptome analysis during fruit development and ripening. Our goal is to both clarify the expression profiles of carotenoid pathway genes and to identify candidate regulators and molecular targets for crop improvement. Results Total carotenoids progressively increased during fruit ripening up to ~55 μg g-1 fw in red-ripe fruits. Trans-lycopene was the carotenoid that contributed most to this increase. Many of the genes related to carotenoid metabolism displayed changing expression levels during fruit ripening generating a metabolic flux toward carotenoid synthesis. Constitutive low expression of lycopene cyclase genes resulted in lycopene accumulation. RNA-seq expression profiling of watermelon fruit development yielded a set of transcription factors whose expression was correlated with ripening and carotenoid accumulation. Nineteen putative transcription factor genes from watermelon and homologous to tomato carotenoid-associated genes were identified. Among these, six were differentially expressed in the flesh of both species during fruit development and ripening. Conclusions Taken together the data suggest that, while the regulation of a common set of metabolic genes likely influences carotenoid synthesis and accumulation in watermelon and tomato fruits during development and ripening, specific and limiting regulators may differ between climacteric and non-climacteric fruits, possibly related to their differential susceptibility to and use of ethylene during ripening. PMID:24219562
Carbajo, Daniel; Magi, Shigeyuki; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Arner, Erik; Forrest, Alistair R R; Carninci, Piero; Hayashizaki, Yoshihide; Daub, Carsten O; Okada-Hatakeyama, Mariko; Mar, Jessica C
2015-01-01
Understanding how cells use complex transcriptional programs to alter their fate in response to specific stimuli is an important question in biology. For the MCF-7 human breast cancer cell line, we applied gene expression trajectory models to identify the genes involved in driving cell fate transitions. We modified trajectory models to account for the scenario where cells were exposed to different stimuli, in this case epidermal growth factor and heregulin, to arrive at different cell fates, i.e. proliferation and differentiation respectively. Using genome-wide CAGE time series data collected from the FANTOM5 consortium, we identified the sets of promoters that were involved in the transition of MCF-7 cells to their specific fates versus those with expression changes that were generic to both stimuli. Of the 1,552 promoters identified, 1,091 had stimulus-specific expression while 461 promoters had generic expression profiles over the time course surveyed. Many of these stimulus-specific promoters mapped to key regulators of the ERK (extracellular signal-regulated kinases) signaling pathway such as FHL2 (four and a half LIM domains 2). We observed that in general, generic promoters peaked in their expression early on in the time course, while stimulus-specific promoters tended to show activation of their expression at a later stage. The genes that mapped to stimulus-specific promoters were enriched for pathways that control focal adhesion, p53 signaling and MAPK signaling while generic promoters were enriched for cell death, transcription and the cell cycle. We identified 162 genes that were controlled by an alternative promoter during the time course where a subset of 37 genes had separate promoters that were classified as stimulus-specific and generic. The results of our study highlighted the degree of complexity involved in regulating a cell fate transition where multiple promoters mapping to the same gene can demonstrate quite divergent expression profiles.
Bruce, A. Gregory; Barcy, Serge; DiMaio, Terri; Gan, Emilia; Garrigues, H. Jacques; Lagunoff, Michael; Rose, Timothy M.
2017-01-01
The transcriptome of the Kaposi’s sarcoma-associated herpesvirus (KSHV/HHV8) after primary latent infection of human blood (BEC), lymphatic (LEC) and immortalized (TIME) endothelial cells was analyzed using RNAseq, and compared to long-term latency in BCBL-1 lymphoma cells. Naturally expressed transcripts were obtained without artificial induction, and a comprehensive annotation of the KSHV genome was determined. A set of unique coding sequence (UCDS) features and a process to resolve overlapping transcripts were developed to accurately quantitate transcript levels from specific promoters. Similar patterns of KSHV expression were detected in BCBL-1 cells undergoing long-term latent infections and in primary latent infections of both BEC and LEC cultures. High expression levels of poly-adenylated nuclear (PAN) RNA and spliced and unspliced transcripts encoding the K12 Kaposin B/C complex and associated microRNA region were detected, with an elevated expression of a large set of lytic genes in all latently infected cultures. Quantitation of non-overlapping regions of transcripts across the complete KSHV genome enabled for the first time accurate evaluation of the KSHV transcriptome associated with viral latency in different cell types. Hierarchical clustering applied to a gene correlation matrix identified modules of co-regulated genes with similar correlation profiles, which corresponded with biological and functional similarities of the encoded gene products. Gene modules were differentially upregulated during latency in specific cell types indicating a role for cellular factors associated with differentiated and/or proliferative states of the host cell to influence viral gene expression. PMID:28335496
2013-01-01
Background Although Candida albicans and Candida dubliniensis are most closely related, both species behave significantly differently with respect to morphogenesis and virulence. In order to gain further insight into the divergent routes for morphogenetic adaptation in both species, we investigated qualitative along with quantitative differences in the transcriptomes of both organisms by cDNA deep sequencing. Results Following genome-associated assembly of sequence reads we were able to generate experimentally verified databases containing 6016 and 5972 genes for C. albicans and C. dubliniensis, respectively. About 95% of the transcriptionally active regions (TARs) contain open reading frames while the remaining TARs most likely represent non-coding RNAs. Comparison of our annotations with publicly available gene models for C. albicans and C. dubliniensis confirmed approximately 95% of already predicted genes, but also revealed so far unknown novel TARs in both species. Qualitative cross-species analysis of these databases revealed in addition to 5802 orthologs also 399 and 49 species-specific protein coding genes for C. albicans and C. dubliniensis, respectively. Furthermore, quantitative transcriptional profiling using RNA-Seq revealed significant differences in the expression of orthologs across both species. We defined a core subset of 84 hyphal-specific genes required for both species, as well as a set of 42 genes that seem to be specifically induced during hyphal morphogenesis in C. albicans. Conclusions Species-specific adaptation in C. albicans and C. dubliniensis is governed by individual genetic repertoires but also by altered regulation of conserved orthologs on the transcriptional level. PMID:23547856
Badr, Eman; ElHefnawi, Mahmoud; Heath, Lenwood S
2016-01-01
Alternative splicing is a vital process for regulating gene expression and promoting proteomic diversity. It plays a key role in tissue-specific expressed genes. This specificity is mainly regulated by splicing factors that bind to specific sequences called splicing regulatory elements (SREs). Here, we report a genome-wide analysis to study alternative splicing on multiple tissues, including brain, heart, liver, and muscle. We propose a pipeline to identify differential exons across tissues and hence tissue-specific SREs. In our pipeline, we utilize the DEXSeq package along with our previously reported algorithms. Utilizing the publicly available RNA-Seq data set from the Human BodyMap project, we identified 28,100 differentially used exons across the four tissues. We identified tissue-specific exonic splicing enhancers that overlap with various previously published experimental and computational databases. A complicated exonic enhancer regulatory network was revealed, where multiple exonic enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the exonic enhancers are found to be co-occurring with multiple exonic silencers and vice versa, which demonstrates a complicated relationship between tissue-specific exonic enhancers and silencers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Acquaah-Mensah, George K.; Taylor, Ronald C.
Microarray data have been a valuable resource for identifying transcriptional regulatory relationships among genes. As an example, brain region-specific transcriptional regulatory events have the potential of providing etiological insights into Alzheimer Disease (AD). However, there is often a paucity of suitable brain-region specific expression data obtained via microarrays or other high throughput means. The Allen Brain Atlas in situ hybridization (ISH) data sets (Jones et al., 2009) represent a potentially valuable alternative source of high-throughput brain region-specific gene expression data for such purposes. In this study, Allen Brain Atlas mouse ISH data in the hippocampal fields were extracted, focusing on 508 genes relevant to neurodegeneration. Transcriptional regulatory networks were learned using three high-performing network inference algorithms. Only 17% of regulatory edges from a network reverse-engineered based on brain region-specific ISH data were also found in a network constructed upon gene expression correlations in mouse whole brain microarrays, thus showing the specificity of gene expression within brain sub-regions. Furthermore, the ISH data-based networks were used to identify instructive transcriptional regulatory relationships. Ncor2, Sp3 and Usf2 form a unique three-party regulatory motif, potentially affecting memory formation pathways. Nfe2l1, Egr1 and Usf2 emerge among regulators of genes involved in AD (e.g. Dhcr24, Aplp2, Tia1, Pdrx1, Vdac1, and Syn2). Further, Nfe2l1, Egr1 and Usf2 are sensitive to dietary factors and could be among links between dietary influences and genes in the AD etiology. Thus, this approach of harnessing brain region-specific ISH data represents a rare opportunity for gleaning unique etiological insights for diseases such as AD.
Kagoshima, Hiroshi; Kohara, Yuji
2015-03-15
A wide variety of cells are generated by the expression of characteristic sets of genes, primarily those regulated by cell-specific transcription. To elucidate the mechanism regulating cell-specific gene expression in a highly specialized cell, the AFD thermosensory neuron in Caenorhabditis elegans, we analyzed the promoter sequences of guanylyl cyclase genes, gcy-8 and gcy-18, exclusively expressed in AFD. In this study, we showed that AFD-specific expression of gcy-8 and gcy-18 requires the co-expression of homeodomain proteins, CEH-14/LHX3 and TTX-1/OTX1. We observed that mutation of ttx-1 or ceh-14 caused a reduction in the expression of gcy-8 and gcy-18 and that the expression was completely lost in double mutants. This synergistic effect was also observed with other AFD marker genes, such as ntc-1, nlp-21 and cng-3. Electrophoretic mobility shift assays revealed direct interaction of CEH-14 and TTX-1 proteins with gcy-8 and gcy-18 promoters in vitro. The binding sites of CEH-14 and TTX-1 proteins were confirmed to be essential for AFD-specific expression of gcy-8 and gcy-18 in vivo. We also demonstrated that forced expression of CEH-14 and TTX-1 in AWB chemosensory neurons induced ectopic expression of gcy-8 and gcy-18 reporters in this neuron. Finally, we showed that the regulation of gcy-8 and gcy-18 expression by ceh-14 and ttx-1 is evolutionarily conserved in five Caenorhabditis species. Taken together, ceh-14 and ttx-1 expression determines the fate of AFD as terminal selector genes at the final step of cell specification. Copyright © 2015 Elsevier Inc. All rights reserved.
Duesberg, Peter H.; Vogt, Peter K.
1979-01-01
The genome of the defective avian tumor virus MH2 was identified as an RNA of 5.7 kilobases by its presence in different MH2-helper virus complexes and its absence from pure helper virus, by its unique fingerprint pattern of RNase T1-resistant (T1) oligonucleotides that differed from those of two helper virus RNAs, and by its structural analogy to the RNA of MC29, another avian acute leukemia virus. Two sets of sequences were distinguished in MH2 RNA: 66% hybridized with DNA complementary to helper-independent avian tumor viruses, termed group-specific, and 34% were specific. The percentage of specific sequences is considered a minimal estimate because the MH2 RNA used was about 30% contaminated by helper virus RNA. No sequences related to the transforming src gene of avian sarcoma viruses were found in MH2. MH2 shared three large T1 oligonucleotides with MC29, two of which could also be isolated from an RNase A- and T1-resistant hybrid formed between MH2 RNA and MC29 specific cDNA. These oligonucleotides belong to a group of six that define the specific segment of MC29 RNA described previously. The group-specific sequences of MH2 and MC29 RNA shared only the two smallest out of about 20 T1 oligonucleotides associated with MH2 RNA. It is concluded that the specific sequences of MH2 and MC29 are related, and it is proposed that they are necessary for, or identical with, the onc genes of these viruses. These sequences would define a related class of transforming genes in avian tumor viruses that differs from the src genes of avian sarcoma viruses. PMID:221900
2011-01-01
Background DNA transposons have emerged as indispensable tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'- and 67 bp 5'-terminal repeat domain is sufficient for effective transposition; and (3) a small amount of piggyBac transposase results in robust transposition, suggesting the piggyBac transposase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194
Identification and Functional Analysis of Healing Regulators in Drosophila
Álvarez-Fernández, Carmen; Tamirisa, Srividya; Prada, Federico; Chernomoretz, Ariel; Podhajcer, Osvaldo; Blanco, Enrique; Martín-Blanco, Enrique
2015-01-01
Wound healing is an essential homeostatic mechanism that maintains the epithelial barrier integrity after tissue damage. Although we know the overall steps in wound healing, many of the underlying molecular mechanisms remain unclear. Genetically amenable systems, such as wound healing in Drosophila imaginal discs, do not model all aspects of the repair process. However, they do allow the less understood aspects of the healing response to be explored, e.g., which signal(s) are responsible for initiating tissue remodeling? How is sealing of the epithelia achieved? Or, what inhibitory cues cancel the healing machinery upon completion? Answering these and other questions first requires the identification and functional analysis of wound-specific genes. A variety of microarray analyses in mice and humans have identified characteristic profiles of gene expression at the wound site; however, very few functional studies of healing regulation have been carried out. We developed an experimentally controlled method that is healing-permissive and that allows live imaging and biochemical analysis of cultured imaginal discs. We performed comparative genome-wide profiling between Drosophila imaginal cells actively involved in healing versus their non-engaged siblings. Sets of potential wound-specific genes were subsequently identified. Importantly, besides identifying and categorizing new genes, we functionally tested many of their gene products by genetic interference and overexpression in healing assays. This non-saturated analysis defines a relevant set of genes whose changes in expression level are functionally significant for proper tissue repair. Amongst these we identified the TCP1 chaperonin complex as a key regulator of the actin cytoskeleton essential for the wound healing response. There is promise that our newly identified wound-healing genes will guide future work in the more complex mammalian wound healing response. PMID:25647511
McClellan, Michael J.; Wood, C. David; Ojeniyi, Opeoluwa; Cooper, Tim J.; Kanhere, Aditi; Arvey, Aaron; Webb, Helen M.; Palermo, Richard D.; Harth-Hertle, Marie L.; Kempkes, Bettina; Jenner, Richard G.; West, Michelle J.
2013-01-01
Epstein-Barr virus (EBV) epigenetically reprogrammes B-lymphocytes to drive immortalization and facilitate viral persistence. Host-cell transcription is perturbed principally through the actions of EBV EBNA 2, 3A, 3B and 3C, with cellular genes deregulated by specific combinations of these EBNAs through unknown mechanisms. Comparing human genome binding by these viral transcription factors, we discovered that 25% of binding sites were shared by EBNA 2 and the EBNA 3s and were located predominantly in enhancers. Moreover, 80% of potential EBNA 3A, 3B or 3C target genes were also targeted by EBNA 2, implicating extensive interplay between EBNA 2 and 3 proteins in cellular reprogramming. Investigating shared enhancer sites neighbouring two new targets (WEE1 and CTBP2) we discovered that EBNA 3 proteins repress transcription by modulating enhancer-promoter loop formation to establish repressive chromatin hubs or prevent assembly of active hubs. Re-ChIP analysis revealed that EBNA 2 and 3 proteins do not bind simultaneously at shared sites but compete for binding thereby modulating enhancer-promoter interactions. At an EBNA 3-only intergenic enhancer site between ADAM28 and ADAMDEC1 EBNA 3C was also able to independently direct epigenetic repression of both genes through enhancer-promoter looping. Significantly, studying shared or unique EBNA 3 binding sites at WEE1, CTBP2, ITGAL (LFA-1 alpha chain), BCL2L11 (Bim) and the ADAMs, we also discovered that different sets of EBNA 3 proteins bind regulatory elements in a gene and cell-type specific manner. Binding profiles correlated with the effects of individual EBNA 3 proteins on the expression of these genes, providing a molecular basis for the targeting of different sets of cellular genes by the EBNA 3s. 
Our results therefore highlight the influence of the genomic and cellular context in determining the specificity of gene deregulation by EBV and provide a paradigm for host-cell reprogramming through modulation of enhancer-promoter interactions by viral transcription factors. PMID:24068937
Zhu, Hong; Xia, Wei; Mo, Xing-Bo; Lin, Xiang; Qiu, Ying-Hua; Yi, Neng-Jun; Zhang, Yong-Hong; Deng, Fei-Yan; Lei, Shu-Feng
2016-01-01
Rheumatoid arthritis (RA) is a complex autoimmune disease. Using a gene-based association research strategy, the present study aims to detect unknown susceptibility to RA and to address the ethnic differences in genetic susceptibility to RA between European and Asian populations. Gene-based association analyses were performed with KGG 2.5 by using publicly available large RA datasets (14,361 RA cases and 43,923 controls of European subjects, 4,873 RA cases and 17,642 controls of Asian subjects). For the newly identified RA-associated genes, gene set enrichment analyses and protein-protein interaction analyses were carried out with DAVID and STRING version 10.0, respectively. Differential expression verification was conducted using 4 GEO datasets. The expression levels of three selected 'highly verified' genes were measured by ELISA among our in-house RA cases and controls. A total of 221 RA-associated genes were newly identified by gene-based association study, including 71 'overlapped', 76 'European-specific' and 74 'Asian-specific' genes. Among them, 105 genes had significant differential expression between RA patients and healthy controls in at least one dataset, especially 20 genes, including 11 'overlapped' (ABCF1, FLOT1, HLA-F, IER3, TUBB, ZKSCAN4, BTN3A3, HSP90AB1, CUTA, BRD2, HLA-DMA), 5 'European-specific' (PHTF1, RPS18, BAK1, TNFRSF14, SUOX) and 4 'Asian-specific' (RNASET2, HFE, BTN2A2, MAPK13) genes, whose differential expression was significant in at least three datasets. The protein expression of two selected genes, FLOT1 (P value = 1.70E-02) and HLA-DMA (P value = 4.70E-02), in plasma was significantly different in our in-house samples. Our study identified 221 novel RA-associated genes and especially highlighted the importance of 20 candidate genes in RA. The results addressed ethnic genetic background differences for RA susceptibility between European and Asian populations and detected a long list of overlapped or ethnic-specific RA genes. 
The study not only greatly increases our understanding of genetic susceptibility to RA, but also provides important insights into the ethno-genetic homogeneity and heterogeneity of RA in both ethnicities.
Identification of functional elements and regulatory circuits by Drosophila modENCODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.
2010-12-22
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. 
The functions of ~40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3′ untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Ozerov, Ivan V; Lezhnina, Ksenia V; Izumchenko, Evgeny; Artemov, Artem V; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N; Labat, Ivan; West, Michael D; Buzdin, Anton; Cantor, Charles R; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex
2016-11-16
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
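The idea of combining per-gene differential expression with gene importance factors into a single pathway activation score can be illustrated with a deliberately simplified sketch. This is not the published iPANDA formula: the coexpression grouping and topology decomposition mentioned in the abstract are omitted, and the function name `pathway_activation_score`, the weighting scheme and the example values are assumptions made here for illustration.

```python
from math import log2


def pathway_activation_score(case_expr, control_expr, importance):
    """Importance-weighted mean of log2 fold changes over pathway genes.

    Hypothetical simplification: `importance` maps each gene to a signed
    weight (positive for activators, negative for repressors). Genes that
    are missing from either expression profile are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for gene, weight in importance.items():
        if gene not in case_expr or gene not in control_expr:
            continue
        log_fold_change = log2(case_expr[gene] / control_expr[gene])
        weighted_sum += weight * log_fold_change
        weight_total += abs(weight)
    # With no usable genes the score is defined as 0 (no evidence either way).
    return weighted_sum / weight_total if weight_total else 0.0


# Made-up expression values for a two-gene pathway.
example_score = pathway_activation_score(
    case_expr={"GENE_A": 4.0, "GENE_B": 1.0},
    control_expr={"GENE_A": 1.0, "GENE_B": 1.0},
    importance={"GENE_A": 1.0, "GENE_B": 1.0},
)
```

Normalizing by the total absolute weight keeps scores comparable between pathways of different sizes, which is one plausible design choice rather than something the abstract specifies.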
Ozerov, Ivan V.; Lezhnina, Ksenia V.; Izumchenko, Evgeny; Artemov, Artem V.; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N.; Labat, Ivan; West, Michael D.; Buzdin, Anton; Cantor, Charles R.; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex
2016-01-01
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy. PMID:27848968
Tissue enrichment analysis for C. elegans genomics.
Angeles-Albores, David; N Lee, Raymond Y; Chan, Juancarlos; Sternberg, Paul W
2016-09-13
Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback of ontology enrichment analyses is their verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show that our tool can discriminate between embryonic and larval tissues and can identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
Cell-autonomous-like silencing of GFP-partitioned transgenic Nicotiana benthamiana.
Sohn, Seong-Han; Frost, Jennifer; Kim, Yoon-Hee; Choi, Seung-Kook; Lee, Yi; Seo, Mi-Suk; Lim, Sun-Hyung; Choi, Yeonhee; Kim, Kook-Hyung; Lomonossoff, George
2014-08-01
We previously reported the novel partitioning of regional GFP-silencing on leaves of 35S-GFP transgenic plants, coining the term "partitioned silencing". We set out to delineate the mechanism of partitioned silencing. Here, we report that the partitioned plants were hemizygous for the transgene, possessing two direct-repeat copies of 35S-GFP. The detection of both siRNA expression (21 and 24 nt) and DNA methylation enrichment specifically at silenced regions indicated that both post-transcriptional gene silencing (PTGS) and transcriptional gene silencing (TGS) were involved in the silencing mechanism. Using in vivo agroinfiltration of 35S-GFP/GUS and inoculation of TMV-GFP RNA, we demonstrate that PTGS, not TGS, plays a dominant role in the partitioned silencing, concluding that the underlying mechanism of partitioned silencing is analogous to RNA-directed DNA methylation (RdDM). The initial pattern of partitioned silencing was tightly maintained in a cell-autonomous manner, although partitioned-silenced regions possess a potential for systemic spread. Surprisingly, transcriptome profiling through next-generation sequencing demonstrated that expression levels of most genes involved in the silencing pathway were similar in both GFP-expressing and silenced regions although a diverse set of region-specific transcripts were detected. This suggests that partitioned silencing can be triggered and regulated by genes other than the genes involved in the silencing pathway. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Discovering causal signaling pathways through gene-expression patterns
Parikh, Jignesh R.; Klinger, Bertram; Xia, Yu; Marto, Jarrod A.; Blüthgen, Nils
2010-01-01
High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We expose our novel resource and algorithm through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/. PMID:20494976
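The overrepresentation step described in the abstract above can be illustrated with a one-sided hypergeometric test. This is a minimal sketch only: the abstract does not state which statistic SPEED uses, so the choice of test, the function name `overrepresentation_pvalue`, and the example numbers are all assumptions made for illustration.

```python
from math import comb


def overrepresentation_pvalue(universe_size, signature_size, group_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe_size  -- number of genes that could have been selected (assumed background)
    signature_size -- number of signature genes for one pathway
    group_size     -- number of genes in the user's gene group
    overlap        -- signature genes observed in the gene group
    """
    upper = min(signature_size, group_size)
    total = comb(universe_size, group_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, group_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 background genes, 150 signature genes,
    # a 300-gene list of interest containing 12 of the signature genes.
    print(overrepresentation_pvalue(20000, 150, 300, 12))
```

A small p-value for one pathway's signature would suggest that pathway as a candidate upstream regulator; with several pathways tested, a multiple-testing correction would normally follow.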
The peripheral sensory nervous system in the vertebrate head: a gene regulatory perspective.
Grocott, Timothy; Tambalo, Monica; Streit, Andrea
2012-10-01
In the vertebrate head, crucial parts of the sense organs and sensory ganglia develop from special regions, the cranial placodes. Despite their cellular and functional diversity, they arise from a common field of multipotent progenitors and acquire distinct identity later under the influence of local signalling. Here we present the gene regulatory network that summarises our current understanding of how sensory cells are specified, how they become different from other ectodermal derivatives and how they begin to diversify to generate placodes with different identities. This analysis reveals how sequential activation of sets of transcription factors subdivides the ectoderm over time into smaller domains of progenitors for the central nervous system, neural crest, epidermis and sensory placodes. Within this hierarchy the timing of signalling and developmental history of each cell population is of critical importance to determine the ultimate outcome. A recurring theme is that local signals set up broad gene expression domains, which are further refined by mutual repression between different transcription factors. The Six and Eya network lies at the heart of sensory progenitor specification. In a positive feedback loop these factors perpetuate their own expression thus stabilising pre-placodal fate, while simultaneously repressing neural and neural crest specific factors. Downstream of the Six and Eya cassette, Pax genes in combination with other factors begin to impart regional identity to placode progenitors. While our review highlights the wealth of information available, it also points to the lack of information on the cis-regulatory mechanisms that control placode specification and on how the repeated use of signalling input is integrated. Copyright © 2012. Published by Elsevier Inc.
Muñoz, Nélida; Diaz-Osorio, Miguel; Moreno, Jaime; Sánchez-Jiménez, Miryan; Cardona-Castro, Nora
2010-01-01
A multiplex real-time polymerase chain reaction procedure was developed to identify the most prevalent clinical isolates of Salmonella enterica subsp. enterica. Genes from the rfb, fliC, fljB, and viaB groups that encode the O, H, and Vi antigens were used to design 15 primer pairs and TaqMan probes specific for the genes rfbJ, wzx, fliC, fljB, wcdB, the sdf-l sequence, and invA, which was used as an internal amplification control. The primers and probes were variously combined into six sets. The first round of reactions used two of these sets to detect Salmonella O:4, O:9, O:7, O:8, and O:3,10 serogroups. Once the serogroups were identified, the results of a second round of reactions that used primers and probes for the flagellar antigen l genes, 1,2; e,h; g,m; d; e,n,x; and z10, and the Vi gene were used to identify individual serovars. The procedure was standardized using 18 Salmonella reference strains and other enterobacteria. The procedure's reliability and sensitivity were evaluated using 267 randomly chosen serotyped Salmonella clinical isolates. The procedure had a sensitivity of 95.5% and was 100% specific. Thus, our technique is a quick, sensitive, reliable, and specific means of identifying S. enterica serovars and can be used in conjunction with traditional serotyping. Other primer and probe combinations could be used to increase the number of identifiable serovars. PMID:20110454
Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.
Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K
2011-01-01
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M
2005-08-01
The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.
Blevins, Tana; Aliev, Fazil; Adkins, Amy; Hack, Laura; Bigdeli, Tim; D. van der Vaart, Andrew; Web, Bradley Todd; Bacanu, Silviu-Alin; Kalsi, Gursharan; Kendler, Kenneth S.; Miles, Michael F.; Dick, Danielle; Riley, Brien P.; Dumur, Catherine; Vladimirov, Vladimir I.
2015-01-01
Alcohol consumption is known to lead to gene expression changes in the brain. After performing weighted gene co-expression network analyses (WGCNA) on genome-wide mRNA and microRNA (miRNA) expression in Nucleus Accumbens (NAc) of subjects with alcohol dependence (AD; N = 18) and of matched controls (N = 18), six mRNA and three miRNA modules significantly correlated with AD were identified (Bonferroni-adj. p≤ 0.05). Cell-type-specific transcriptome analyses revealed two of the mRNA modules to be enriched for neuronal specific marker genes and downregulated in AD, whereas the remaining four mRNA modules were enriched for astrocyte and microglial specific marker genes and upregulated in AD. Gene set enrichment analysis demonstrated that neuronal specific modules were enriched for genes involved in oxidative phosphorylation, mitochondrial dysfunction and MAPK signaling. Glial-specific modules were predominantly enriched for genes involved in processes related to immune functions, i.e. cytokine signaling (all adj. p≤ 0.05). In mRNA and miRNA modules, 461 and 25 candidate hub genes were identified, respectively. In contrast to the expected biological functions of miRNAs, correlation analyses between mRNA and miRNA hub genes revealed a higher number of positive than negative correlations (χ² test p≤ 0.0001). Integration of hub gene expression with genome-wide genotypic data resulted in 591 mRNA cis-eQTLs and 62 miRNA cis-eQTLs. mRNA cis-eQTLs were significantly enriched for AD diagnosis and AD symptom counts (adj. p = 0.014 and p = 0.024, respectively) in AD GWAS signals in a large, independent genetic sample from the Collaborative Study on Genetics of Alcohol (COGA). In conclusion, our study identified putative gene network hubs coordinating mRNA and miRNA co-expression changes in the NAc of AD subjects, and our genetic (cis-eQTL) analysis provides novel insights into the etiological mechanisms of AD. PMID:26381263
Genome wide predictions of miRNA regulation by transcription factors.
Ruffalo, Matthew; Bar-Joseph, Ziv
2016-09-01
Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes, making it hard to determine the specific way in which they are regulated. To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance both on a labeled test set and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. Code and the full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ (contact: zivbj@cs.cmu.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Development of a cross-platform biomarker signature to detect renal transplant tolerance in humans
Sagoo, Pervinder; Perucha, Esperanza; Sawitzki, Birgit; Tomiuk, Stefan; Stephens, David A.; Miqueu, Patrick; Chapman, Stephanie; Craciun, Ligia; Sergeant, Ruhena; Brouard, Sophie; Rovis, Flavia; Jimenez, Elvira; Ballow, Amany; Giral, Magali; Rebollo-Mesa, Irene; Le Moine, Alain; Braudeau, Cecile; Hilton, Rachel; Gerstmayer, Bernhard; Bourcier, Katarzyna; Sharif, Adnan; Krajewska, Magdalena; Lord, Graham M.; Roberts, Ian; Goldman, Michel; Wood, Kathryn J.; Newell, Kenneth; Seyfert-Margolis, Vicki; Warrens, Anthony N.; Janssen, Uwe; Volk, Hans-Dieter; Soulillou, Jean-Paul; Hernandez-Fuentes, Maria P.; Lechler, Robert I.
2010-01-01
Identifying transplant recipients in whom immunological tolerance is established or is developing would allow an individually tailored approach to their posttransplantation management. In this study, we aimed to develop reliable and reproducible in vitro assays capable of detecting tolerance in renal transplant recipients. Several biomarkers and bioassays were screened on a training set that included 11 operationally tolerant renal transplant recipients, recipient groups following different immunosuppressive regimes, recipients undergoing chronic rejection, and healthy controls. Highly predictive assays were repeated on an independent test set that included 24 tolerant renal transplant recipients. Tolerant patients displayed an expansion of peripheral blood B and NK lymphocytes, fewer activated CD4+ T cells, a lack of donor-specific antibodies, donor-specific hyporesponsiveness of CD4+ T cells, and a high ratio of forkhead box P3 to α-1,2-mannosidase gene expression. Microarray analysis further revealed in tolerant recipients a bias toward differential expression of B cell–related genes and their associated molecular pathways. By combining these indices of tolerance as a cross-platform biomarker signature, we were able to identify tolerant recipients in both the training set and the test set. This study provides an immunological profile of the tolerant state that, with further validation, should inform and shape drug-weaning protocols in renal transplant recipients. PMID:20501943
Molecular Evolution of Phosphoprotein Phosphatases in Drosophila
Miskei, Márton; Ádám, Csaba; Kovács, László; Karányi, Zsolt; Dombrádi, Viktor
2011-01-01
Phosphoprotein phosphatases (PPP) are ancient and important regulatory enzymes present in all eukaryotic organisms. Based on the genome sequences of 12 Drosophila species, we traced the evolution of the PPP catalytic subunits and noted a substantial expansion of the gene family. We concluded that the 18–22 PPP genes of Drosophilidae were generated from a core set of 8 indispensable phosphatases that are present in most of the insects. Retropositions followed by tandem gene duplications extended the phosphatase repertoire, and sporadic gene losses contributed to the species-specific variations in the PPP complement. During the course of these studies we identified 5 previously uncharacterized phosphatase retrogenes: PpY+, PpD5+, PpD6+, Pp4+, and Pp6+, which are found only in some ancient Drosophila. We demonstrated that all of these new PPP genes exhibit a distinct male-specific expression. In addition to the changes in gene numbers, the intron-exon structure and the chromosomal localization of several PPP genes were also altered during evolution. The G−C content of the coding regions decreased when a gene moved into the heterochromatic region of chromosome Y. Thus the PPP enzymes exemplify the various types of dynamic rearrangements that accompany the molecular evolution of a gene family in Drosophilidae. PMID:21789237
Characterization of embryo-specific genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1989-01-01
The objective of the proposed research is to characterize the structure and function of a set of genes whose expression is regulated in embryo development, and that is not expressed in mature tissues -- the embryonic genes. In the last two years, using cDNA clones, we have isolated 22 cDNA clones, and characterized the expression pattern of their corresponding RNA. At least 4 cDNA clones detect RNAs of embryonic genes. These cDNA clones detect RNAs expressed in somatic as well as zygotic embryos of carrot. Using the cDNA clones, we screened the genomic library of carrot embryo DNA, and isolated genomic clones for three genes. The structure and function of two genes DC 8 and DC 59 have been characterized and are reported in this paper.
MalaCards: an integrated compendium for diseases and their annotation
Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron
2013-01-01
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/ PMID:23584832
Le Bail, Aude; Scholz, Sebastian; Kost, Benedikt
2013-01-01
The use of the moss Physcomitrella patens as a model system to study plant development and physiology is rapidly expanding. The strategic position of P. patens within the green lineage between algae and vascular plants, the high efficiency with which transgenes are incorporated by homologous recombination, advantages associated with the haploid gametophyte representing the dominant phase of the P. patens life cycle, the simple structure of protonemata, leafy shoots and rhizoids that constitute the haploid gametophyte, as well as a readily accessible high-quality genome sequence make this moss a very attractive experimental system. The investigation of the genetic and hormonal control of P. patens development heavily depends on the analysis of gene expression patterns by real time quantitative PCR (RT qPCR). This technique requires well characterized sets of reference genes, which display minimal expression level variations under all analyzed conditions, for data normalization. Sets of suitable reference genes have been described for most widely used model systems including e.g. Arabidopsis thaliana, but not for P. patens. Here, we present a RT qPCR based comparison of transcript levels of 12 selected candidate reference genes in a range of gametophytic P. patens structures at different developmental stages, and in P. patens protonemata treated with hormones or hormone transport inhibitors. Analysis of these RT qPCR data using GeNorm and NormFinder software resulted in the identification of sets of P. patens reference genes suitable for gene expression analysis under all tested conditions, and suggested that the two best reference genes are sufficient for effective data normalization under each of these conditions. PMID:23951063
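The reference-gene ranking mentioned in the abstract above relies on expression stability measures. As a rough illustration, the sketch below computes a geNorm-style stability value M: for each gene, the average standard deviation of its log2 expression ratios against every other candidate gene across samples. It is a simplified reading of that measure, not the GeNorm software itself; the function name and the example quantities are hypothetical.

```python
from math import log2
from statistics import stdev


def genorm_m_values(expression):
    """Return a geNorm-style stability value M for each candidate gene.

    expression -- dict mapping gene name to a list of positive relative
                  expression quantities, one per sample, in the same sample
                  order for every gene (at least 2 genes and 2 samples).
    A lower M indicates a more stably expressed gene.
    """
    genes = list(expression)
    m_values = {}
    for gene_j in genes:
        pairwise_sd = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Variation of the log-ratio between two genes across samples.
            log_ratios = [
                log2(a / b) for a, b in zip(expression[gene_j], expression[gene_k])
            ]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    # Hypothetical quantities for three candidate genes in three samples.
    candidates = {"refA": [1.0, 2.0, 4.0], "refB": [2.0, 4.0, 8.0], "refC": [1.0, 4.0, 1.0]}
    for gene, m in sorted(genorm_m_values(candidates).items(), key=lambda kv: kv[1]):
        print(gene, round(m, 3))
```

In this toy input, refA and refB keep a constant ratio across samples and so rank as the most stable pair, which is the kind of evidence used to argue that two reference genes suffice for normalization.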
Gene expression profiles in whole blood and associations with metabolic dysregulation in obesity.
Cox, Amanda J; Zhang, Ping; Evans, Tiffany J; Scott, Rodney J; Cripps, Allan W; West, Nicholas P
Gene expression data provides one tool to gain further insight into the complex biological interactions linking obesity and metabolic disease. This study examined associations between blood gene expression profiles and metabolic disease in obesity. Whole blood gene expression profiles, performed using the Illumina HT-12v4 Human Expression Beadchip, were compared between (i) individuals with obesity (O) or lean (L) individuals (n=21 each), (ii) individuals with (M) or without (H) Metabolic Syndrome (n=11 each) matched on age and gender. Enrichment of differentially expressed genes (DEG) into biological pathways was assessed using Ingenuity Pathway Analysis. Association between sets of genes from biological pathways considered functionally relevant and Metabolic Syndrome were further assessed using an area under the curve (AUC) and cross-validated classification rate (CR). For OvL, only 50 genes were significantly differentially expressed based on the selected differential expression threshold (1.2-fold, p<0.05). For MvH, 582 genes were significantly differentially expressed (1.2-fold, p<0.05) and pathway analysis revealed enrichment of DEG into a diverse set of pathways including immune/inflammatory control, insulin signalling and mitochondrial function pathways. Gene sets from the mTOR signalling pathways demonstrated the strongest association with Metabolic Syndrome (p=8.1×10⁻⁸; AUC: 0.909, CR: 72.7%). These results support the use of expression profiling in whole blood in the absence of more specific tissue types for investigations of metabolic disease. Using a pathway analysis approach it was possible to identify an enrichment of DEG into biological pathways that could be targeted for in vitro follow-up. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
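For readers unfamiliar with the AUC statistic reported above, the following sketch shows one standard way to compute it: as the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The abstract does not describe how the gene-set scores were derived, so the per-subject scores below are hypothetical placeholders.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    case_scores    -- scores for subjects with the condition
    control_scores -- scores for subjects without the condition
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical gene-set scores for a handful of subjects.
    with_syndrome = [0.9, 0.7, 0.8, 0.4]
    without_syndrome = [0.3, 0.5, 0.2, 0.6]
    print(auc(with_syndrome, without_syndrome))
```

An AUC of 0.5 corresponds to no discrimination between groups and 1.0 to perfect separation.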
Regulation of root hair initiation and expansin gene expression in Arabidopsis
NASA Technical Reports Server (NTRS)
Cho, Hyung-Taeg; Cosgrove, Daniel J.
2002-01-01
The expression of two Arabidopsis expansin genes (AtEXP7 and AtEXP18) is tightly linked to root hair initiation; thus, the regulation of these genes was studied to elucidate how developmental, hormonal, and environmental factors orchestrate root hair formation. Exogenous ethylene and auxin, as well as separation of the root from the medium, stimulated root hair formation and the expression of these expansin genes. The effects of exogenous auxin and root separation on root hair formation required the ethylene signaling pathway. By contrast, blocking the endogenous ethylene pathway, either by genetic mutations or by a chemical inhibitor, did not affect normal root hair formation and expansin gene expression. These results indicate that the normal developmental pathway for root hair formation (i.e., not induced by external stimuli) is independent of the ethylene pathway. Promoter analyses of the expansin genes show that the same promoter elements that determine cell specificity also determine inducibility by ethylene, auxin, and root separation. Our study suggests that two distinctive signaling pathways, one developmental and the other environmental/hormonal, converge to modulate the initiation of the root hair and the expression of its specific expansin gene set.
Li, Changyan; Wei, Jing; Lin, Yongjun; Chen, Hao
2012-05-01
Resistant germplasm resources are valuable for developing resistant varieties in agricultural production. Recessive resistance genes, however, are usually overlooked in hybrid breeding, even though, compared with dominant genes, they may confer resistance to different pathogenic races or pest biotypes through different mechanisms of action. The recessive rice bacterial blight resistance gene xa13, also involved in pollen development, has been cloned and its resistance mechanism has been recently characterized. This report describes the conversion of bacterial blight resistance mediated by the recessive xa13 gene into a dominant trait to facilitate its use in a breeding program. This was achieved by knockdown of the corresponding dominant allele Xa13 in transgenic rice using recently developed artificial microRNA technology. Tissue-specific promoters were used to exclude most of the expression of artificial microRNA in the anther to ensure that Xa13 functioned normally during pollen development. A battery of highly bacterial blight resistant transgenic plants with normal seed setting rates were acquired, indicating that highly specific gene silencing had been achieved. Our success with xa13 provides a paradigm that can be adapted to other recessive resistance genes.
Toward an understanding of the pathophysiology of clear cell carcinoma of the ovary (Review)
UEKURI, CHIHARU; SHIGETOMI, HIROSHI; ONO, SUMIRE; SASAKI, YOSHIKAZU; MATSUURA, MIYUKI; KOBAYASHI, HIROSHI
2013-01-01
Endometriosis-associated ovarian cancers demonstrate substantial morphological and genetic diversity. The transcription factor, hepatocyte nuclear factor (HNF)-1β, may be one of several key genes involved in the identity of ovarian clear cell carcinoma (CCC). The present study reviews a considerably expanded set of HNF-1β-associated genes and proteins that determine the pathophysiology of CCC. The current literature was reviewed by searching MEDLINE/PubMed. Functional interpretations of gene expression profiling in CCC are provided. Several important CCC-related genes overlap with those known to be regulated by the upregulation of HNF-1β expression, along with a lack of estrogen receptor (ER) expression. Furthermore, the genetic expression pattern in CCC resembles that of the Arias-Stella reaction, decidualization and placentation. HNF-1β regulates a subset of progesterone target genes. HNF-1β may also act as a modulator of female reproduction, playing a role in endometrial regeneration, differentiation, decidualization, glycogen synthesis, detoxification, cell cycle regulation, implantation, uterine receptivity and a successful pregnancy. In conclusion, the present study focused on reviewing the aberrant expression of CCC-specific genes and provided an update on the pathological implications and molecular functions of well-characterized CCC-specific genes. PMID:24179489
GeneChip™ screening assay for cystic fibrosis mutations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cronn, M.T.; Miyada, C.G.; Fucini, R.V.
1994-09-01
GeneChip™ assays are based on high-density, carefully designed arrays of short oligonucleotide probes (13-16 bases) built directly on derivatized silica substrates. DNA target sequence analysis is achieved by hybridizing fluorescently labeled amplification products to these arrays. Fluorescent hybridization signals located within the probe array are translated into target sequence information using the known probe sequence at each array feature. The mutation screening assay for cystic fibrosis includes sets of oligonucleotide probes designed to detect numerous different mutations that have been described in 14 exons and one intron of the CFTR gene. Each mutation site is addressed by a sub-array of at least 40 probe sequences, half designed to detect the wild type gene sequence and half designed to detect the reported mutant sequence. Hybridization with homozygous mutant, homozygous wild type or heterozygous targets results in distinctive hybridization patterns within a sub-array, permitting specific discrimination of each mutation. The GeneChip probe arrays are very small (approximately 1 cm²). Their miniature size coupled with their high information content makes GeneChip probe arrays a useful and practical means for providing CF mutation analysis in a clinical setting.
Spliced synthetic genes as internal controls in RNA sequencing experiments.
Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R
2016-09-01
RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
NASA Astrophysics Data System (ADS)
Douglas, Joanne T.
The practical implementation of gene therapy in the clinical setting mandates gene delivery vehicles, or vectors, capable of efficient gene delivery selectively to the target disease cells. The utility of adenoviral vectors for gene therapy is restricted by their dependence on the native adenoviral primary cellular receptor, the coxsackievirus and adenovirus receptor (CAR), for cell entry. Therefore, a number of strategies have been developed to allow CAR-independent infection of specific cell types, including the use of bispecific conjugates and genetic modifications to the adenoviral capsid proteins, in particular the fibre protein. These targeted adenoviral vectors have demonstrated efficient gene transfer in vitro, correlating with a therapeutic benefit in preclinical animal models. Such vectors are predicted to possess enhanced efficacy in human clinical studies, although anatomical barriers to their use must be circumvented.
Jenkins, Adam M.; Muskavitch, Marc A. T.
2015-01-01
We understand little about photopreference and the molecular mechanisms governing vision-dependent behavior in vector mosquitoes. Investigations of the influence of photopreference on adult mosquito behaviors such as endophagy and exophagy and endophily and exophily will enhance our ability to develop and deploy vector-targeted interventions and monitoring techniques. Our laboratory-based analyses have revealed that crepuscular period photopreference differs between An. gambiae and An. stephensi. We employed qRT-PCR to assess crepuscular transcriptional expression patterns of long wavelength-, short wavelength-, and ultraviolet wavelength-sensing opsins (i.e., rhodopsin-class G-protein coupled receptors) in An. gambiae and in An. stephensi. Transcript levels do not exhibit consistent differences between species across diurnal cycles, indicating that differences in transcript abundances within this gene set are not correlated with these behavioral differences. Using developmentally staged and gender-specific RNAseq data sets in An. gambiae, we show that long wavelength-sensing opsins are expressed in two different patterns (one set expressed during larval stages, and one set expressed during adult stages), while short wavelength- and ultraviolet wavelength-sensing opsins exhibit increased expression during adult stages. Genomic organization of An. gambiae opsins suggests paralogous gene expansion of long wavelength-sensing opsins in comparison with An. stephensi. We speculate that this difference in gene number may contribute to variation between these species in photopreference behavior (e.g., visual sensitivity). PMID:26334802
Kumar, Gulshan; Gupta, Khushboo; Pathania, Shivalika; Swarnkar, Mohit Kumar; Rattan, Usha Kumari; Singh, Gagandeep; Sharma, Ram Kumar; Singh, Anil Kumar
2017-01-01
The availability of sufficient chilling during bud dormancy plays an important role in the subsequent yield and quality of apple fruit, whereas insufficient chilling negatively impacts apple production. Transcriptome profiling during bud dormancy release and initial fruit set under low and high chill conditions was performed using RNA-seq. The comparatively high number of differentially expressed genes during bud break and fruit set under the high chill condition indicates that chilling availability was associated with transcriptional reorganization. The comparative analysis reveals the differential expression of genes involved in phytohormone metabolism, particularly for abscisic acid, gibberellic acid, ethylene, auxin and cytokinin. The expression of Dormancy Associated MADS-box, Flowering Locus C-like, Flowering Locus T-like and Terminal Flower 1-like genes was found to be modulated under differential chilling. The co-expression network analysis identified two high chill specific modules that were found to be enriched for "post-embryonic development" GO terms. The network analysis also identified hub genes including Early flowering 7, RAF10, ZEP4 and F-box, which may be involved in regulating chilling-mediated dormancy release and fruit set. The results of transcriptome and co-expression network analysis indicate that chilling availability predominantly regulates phytohormone-related pathways and post-embryonic development during bud break. PMID:28198417
da Rocha, Ricardo Fagundes; De Bastiani, Marco Antônio; Klamt, Fábio
2014-11-01
Atherosclerosis is a pro-inflammatory process intrinsically related to systemic redox impairments. Macrophages play a major role in disease development. The specific involvement of classically activated, M1 (pro-inflammatory), or alternatively activated, M2 (anti-inflammatory), macrophages in plaque formation and disease progression is still not established. Thus, based on meta-data analysis of public micro-array datasets, we compared differential gene expression levels of the human antioxidant genes (HAG) and M1/M2 genes between early and advanced human atherosclerotic plaques, and among peripheric macrophages (with or without foam cells induction by oxidized low density lipoprotein, oxLDL) from healthy and atherosclerotic subjects. Two independent datasets, GSE28829 and GSE9874, were selected from gene expression omnibus (http://www.ncbi.nlm.nih.gov/geo/) repository. Functional interactions were obtained with STRING (http://string-db.org/) and Medusa (http://coot.embl.de/medusa/). Statistical analysis was performed with ViaComplex® (http://lief.if.ufrgs.br/pub/biosoftwares/viacomplex/) and gene score enrichment analysis (http://www.broadinstitute.org/gsea/index.jsp). Bootstrap analysis demonstrated that the activity (expression) of HAG and M1 gene sets were significantly increased in advanced compared to early atherosclerotic plaque. Increased expressions of HAG, M1, and M2 gene sets were found in peripheric macrophages from atherosclerotic subjects compared to peripheric macrophages from healthy subjects, while only M1 gene set was increased in foam cells from atherosclerotic subjects compared to foam cells from healthy subjects. However, M1 gene set was decreased in foam cells from healthy subjects compared to peripheric macrophages from healthy subjects, while no differences were found in foam cells from atherosclerotic subjects compared to peripheric macrophages from atherosclerotic subjects.
Our data suggest that, unlike in cancer, in atherosclerosis there is no M1 or M2 polarization of macrophages. Rather, the M1 and M2 phenotypes are equally induced, which is an important aspect for better understanding disease progression and can help to develop new therapeutic approaches.
Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.
Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew
2012-08-08
Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong
2015-01-01
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378
Liscovitch, Noa; Bazak, Lily; Levanon, Erez Y; Chechik, Gal
2014-01-01
A-to-I RNA editing by adenosine deaminases acting on RNA is a post-transcriptional modification that is crucial for normal life and development in vertebrates. RNA editing has been shown to be very abundant in the human transcriptome, specifically at the primate-specific Alu elements. The functional role of this wide-spread effect is still not clear; it is believed that editing of transcripts is a mechanism for their down-regulation via processes such as nuclear retention or RNA degradation. Here we combine 2 neural gene expression datasets with genome-level editing information to examine the relation between the expression of ADAR genes with the expression of their target genes. Specifically, we computed the spatial correlation across structures of post-mortem human brains between ADAR and a large set of targets that were found to be edited in their Alu repeats. Surprisingly, we found that a large fraction of the edited genes are positively correlated with ADAR, opposing the assumption that editing would reduce expression. When considering the correlations between ADAR and its targets over development, 2 gene subsets emerge, positively correlated and negatively correlated with ADAR expression. Specifically, in embryonic time points, ADAR is positively correlated with many genes related to RNA processing and regulation of gene expression. These findings imply that the suggested mechanism of regulation of expression by editing is probably not a global one; ADAR expression does not have a genome wide effect reducing the expression of editing targets. It is possible, however, that RNA editing by ADAR in non-coding regions of the gene might be a part of a more complex expression regulation mechanism. PMID:25692240
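The correlation analysis described in this abstract can be illustrated with a minimal sketch. It assumes a plain Pearson correlation across brain structures; the abstract does not state which correlation measure was used, and the expression values and target names below are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical expression of ADAR in four brain structures.
adar = [1.0, 2.0, 3.0, 4.0]

# Hypothetical expression of two editing targets in the same structures.
targets = {
    "target_up": [2.0, 4.1, 5.9, 8.0],
    "target_down": [4.0, 3.0, 2.0, 1.0],
}

# One correlation per target; the sign separates the two subsets.
correlations = {name: pearson(adar, expr) for name, expr in targets.items()}
positive = [name for name, r in correlations.items() if r > 0]
negative = [name for name, r in correlations.items() if r < 0]
```

Splitting targets by the sign of the correlation mirrors the two gene subsets the abstract reports, although a real analysis would also need a significance test and correction for multiple comparisons.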
Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K
2015-06-04
Collective analysis of the increasing number of emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts.
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
Specificity, Privacy, and Degeneracy in the CD4 T Cell Receptor Repertoire Following Immunization
Sun, Yuxin; Best, Katharine; Cinelli, Mattia; Heather, James M.; Reich-Zeliger, Shlomit; Shifrut, Eric; Friedman, Nir; Shawe-Taylor, John; Chain, Benny
2017-01-01
T cells recognize antigen using a large and diverse set of antigen-specific receptors created by a complex process of imprecise somatic cell gene rearrangements. In response to antigen/receptor binding, specific T cells then divide to form memory and effector populations. We apply high-throughput sequencing to investigate the global changes in T cell receptor sequences following immunization with ovalbumin (OVA) and adjuvant, to understand how adaptive immunity achieves specificity. Each immunized mouse contained a predominantly private but related set of expanded CDR3β sequences. We used machine learning to identify common patterns which distinguished repertoires from mice immunized with adjuvant with and without OVA. The CDR3β sequences were deconstructed into sets of overlapping contiguous amino acid triplets. The frequencies of these motifs were used to train the linear programming boosting (LPBoost) algorithm to classify between TCR repertoires. LPBoost could distinguish between the two classes of repertoire with accuracies above 80%, using a small subset of triplet sequences present at defined positions along the CDR3. The results suggest a model in which such motifs confer degenerate antigen specificity in the context of a highly diverse and largely private set of T cell receptors. PMID:28450864
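The feature-extraction step described in this abstract, breaking each CDR3β sequence into overlapping contiguous amino acid triplets and counting them, can be sketched as follows. This is an illustrative simplification: it pools triplets regardless of their position along the CDR3, whereas the abstract notes that position mattered, and the two sequences are hypothetical examples rather than data from the study.

```python
from collections import Counter


def triplet_frequencies(cdr3_sequences, k=3):
    """Relative frequency of every overlapping k-mer across the sequences."""
    counts = Counter()
    for seq in cdr3_sequences:
        # Slide a window of width k along the sequence, one residue at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


# Hypothetical CDR3 fragments, used only to show the windowing.
example_repertoire = ["CASSLG", "CASSQ"]
freqs = triplet_frequencies(example_repertoire)
```

A vector of such frequencies per repertoire is the kind of input a classifier such as LPBoost would be trained on; the classifier itself is not reproduced here.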
Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.
Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M
2011-11-01
Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.
MAGMA: Generalized Gene-Set Analysis of GWAS Data
de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle
2015-01-01
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710
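The regression structure of the gene-set layer can be illustrated with a deliberately reduced sketch: gene-level Z-scores are regressed on a 0/1 indicator of set membership, and the slope is tested. This is an assumption-laden simplification, not MAGMA's implementation. It uses ordinary least squares, ignores correlation between genes, and omits any covariates; the function name and the numbers are hypothetical.

```python
from statistics import mean


def gene_set_slope(z_scores, in_set):
    """OLS slope and t-statistic for z = b0 + b1 * in_set + error."""
    s_bar = mean(in_set)
    z_bar = mean(z_scores)
    sxx = sum((s - s_bar) ** 2 for s in in_set)
    sxz = sum((s - s_bar) * (z - z_bar) for s, z in zip(in_set, z_scores))
    slope = sxz / sxx
    intercept = z_bar - slope * s_bar
    residuals = [z - (intercept + slope * s) for s, z in zip(in_set, z_scores)]
    dof = len(z_scores) - 2  # two fitted parameters
    sigma2 = sum(r * r for r in residuals) / dof
    std_err = (sigma2 / sxx) ** 0.5
    return slope, slope / std_err


# Hypothetical gene-level Z-scores; the last three genes belong to the set.
z = [0.1, -0.2, 0.0, 1.9, 2.1, 2.0]
membership = [0, 0, 0, 1, 1, 1]
slope, t_stat = gene_set_slope(z, membership)
```

With a binary indicator the slope equals the difference between the mean Z-score inside and outside the set, which is why a regression framing extends naturally to continuous gene properties and to several gene sets at once, as the abstract describes.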
Lapébie, Pascal; Ruggiero, Antonella; Barreau, Carine; Chevalier, Sandra; Chang, Patrick; Dru, Philippe; Houliston, Evelyn; Momose, Tsuyoshi
2014-01-01
We have used Digital Gene Expression analysis to identify, without bilaterian bias, regulators of cnidarian embryonic patterning. Transcriptome comparison between un-manipulated Clytia early gastrula embryos and ones in which the key polarity regulator Wnt3 was inhibited using morpholino antisense oligonucleotides (Wnt3-MO) identified a set of significantly over and under-expressed transcripts. These code for candidate Wnt signaling modulators, orthologs of other transcription factors, secreted and transmembrane proteins known as developmental regulators in bilaterian models or previously uncharacterized, and also many cnidarian-restricted proteins. Comparisons between embryos injected with morpholinos targeting Wnt3 and its receptor Fz1 defined four transcript classes showing remarkable correlation with spatiotemporal expression profiles. Class 1 and 3 transcripts tended to show sustained expression at “oral” and “aboral” poles respectively of the developing planula larva, class 2 transcripts in cells ingressing into the endodermal region during gastrulation, while class 4 gene expression was repressed at the early gastrula stage. The preferential effect of Fz1-MO on expression of class 2 and 4 transcripts can be attributed to Planar Cell Polarity (PCP) disruption, since it was closely matched by morpholino knockdown of the specific PCP protein Strabismus. We conclude that endoderm and post gastrula-specific gene expression is particularly sensitive to PCP disruption while Wnt-/β-catenin signaling dominates gene regulation along the oral-aboral axis. Phenotype analysis using morpholinos targeting a subset of transcripts indicated developmental roles consistent with expression profiles for both conserved and cnidarian-restricted genes. 
Overall our unbiased screen allowed systematic identification of regionally expressed genes and provided functional support for a shared eumetazoan developmental regulatory gene set with both predicted and previously unexplored members, but also demonstrated that fundamental developmental processes including axial patterning and endoderm formation in cnidarians can involve newly evolved (or highly diverged) genes. PMID:25233086
Liu, Bing; Wei, Gang; Shi, Jinlei; Jin, Jing; Shen, Ting; Ni, Ting; Shen, Wen-Hui; Yu, Yu; Dong, Aiwu
2016-04-01
As a key epigenetic modification, the methylation of histone H3 lysine 36 (H3K36) modulates chromatin structure and is involved in diverse biological processes. To better understand the language of H3K36 methylation in rice (Oryza sativa), we chose potential histone methylation enzymes for functional exploration. In particular, we characterized rice SET DOMAIN GROUP 708 (SDG708) as an H3K36-specific methyltransferase possessing the ability to deposit up to three methyl groups on H3K36. Compared with the wild-type, SDG708-knockdown rice mutants displayed a late-flowering phenotype under both long-day and short-day conditions because of the down-regulation of the key flowering regulatory genes Heading date 3a (Hd3a), RICE FLOWERING LOCUS T1 (RFT1), and Early heading date 1 (Ehd1). Chromatin immunoprecipitation experiments indicated that H3K36me1, H3K36me2, and H3K36me3 levels were reduced at these loci in SDG708-deficient plants. More importantly, SDG708 was able to directly target and effect H3K36 methylation on specific flowering genes. In fact, knockdown of SDG708 led to misexpression of a set of functional genes and a genome-wide decrease in H3K36me1/2/3 levels during the early growth stages of rice. SDG708 is a methyltransferase that catalyses genome-wide deposition of all three methyl groups on H3K36 and is involved in many biological processes in addition to flowering promotion. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
NASA Astrophysics Data System (ADS)
Christensen, G. A.; Wymore, A. M.; King, A. J.; Podar, M.; Hurt, R. A., Jr.; Santillan, E. F. U.; Gilmour, C. C.; Brandt, C. C.; Brown, S. D.; Palumbo, A. V.; Elias, D. A.
2015-12-01
Two proteins (HgcA and HgcB) have been determined to be essential for mercury (Hg)-methylation, and neither alone is sufficient for this process. Detection and quantification of these genes to determine at-risk environments is critical. Universal degenerate polymerase chain reaction (PCR) primers spanning hgcAB were developed to ascertain organismal diversity and validate that both genes were present as an established prerequisite for Hg-methylation. To confirm this approach, an extensive set of pure cultures with published genomes (including methylators and non-methylators: 13 Deltaproteobacteria, 9 Firmicutes, and 10 methanogenic Archaea) were assayed with the newly designed universal hgcAB primer set. A single band within an agarose gel was observed for the majority of the cultures with known hgcAB and confirmed via Sanger sequencing. For environmental applications, once the potential for Hg-methylation is established from PCR amplification with the universal hgcAB primer set, quantification of clade-specific hgcAB gene abundance is desirable. We developed quantitative polymerase chain reaction (qPCR) degenerate primers targeting hgcA from each of the three dominant clades (Deltaproteobacteria, Firmicutes and methanogenic Archaea) known to be associated with anaerobic Hg-methylation. The qPCR primers amplify virtually all hgcA-positive cultures overall and are specific for their designed clade. Finally, to ensure the procedure is robust and sensitive in complex environmental matrices, cells from all clades were mixed in different combinations and ratios to assess qPCR primer specificity. The development and validation of these high-fidelity quantitative molecular tools now allow for rapid and accurate risk management assessment in any environment.
Baum, K. G.; Menezes, G.; Helguera, M.
2011-01-01
Medical imaging system simulators are tools that provide a means to evaluate system architecture and create artificial image sets that are appropriate for specific applications. We have modified SIMRI, a Bloch equation-based magnetic resonance image simulator, in order to successfully generate high-resolution 3D MR images of the Montreal brain phantom using Blue Gene/L systems. Results show that redistribution of the workload allows an anatomically accurate 256³ voxel spin-echo simulation in less than 5 hours when executed on an 8192-node partition of a Blue Gene/L system.
Platt, James L.; Rogers, Benjamin J.; Rogers, Kelley C.; Harwood, Adrian J.; Kimmel, Alan R.
2013-01-01
Control of chromatin structure is crucial for multicellular development and regulation of cell differentiation. The CHD (chromodomain-helicase-DNA binding) protein family is one of the major ATP-dependent, chromatin remodeling factors that regulate nucleosome positioning and access of transcription factors and RNA polymerase to the eukaryotic genome. There are three mammalian CHD subfamilies and their impaired functions are associated with several human diseases. Here, we identify three CHD orthologs (ChdA, ChdB and ChdC) in Dictyostelium discoideum. These CHDs are expressed throughout development, but with unique patterns. Null mutants lacking each CHD have distinct phenotypes that reflect their expression patterns and suggest functional specificity. Accordingly, using genome-wide (RNA-seq) transcriptome profiling for each null strain, we show that the different CHDs regulate distinct gene sets during both growth and development. ChdC is an apparent ortholog of the mammalian Class III CHD group that is associated with the human CHARGE syndrome, and GO analyses of aberrant gene expression in chdC nulls suggest defects in both cell-autonomous and non-autonomous signaling, which have been confirmed through analyses of chdC nulls developed in pure populations or with low levels of wild-type cells. This study provides novel insight into the broad function of CHDs in the regulation of development and disease, through chromatin-mediated changes in directed gene expression. PMID:24301467
Long noncoding RNAs as enhancers of gene expression.
Ørom, U A; Derrien, T; Guigo, R; Shiekhattar, R
2010-01-01
The human genome contains thousands of long noncoding RNAs (ncRNAs) transcribed from diverse genomic locations. A large set of long ncRNAs is transcribed independent of protein-coding genes. We have used the GENCODE annotation of the human genome to identify 3019 long ncRNAs expressed in various human cell lines and tissue. This set of long ncRNAs responds to differentiation signals in primary human keratinocytes and is coexpressed with important regulators of keratinocyte development. Depletion of a number of these long ncRNAs leads to the repression of specific genes in their surrounding locus, supportive of an activating function for ncRNAs. Using reporter assays, we confirmed such activating function and show that such transcriptional enhancement is mediated through the long ncRNA transcripts. Our studies show that long ncRNAs exhibit functions similar to classically defined enhancers, through an RNA-dependent mechanism.
Disentangling the multigenic and pleiotropic nature of molecular function
2015-01-01
Background Biological processes at the molecular level are usually represented by molecular interaction networks. Function is organised and modularity identified based on network topology; however, this approach often fails to account for the dynamic and multifunctional nature of molecular components. For example, a molecule engaging in spatially or temporally independent functions may be inappropriately clustered into a single functional module. To capture biologically meaningful sets of interacting molecules, we use experimentally defined pathways as spatial/temporal units of molecular activity. Results We defined functional profiles of Saccharomyces cerevisiae based on a minimal set of Gene Ontology terms sufficient to represent each pathway's genes. The Gene Ontology terms were used to annotate 271 pathways, accounting for pathway multi-functionality and gene pleiotropy. Pathways were then arranged into a network, linked by shared functionality. Of the genes in our data set, 44% appeared in multiple pathways performing a diverse set of functions. Linking pathways by overlapping functionality revealed a modular network with energy metabolism forming a sparse centre, surrounded by several denser clusters comprised of regulatory and metabolic pathways. Signalling pathways formed a relatively discrete cluster connected to the centre of the network. Genetic interactions were enriched within the clusters of pathways by a factor of 5.5, confirming the organisation of our pathway network is biologically significant. Conclusions Our representation of molecular function according to pathway relationships enables analysis of gene/protein activity in the context of specific functional roles, as an alternative to typical molecule-centric graph-based methods. The pathway network demonstrates the cooperation of multiple pathways to perform biological processes and organises pathways into functionally related clusters with interdependent outcomes. PMID:26678917
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
Fakhro, Khalid A.; Choi, Murim; Ware, Stephanie M.; Belmont, John W.; Towbin, Jeffrey A.; Lifton, Richard P.; Khokha, Mustafa K.; Brueckner, Martina
2011-01-01
Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10⁻⁴). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10⁻⁶). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning. PMID:21282601
Fakhro, Khalid A; Choi, Murim; Ware, Stephanie M; Belmont, John W; Towbin, Jeffrey A; Lifton, Richard P; Khokha, Mustafa K; Brueckner, Martina
2011-02-15
Dominant human genetic diseases that impair reproductive fitness and have high locus heterogeneity constitute a problem for gene discovery because the usual criterion of finding more mutations in specific genes than expected by chance may require extremely large populations. Heterotaxy (Htx), a congenital heart disease resulting from abnormalities in left-right (LR) body patterning, has features suggesting that many cases fall into this category. In this setting, appropriate model systems may provide a means to support implication of specific genes. By high-resolution genotyping of 262 Htx subjects and 991 controls, we identify a twofold excess of subjects with rare genic copy number variations in Htx (14.5% vs. 7.4%, P = 1.5 × 10⁻⁴). Although 7 of 45 Htx copy number variations were large chromosomal abnormalities, 38 smaller copy number variations altered a total of 61 genes, 22 of which had Xenopus orthologs. In situ hybridization identified 7 of these 22 genes with expression in the ciliated LR organizer (gastrocoel roof plate), a marked enrichment compared with 40 of 845 previously studied genes (sevenfold enrichment, P < 10⁻⁶). Morpholino knockdown in Xenopus of Htx candidates demonstrated that five (NEK2, ROCK2, TGFBR2, GALNT11, and NUP188) strongly disrupted both morphological LR development and expression of pitx2, a molecular marker of LR patterning. These effects were specific, because 0 of 13 control genes from rare Htx or control copy number variations produced significant LR abnormalities (P = 0.001). These findings identify genes not previously implicated in LR patterning.
Response of Desulfovibrio vulgaris to Alkaline Stress
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stolyar, S.; He, Q.; He, Z.
2007-11-30
The response of exponentially growing Desulfovibrio vulgaris Hildenborough to pH 10 stress was studied using oligonucleotide microarrays and a study set of mutants with genes suggested by microarray data to be involved in the alkaline stress response deleted. The data showed that the response of D. vulgaris to increased pH is generally similar to that of Escherichia coli but is apparently controlled by unique regulatory circuits since the alternative sigma factors (sigma S and sigma E) contributing to this stress response in E. coli appear to be absent in D. vulgaris. Genes previously reported to be up-regulated in E. coli were up-regulated in D. vulgaris; these genes included three ATPase genes and a tryptophan synthase gene. Transcription of chaperone and protease genes (encoding ATP-dependent Clp and La proteases and DnaK) was also elevated in D. vulgaris. As in E. coli, genes involved in flagellum synthesis were down-regulated. The transcriptional data also identified regulators, distinct from sigma S and sigma E, that are likely part of a D. vulgaris Hildenborough-specific stress response system. Characterization of a study set of mutants with genes implicated in alkaline stress response deleted confirmed that there was protective involvement of the sodium/proton antiporter NhaC-2, tryptophanase A, and two putative regulators/histidine kinases (DVU0331 and DVU2580).
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries
Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P
2008-01-01
Background Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects. PMID:18402700
Graphite Web: web tool for gene set analysis exploiting pathway topology
Sales, Gabriele; Calura, Enrica; Martini, Paolo; Romualdi, Chiara
2013-01-01
Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate although they miss the relationship among genes. On the contrary, topological and multivariate approaches are more powerful, but difficult to be used by researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/. PMID:23666626
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.
Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P
2008-04-10
Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.
Johnson, Emma C; Border, Richard; Melroy-Greif, Whitney E; de Leeuw, Christiaan A; Ehringer, Marissa A; Keller, Matthew C
2017-11-15
A recent analysis of 25 historical candidate gene polymorphisms for schizophrenia in the largest genome-wide association study conducted to date suggested that these commonly studied variants were no more associated with the disorder than would be expected by chance. However, the same study identified other variants within those candidate genes that demonstrated genome-wide significant associations with schizophrenia. As such, it is possible that variants within historic schizophrenia candidate genes are associated with schizophrenia at levels above those expected by chance, even if the most-studied specific polymorphisms are not. The present study used association statistics from the largest schizophrenia genome-wide association study conducted to date as input to a gene set analysis to investigate whether variants within schizophrenia candidate genes are enriched for association with schizophrenia. As a group, variants in the most-studied candidate genes were no more associated with schizophrenia than were variants in control sets of noncandidate genes. While a small subset of candidate genes did appear to be significantly associated with schizophrenia, these genes were not particularly noteworthy given the large number of more strongly associated noncandidate genes. The history of schizophrenia research should serve as a cautionary tale to candidate gene investigators examining other phenotypes: our findings indicate that the most investigated candidate gene hypotheses of schizophrenia are not well supported by genome-wide association studies, and it is likely that this will be the case for other complex traits as well. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Barsalobres-Cavallari, Carla F; Severino, Fábio E; Maluf, Mirian P; Maia, Ivan G
2009-01-01
Background Quantitative data from gene expression experiments are often normalized by transcription levels of reference or housekeeping genes. An inherent assumption for their use is that the expression of these genes is highly uniform in living organisms during various phases of development, in different cell types and under diverse environmental conditions. To date, the validation of reference genes in plants has received very little attention and suitable reference genes have not been defined for a great number of crop species including Coffea arabica. The aim of the research reported herein was to compare the relative expression of a set of potential reference genes across different types of tissue/organ samples of coffee. We also validated the expression profiles of the selected reference genes at various stages of development and under a specific biotic stress. Results The expression levels of five frequently used housekeeping genes (reference genes), namely alcohol dehydrogenase (adh), 14-3-3, polyubiquitin (poly), β-actin (actin) and glyceraldehyde-3-phosphate dehydrogenase (gapdh), were assessed by quantitative real-time RT-PCR over a set of five tissue/organ samples (root, stem, leaf, flower, and fruits) of Coffea arabica plants. In addition to these commonly used internal controls, three other genes encoding a cysteine proteinase (cys), a caffeine synthase (ccs) and the 60S ribosomal protein L7 (rpl7) were also tested. Their stability and suitability as reference genes were validated by geNorm, NormFinder and BestKeeper programs. The obtained results revealed significantly variable expression levels of all reference genes analyzed, with the exception of gapdh, which showed no significant changes in expression among the investigated experimental conditions. Conclusion Our data suggest that the expression of housekeeping genes is not completely stable in coffee. 
Based on our results, gapdh, followed by 14-3-3 and rpl7 were found to be homogeneously expressed and are therefore adequate for normalization purposes, showing equivalent transcript levels in different tissue/organ samples. Gapdh is therefore the recommended reference gene for measuring gene expression in Coffea arabica. Its use will enable more accurate and reliable normalization of tissue/organ-specific gene expression studies in this important cherry crop plant. PMID:19126214
Platre, Matthieu Pierre; Barberon, Marie; Caillieux, Erwann; Colot, Vincent
2016-01-01
Summary Multicellular organisms are composed of many cell types that acquire their specific fate through a precisely controlled pattern of gene expression in time and space dictated in part by cell type-specific promoter activity. Understanding the contribution of highly specialized cell types in the development of a whole organism requires the ability to isolate or analyze different cell types separately. We have characterized and validated a large collection of root cell type-specific promoters and have generated cell type-specific marker lines. These benchmarked promoters can be readily used to evaluate cell type-specific complementation of mutant phenotypes, or to knockdown gene expression using targeted expression of artificial miRNA. We also generated vectors and characterized transgenic lines for cell type-specific induction of gene expression and cell type-specific isolation of nuclei for RNA and chromatin profiling. Vectors and seeds from transgenic Arabidopsis plants will be freely available, and will promote rapid progress in cell type-specific functional genomics. We demonstrate the power of this promoter set for analysis of complex biological processes by investigating the contribution of root cell types in the IRT1-dependent root iron uptake. Our findings revealed the complex spatial expression pattern of IRT1 in both root epidermis and phloem companion cells and the requirement for IRT1 to be expressed in both cell types for proper iron homeostasis. PMID:26662936
Auvergne, Romane M; Sim, Fraser J; Wang, Su; Chandler-Militello, Devin; Burch, Jaclyn; Al Fanek, Yazan; Davis, Danielle; Benraiss, Abdellatif; Walter, Kevin; Achanta, Pragathi; Johnson, Mahlon; Quinones-Hinojosa, Alfredo; Natesan, Sridaran; Ford, Heide L; Goldman, Steven A
2013-06-27
Glial progenitor cells (GPCs) are a potential source of malignant gliomas. We used A2B5-based sorting to extract tumorigenic GPCs from human gliomas spanning World Health Organization grades II-IV. Messenger RNA profiling identified a cohort of genes that distinguished A2B5+ glioma tumor progenitor cells (TPCs) from A2B5+ GPCs isolated from normal white matter. A core set of genes and pathways was substantially dysregulated in A2B5+ TPCs, which included the transcription factor SIX1 and its principal cofactors, EYA1 and DACH2. Small hairpin RNAi silencing of SIX1 inhibited the expansion of glioma TPCs in vitro and in vivo, suggesting a critical and unrecognized role of the SIX1-EYA1-DACH2 system in glioma genesis or progression. By comparing the expression patterns of glioma TPCs with those of normal GPCs, we have identified a discrete set of pathways by which glial tumorigenesis may be better understood and more specifically targeted. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Genes Regulated by Vitamin D in Bone Cells Are Positively Selected in East Asians
Chen, Yuan; Xue, Yali; Luiselli, Donata; Tyler-Smith, Chris; Pagani, Luca; Ayub, Qasim
2015-01-01
Vitamin D and folate are activated and degraded by sunlight, respectively, and the physiological processes they control are likely to have been targets of selection as humans expanded from Africa into Eurasia. We investigated signals of positive selection in gene sets involved in the metabolism, regulation and action of these two vitamins in worldwide populations sequenced by Phase I of the 1000 Genomes Project. Comparing allele frequency-spectrum-based summary statistics between these gene sets and matched control genes, we observed a selection signal specific to East Asians for a gene set associated with vitamin D action in bones. The selection signal was mainly driven by three genes CXXC finger protein 1 (CXXC1), low density lipoprotein receptor-related protein 5 (LRP5) and runt-related transcription factor 2 (RUNX2). Examination of population differentiation and haplotypes allowed us to identify several candidate causal regulatory variants in each gene. Four of these candidate variants (one each in CXXC1 and RUNX2 and two in LRP5) had a >70% derived allele frequency in East Asians, but were present at lower (20–60%) frequency in Europeans as well, suggesting that the adaptation might have been part of a common response to climatic and dietary changes as humans expanded out of Africa, with implications for their role in vitamin D-dependent bone mineralization and osteoporosis insurgence. We also observed haplotype sharing between East Asians, Finns and an extinct archaic human (Denisovan) sample at the CXXC1 locus, which is best explained by incomplete lineage sorting. PMID:26719974
Hackett, Justin B; Lu, Yan
2017-05-04
In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed 2 plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the β subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the β subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 was nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly upregulated in both orrm6 mutants. The upregulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.
Monticone, Massimiliano; Daga, Antonio; Candiani, Simona; Romeo, Francesco; Mirisola, Valentina; Viaggi, Silvia; Melloni, Ilaria; Pedemonte, Simona; Zona, Gianluigi; Giaretti, Walter; Pfeffer, Ulrich; Castagnola, Patrizio
2012-08-17
Most patients affected by Glioblastoma multiforme (GBM, grade IV glioma) experience a recurrence of the disease because of the spreading of tumor cells beyond surgical boundaries. Unveiling mechanisms causing this process is a logical goal to impair the killing capacity of GBM cells by molecular targeting. We noticed that our long-term GBM cultures, established from different patients, may display two categories/types of growth behavior in an orthotopic xenograft model: expansion of the tumor mass and formation of tumor branches/nodules (nodular like, NL-type) or highly diffuse single tumor cell infiltration (HD-type). We determined by DNA microarrays the gene expression profiles of three NL-type and three HD-type long-term GBM cultures. Subsequently, individual genes with different expression levels between the two groups were identified using Significance Analysis of Microarrays (SAM). Real time RT-PCR, immunofluorescence and immunoblot analyses were performed for a selected subgroup of regulated gene products to confirm the results obtained by the expression analysis. Here, we report the identification of a set of 34 differentially expressed genes in the two types of GBM cultures. Twenty-three of these genes encode proteins localized to the plasma membrane and 9 of these encode proteins involved in the process of cell adhesion. This study suggests the participation in the diffuse infiltrative/invasive process of GBM cells within the CNS of a novel set of genes coding for membrane-associated proteins, which should be thus susceptible to an inhibition strategy by specific targeting. Massimiliano Monticone and Antonio Daga contributed equally to this work.