An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents an original model that enables novel "in-silico" clinical-genomic discovery science and demonstrates its feasibility. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk
2016-11-16
Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false discoveries. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies but not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results show that integrated genomics big data for a disease model animal can expand our opportunities for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genome-wide transcriptional profiling by microarrays provides a powerful platform for gene expression-based biomarker discovery. After their wide acceptance in human disease diagnosis, prognosis, and drug discovery, these gene signatures are increasingly being adopted for environ...
Standardized Plant Disease Evaluations will Enhance Resistance Gene Discovery
USDA-ARS's Scientific Manuscript database
Gene discovery and marker development using DNA based tools require plant populations with well-documented phenotypes. Related crops such as apples and pears may share a number of genes, for example resistance to common diseases, and data mining in one crop may reveal genes for the other. However, u...
Liu, Jun-Jun; Xiang, Yu
2011-01-01
WRKY transcription factors are key regulators of numerous biological processes in plant growth and development, as well as plant responses to abiotic and biotic stresses. Research on biological functions of plant WRKY genes has focused in the past on model plant species or species with largely characterized transcriptomes. However, a variety of non-model plants, such as forest conifers, are essential as feed, biofuel, and wood or for sustainable ecosystems. Identification of WRKY genes in these non-model plants is equally important for understanding the evolutionary and function-adaptive processes of this transcription factor family. Because of limited genomic information, the rarity of regulatory gene mRNAs in transcriptomes, and the sequence divergence to model organism genes, identification of transcription factors in non-model plants using methods similar to those generally used for model plants is difficult. This chapter describes a gene family discovery strategy for identification of WRKY transcription factors in conifers by a combination of in silico-based prediction and PCR-based experimental approaches. Compared to traditional cDNA library screening or EST sequencing at transcriptome scales, this integrated gene discovery strategy provides fast, simple, reliable, and specific methods to unveil the WRKY gene family at both genome and transcriptome levels in non-model plants.
2010-01-01
Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245
Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong
2010-01-18
The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.
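The abstract compares methods by sensitivity and specificity. As a minimal sketch of how those two figures are computed for predicted regulatory pairs, the following uses made-up transcription factor and gene labels; the function name, the pair labels and the evaluation set are all assumptions for illustration and are not taken from the study.

```python
def sensitivity_specificity(predicted, known_true, candidates):
    """Return (sensitivity, specificity) for a set of predicted pairs.

    predicted  -- pairs the method calls regulatory
    known_true -- pairs accepted as true regulatory interactions
    candidates -- every pair that was evaluated (hypothetical universe)
    """
    negatives = candidates - known_true
    tp = len(predicted & known_true)   # true interactions that were predicted
    fn = len(known_true - predicted)   # true interactions that were missed
    fp = len(predicted & negatives)    # non-interactions that were predicted
    tn = len(negatives - predicted)    # non-interactions correctly left out
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical (transcription factor, target gene) pairs.
candidates = {("TF1", "G1"), ("TF1", "G2"), ("TF1", "G3"),
              ("TF2", "G1"), ("TF2", "G2"), ("TF2", "G3")}
known_true = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
predicted = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G1")}

sens, spec = sensitivity_specificity(predicted, known_true, candidates)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A method that improves both numbers at once, as the abstract reports for knowledge-based predictions, recovers more known interactions while also calling fewer non-interactions.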
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna
Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...
2015-04-09
Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.
2015-01-01
Summary: Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
Application of industrial scale genomics to discovery of therapeutic targets in heart failure.
Mehraban, F; Tomlinson, J E
2001-12-01
In recent years intense activity in both academic and industrial sectors has provided a wealth of information on the human genome with an associated impressive increase in the number of novel gene sequences deposited in sequence data repositories and patent applications. This genomic industrial revolution has transformed the way in which drug target discovery is now approached. In this article we discuss how various differential gene expression (DGE) technologies are being utilized for cardiovascular disease (CVD) drug target discovery. Other approaches such as sequencing cDNA from cardiovascular derived tissues and cells coupled with bioinformatic sequence analysis are used with the aim of identifying novel gene sequences that may be exploited towards target discovery. Additional leverage from gene sequence information is obtained through identification of polymorphisms that may confer disease susceptibility and/or affect drug responsiveness. Pharmacogenomic studies are described wherein gene expression-based techniques are used to evaluate drug response and/or efficacy. Industrial-scale genomics supports and addresses not only novel target gene discovery but also the burgeoning issues in pharmaceutical and clinical cardiovascular medicine relative to polymorphic gene responses.
Discovery and validation of a glioblastoma co-expressed gene module
Dunwoodie, Leland J.; Poehlman, William L.; Ficklin, Stephen P.; Feltus, Frank Alexander
2018-01-01
Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network. PMID:29541392
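To make the idea of a gene co-expression network concrete, here is a minimal sketch that links two genes when the absolute Pearson correlation of their expression profiles meets a threshold. This is a generic illustration and not the KINC method named in the abstract; the gene names, expression values and the 0.9 threshold are assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose |correlation| is at least the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    ]


# Hypothetical expression values across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(expr)
print(edges)
```

Connected groups of genes in the resulting edge list are candidate co-expressed modules; real pipelines add noise filtering and significance testing before interpreting them.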
Discovery and validation of a glioblastoma co-expressed gene module.
Dunwoodie, Leland J; Poehlman, William L; Ficklin, Stephen P; Feltus, Frank Alexander
2018-02-16
Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.
The promises and pitfalls of RNA-interference-based therapeutics
Castanotto, Daniela; Rossi, John J.
2009-01-01
The discovery that gene expression can be controlled by the Watson–Crick base-pairing of small RNAs with messenger RNAs containing complementary sequence — a process known as RNA interference — has markedly advanced our understanding of eukaryotic gene regulation and function. The ability of short RNA sequences to modulate gene expression has provided a powerful tool with which to study gene function and is set to revolutionize the treatment of disease. Remarkably, despite being just one decade from its discovery, the phenomenon is already being used therapeutically in human clinical trials, and biotechnology companies that focus on RNA-interference-based therapeutics are already publicly traded. PMID:19158789
Standardized plant disease evaluations will enhance resistance gene discovery
USDA-ARS's Scientific Manuscript database
Gene discovery and marker development using DNA-based tools require plant populations with well documented phenotypes. If dissimilar phenotype evaluation methods or data scoring techniques are employed with different crops, or at different labs for the same crops, then data mining for genetic marker...
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.
Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn
2018-06-01
Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.
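A common building block in this kind of knowledge discovery is over-representation analysis (ORA): testing whether a gene set of interest contains more genes annotated to a functional term than expected by chance. The sketch below computes the upper-tail hypergeometric p-value with the Python standard library. It does not reproduce the 'dbtORA' R library or its interface; the function name and all counts are hypothetical.

```python
from math import comb


def ora_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population -- number of genes in the background
    annotated  -- background genes carrying the term
    selected   -- size of the gene set of interest
    overlap    -- genes of interest that carry the term
    """
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated to a term,
# 4 genes of interest, 2 of which carry the term.
p = ora_p_value(population=20, annotated=5, selected=4, overlap=2)
print(f"p = {p:.4f}")
```

In practice one such test is run per term, so the resulting p-values need a multiple-testing correction before any term is reported as enriched.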
Zhang, Shihua; Zhang, Liang; Tai, Yuling; Wang, Xuewen; Ho, Chi-Tang; Wan, Xiaochun
2018-01-01
Characteristic secondary metabolites, including flavonoids, theanine and caffeine, in the tea plant (Camellia sinensis) are the primary sources of the rich flavors, fresh taste, and health benefits of tea. The decoding of genes involved in these characteristic components is still significantly lagging, which poses an obstacle to applied genetic improvement and metabolic engineering. With the popularity of high-throughput transcriptomics and metabolomics, ‘omics’-based network approaches, such as gene co-expression network and gene-to-metabolite network, have emerged as powerful tools for gene discovery of plant-specialized (secondary) metabolism. Thus, it is pivotal to summarize and introduce such system-based strategies in facilitating gene identification of characteristic metabolic pathways in the tea plant (or other plants). In this review, we describe recent advances in transcriptomics and metabolomics for transcript and metabolite profiling, and highlight ‘omics’-based network strategies using successful examples in model and non-model plants. Further, we summarize recent progress in ‘omics’ analysis for gene identification of characteristic metabolites in the tea plant. Limitations of the current strategies are discussed by comparison with ‘omics’-based network approaches. Finally, we demonstrate the potential of introducing such network strategies in the tea plant, and close with prospects for network-based discovery of characteristic metabolite genes in the tea plant. PMID:29915604
Mallik, Saurav; Zhao, Zhongming
2017-12-28
For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology, and thus has advantages for finding cause-effect relationships between transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and Cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers, which consists of both singular and complex markers, depends on the corresponding condensed gene sets in either the antecedent or the consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
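For orientation, the following sketch shows the plain, unweighted set-based Jaccard and cosine similarities between the gene sets of two rules. The weighted rank-based variants introduced in the abstract are not defined there and are not reproduced here; the rule contents and function names are assumptions for illustration only.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Set cosine similarity: shared items over the geometric mean of sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical gene sets taken from two association rules.
rule_1 = {"GENE_A", "GENE_B", "GENE_C"}
rule_2 = {"GENE_B", "GENE_C", "GENE_D"}

print(f"Jaccard = {jaccard(rule_1, rule_2):.3f}")
print(f"Cosine  = {cosine(rule_1, rule_2):.3f}")
```

Pairwise scores of this kind form a rule-by-rule similarity matrix, which is the sort of input a clustering step needs in order to group rules into modules.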
Murphy, Dennis L; Fox, Meredith A; Timpano, Kiara R; Moya, Pablo R; Ren-Patterson, Renee; Andrews, Anne M; Holmes, Andrew; Lesch, Klaus-Peter; Wendland, Jens R
2008-11-01
Discovered and crystallized over sixty years ago, serotonin's important functions in the brain and body were identified over the ensuing years by neurochemical, physiological and pharmacological investigations. This 2008 M. Rapport Memorial Serotonin Review focuses on some of the most recent discoveries involving serotonin that are based on genetic methodologies. These include examples of the consequences that result from direct serotonergic gene manipulation (gene deletion or overexpression) in mice and other species; an evaluation of some phenotypes related to functional human serotonergic gene variants, particularly in SLC6A4, the serotonin transporter gene; and finally, a consideration of the pharmacogenomics of serotonergic drugs with respect to both their therapeutic actions and side effects. The serotonin transporter (SERT) has been the most comprehensively studied of the serotonin system molecular components, and will be the primary focus of this review. We provide in-depth examples of gene-based discoveries primarily related to SLC6A4 that have clarified serotonin's many important homeostatic functions in humans, non-human primates, mice and other species.
Metagenomics and novel gene discovery
Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin
2014-01-01
Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337
Genome-Wide Methylation Analyses in Glioblastoma Multiforme
Lai, Rose K.; Chen, Yanwen; Guan, Xiaowei; Nousome, Darryl; Sharma, Charu; Canoll, Peter; Bruce, Jeffrey; Sloan, Andrew E.; Cortes, Etty; Vonsattel, Jean-Paul; Su, Tao; Delgado-Cruzata, Lissette; Gurvich, Irina; Santella, Regina M.; Ostrom, Quinn; Lee, Annette; Gregersen, Peter; Barnholtz-Sloan, Jill
2014-01-01
Few studies have investigated genome-wide methylation in glioblastoma multiforme (GBM). Our goals were to study differential methylation across the genome in gene promoters using an array-based method, as well as repetitive elements using surrogate global methylation markers. The discovery sample set for this study consisted of 54 GBM from Columbia University and Case Western Reserve University, and 24 brain controls from the New York Brain Bank. We assembled a validation dataset using methylation data of 162 TCGA GBM and 140 brain controls from dbGAP. HumanMethylation27 Analysis Bead-Chips (Illumina) were used to interrogate 26,486 informative CpG sites in both the discovery and validation datasets. Global methylation levels were assessed by analysis of L1 retrotransposon (LINE1), 5-methyl-deoxycytidine (5m-dC) and 5-hydroxymethyl-deoxycytidine (5hm-dC) in the discovery dataset. We validated a total of 1548 CpG sites (1307 genes) that were differentially methylated in GBM compared to controls. There were more than twice as many hypomethylated genes as hypermethylated ones. Both the discovery and validation datasets found 5 tumor methylation classes. Pathway analyses showed that the top ten pathways in hypomethylated genes were all related to functions of innate and acquired immunities. Among hypermethylated pathways, transcriptional regulatory network in embryonic stem cells was the most significant. In the study of global methylation markers, 5m-dC level was the best discriminant among methylation classes, whereas in survival analyses, high level of LINE1 methylation was an independent, favorable prognostic factor in the discovery dataset. Based on a pathway approach, hypermethylation in genes that control stem cell differentiation was a significant, poor prognostic factor of overall survival in both the discovery and validation datasets. Approaches that target these methylated genes may be a future therapeutic goal. PMID:24586730
Celedon, J M; Bohlmann, J
2016-01-01
Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. © 2016 Elsevier Inc. All rights reserved.
GWATCH: a web platform for automated gene association discovery analysis.
Svitin, Anton; Malov, Sergey; Cherkasov, Nikolay; Geerts, Paul; Rotkevich, Mikhail; Dobrynin, Pavel; Shevchenko, Andrey; Guan, Li; Troyer, Jennifer; Hendrickson, Sher; Dilks, Holli Hutcheson; Oleksyk, Taras K; Donfield, Sharyne; Gomperts, Edward; Jabs, Douglas A; Sezgin, Efe; Van Natta, Mark; Harrigan, P Richard; Brumme, Zabrina L; O'Brien, Stephen J
2014-01-01
As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Here we present a dynamic web-based platform - GWATCH - that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.
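The remark about limiting Bonferroni penalties can be illustrated with a small worked example. Under the standard Bonferroni correction, the per-test significance threshold is the family-wise error rate divided by the number of tests, so testing a short list of candidate variants is far less penalized than a genome-wide scan. The test counts below are hypothetical and the function is not part of GWATCH.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical test counts: a genome-wide scan versus a candidate list.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
candidate = bonferroni_threshold(0.05, 20)

print(f"genome-wide threshold: {genome_wide:.1e}")
print(f"candidate threshold:   {candidate:.1e}")
```

A variant with an association p-value between the two thresholds would fail a genome-wide scan but pass as a pre-specified candidate, which is the practical benefit of replication-style testing.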
Genome-wide expression profiling in pediatric septic shock
Wong, Hector R.
2013-01-01
For nearly a decade, our research group has had the privilege of developing and mining a multi-center, microarray-based, genome-wide expression database of critically ill children (≤ 10 years of age) with septic shock. Using bioinformatic and systems biology approaches, the expression data generated through this discovery-oriented, exploratory approach have been leveraged for a variety of objectives, which will be reviewed. Fundamental observations include widespread repression of gene programs corresponding to the adaptive immune system, and biologically significant differential patterns of gene expression across developmental age groups. The data have also identified gene expression-based subclasses of pediatric septic shock having clinically relevant phenotypic differences. The data have also been leveraged for the discovery of novel therapeutic targets, and for the discovery and development of novel stratification and diagnostic biomarkers. Almost a decade of genome-wide expression profiling in pediatric septic shock is now demonstrating tangible results. The studies have progressed from an initial discovery-oriented and exploratory phase, to a new phase where the data are being translated and applied to address several areas of clinical need. PMID:23329198
Discovery of error-tolerant biclusters from noisy gene expression data.
Gupta, Rohit; Rao, Navneet; Kumar, Vipin
2011-11-24
An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as such methods only work with binary or boolean attributes, their application to gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach, RAP, in the context of two biological problems: discovery of functional modules and discovery of biomarkers.
For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover a larger set of genes than those obtained from the RAP approach but also have higher functional coherence as evaluated using GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters, as estimated using two randomization tests, reveals that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued breast cancer microarray gene-expression data sets and evaluated the biomarkers obtained using MSigDB gene sets. The results obtained for both problems, functional module discovery and biomarker discovery, clearly signify the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.
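The contrast between exact and error-tolerant biclusters can be sketched on binarized data. This is a simplified illustration under stated assumptions, not the ET-bicluster model itself (which operates on real-valued data and whose definition is not given in the abstract): here a block is accepted if each selected gene is "on" in all but a chosen fraction of the selected conditions. The function name, the max_error parameter, and the toy matrix are hypothetical.

```python
def is_block(matrix, genes, conditions, max_error=0.0):
    """Return True if every selected gene is 'on' (1) in all but at most a
    max_error fraction of the selected conditions.

    max_error=0.0 corresponds to an exact (all-ones) bicluster.
    """
    allowed_misses = max_error * len(conditions)
    for g in genes:
        misses = sum(1 for c in conditions if matrix[g][c] == 0)
        if misses > allowed_misses:
            return False
    return True


# Toy binarized expression matrix: rows are genes, columns are conditions.
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],  # one entry flipped by noise
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
]
genes, conditions = [0, 1, 2], [0, 1, 2, 3]

print(is_block(data, genes, conditions))                  # exact match required
print(is_block(data, genes, conditions, max_error=0.25))  # one miss in four tolerated
```

A single noisy entry is enough to break the exact block, which is the fragmentation problem the abstract describes; allowing a bounded error rate per gene recovers it.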
Gene and enhancer trap tagging of vascular-expressed genes in poplar trees
Andrew Groover; Joseph R. Fontana; Gayle Dupper; Caiping Ma; Robert Martienssen; Steven Strauss; Richard Meilan
2004-01-01
We report a gene discovery system for poplar trees based on gene and enhancer traps. Gene and enhancer trap vectors carrying the β-glucuronidase (GUS) reporter gene were inserted into the poplar genome via Agrobacterium tumefaciens transformation, where they reveal the expression pattern of genes at or near the insertion sites. Because GUS...
Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda
2016-04-26
Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast amount of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.
CREB and the discovery of cognitive enhancers.
Scott, Roderick; Bourtchuladze, Rusiko; Gossweiler, Scott; Dubnau, Josh; Tully, Tim
2002-01-01
In the past few years, a series of molecular-genetic, biochemical, cellular and behavioral studies in fruit flies, sea slugs and mice have confirmed a long-standing notion that long-term memory formation depends on the synthesis of new proteins. Experiments focused on the cAMP-responsive transcription factor, CREB, have established that neural activity-induced regulation of gene transcription promotes a synaptic growth process that strengthens the connections among active neurons. This process constitutes a physical basis for the engram--and CREB is a "molecular switch" to produce the engram. Helicon Therapeutics has been formed to identify drug compounds that enhance memory formation via augmentation of CREB biochemistry. Candidate compounds have been identified from a high throughput cell-based screen and are being evaluated in animal models of memory formation. A gene discovery program also seeks to identify new genes, which function downstream of CREB during memory formation, as a source for new drug discoveries in the future. Together, these drug and gene discovery efforts promise a new class of pharmaceutical therapies for the treatment of various forms of cognitive dysfunction.
Independent Gene Discovery and Testing
ERIC Educational Resources Information Center
Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry
2010-01-01
A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…
Translational research in cancer genetics: the road less traveled.
Schully, S D; Benedicto, C B; Gillanders, E M; Wang, S S; Khoury, M J
2011-01-01
Gene discoveries in cancer have the potential for clinical and public health applications. To take advantage of such discoveries, a translational research agenda is needed to take discoveries from the bench to population health impact. To assess the current status of translational research in cancer genetics, we analyzed the extramural grant portfolio of the National Cancer Institute (NCI) from Fiscal Year 2007, as well as the cancer genetic research articles published in 2007. We classified both funded grants and publications as follows: T0 as discovery research; T1 as research to develop a candidate health application (e.g., test or therapy); T2 as research that evaluates a candidate application and develops evidence-based recommendations; T3 as research that assesses how to integrate an evidence-based recommendation into cancer care and prevention; and T4 as research that assesses health outcomes and population impact. We found that 1.8% of the grant portfolio and 0.6% of the published literature was T2 research or beyond. In addition to discovery research in cancer genetics, a translational research infrastructure is urgently needed to methodically evaluate and translate gene discoveries for cancer care and prevention. Copyright © 2009 S. Karger AG, Basel.
Developing integrated crop knowledge networks to advance candidate gene discovery.
Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher
2016-12-01
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Transcriptome assembly and digital gene expression atlas of the rainbow trout
USDA-ARS's Scientific Manuscript database
Background: Transcriptome analysis is a preferred method for gene discovery, marker development and gene expression profiling in non-model organisms. Previously, we sequenced a transcriptome reference using Sanger-based and 454-pyrosequencing, however, a transcriptome assembly is still incomplete an...
Logue, Mark W.; Smith, Alicia K.; Baldwin, Clinton; Wolf, Erika J.; Guffanti, Guia; Ratanatharathorn, Andrew; Stone, Annjanette; Schichman, Steven A.; Humphries, Donald; Binder, Elisabeth B.; Arloth, Janine; Menke, Andreas; Uddin, Monica; Wildman, Derek; Galea, Sandro; Aiello, Allison E.; Koenen, Karestan C.; Miller, Mark W.
2015-01-01
We examined the association between posttraumatic stress disorder (PTSD) and gene expression using whole blood samples from a cohort of trauma-exposed white non-Hispanic male veterans (115 cases and 28 controls). 10,264 probes of genes and gene transcripts were analyzed. We found 41 that were differentially expressed in PTSD cases versus controls (multiple-testing corrected p<0.05). The most significant was DSCAM, a neurological gene expressed widely in the developing brain and in the amygdala and hippocampus of the adult brain. We then examined the 41 differentially expressed genes in a meta-analysis using two replication cohorts and found significant associations with PTSD for 7 of the 41 (p<0.05), one of which (ATP6AP1L) survived multiple-testing correction. There was also broad evidence of overlap across the discovery and replication samples for the entire set of genes implicated in the discovery data based on the direction of effect and an enrichment of p<0.05 significant probes beyond what would be expected under the null. Finally, we found that the set of differentially expressed genes from the discovery sample was enriched for genes responsive to glucocorticoid signaling with most showing reduced expression in PTSD cases compared to controls. PMID:25867994
Genome-wide and gene-based association implicates FRMD6 in Alzheimer disease.
Hong, Mun-Gwan; Reynolds, Chandra A; Feldman, Adina L; Kallin, Mikael; Lambert, Jean-Charles; Amouyel, Philippe; Ingelsson, Erik; Pedersen, Nancy L; Prince, Jonathan A
2012-03-01
Genome-wide association studies (GWAS) that allow for allelic heterogeneity may facilitate the discovery of novel genes not detectable by models that require replication of a single variant site. One strategy to accomplish this is to focus on genes rather than markers as units of association, and so potentially capture a spectrum of causal alleles that differ across populations. Here, we conducted a GWAS of Alzheimer disease (AD) in 2,586 Swedes and performed gene-based meta-analysis with three additional studies from France, Canada, and the United States, in total encompassing 4,259 cases and 8,284 controls. Implementing a newly designed gene-based algorithm, we identified two loci apart from the region around APOE that achieved study-wide significance in combined samples, the strongest finding being for FRMD6 on chromosome 14q (P = 2.6 × 10(-14)) and a weaker signal for NARS2 that is immediately adjacent to GAB2 on chromosome 11q (P = 7.8 × 10(-9)). Ontology-based pathway analyses revealed significant enrichment of genes involved in glycosylation. Results suggest that gene-based approaches that accommodate allelic heterogeneity in GWAS can provide a complementary avenue for gene discovery and may help to explain a portion of the missing heritability not detectable with single nucleotide polymorphisms (SNPs) derived from marker-specific meta-analysis. © 2011 Wiley Periodicals, Inc.
Lung tumor diagnosis and subtype discovery by gene expression profiling.
Wang, Lu-yong; Tu, Zhuowen
2006-01-01
The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, this becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes on a molecular basis that differ remarkably in their response to therapies. It is critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, on gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.
Systematic identification of latent disease-gene associations from PubMed articles.
Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang
2018-01-01
Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to large data volume and complicated associations with noises, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million of PubMed indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their capabilities of detecting latent associations and reducing noises for large volume data respectively. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research. PMID:29373609
Copper homeostasis gene discovery in Drosophila melanogaster.
Norgate, Melanie; Southon, Adam; Zou, Sige; Zhan, Ming; Sun, Yu; Batterham, Phil; Camakaris, James
2007-06-01
Recent studies have shown a high level of conservation between Drosophila melanogaster and mammalian copper homeostasis mechanisms. These studies have also demonstrated the efficiency with which this species can be used to characterize novel genes, at both the cellular and whole organism level. As a versatile and inexpensive model organism, Drosophila is also particularly useful for gene discovery applications and thus has the potential to be extremely useful in identifying novel copper homeostasis genes and putative disease genes. In order to assess the suitability of Drosophila for this purpose, three screening approaches have been investigated. These include an analysis of the global transcriptional response to copper in both adult flies and an embryonic cell line using DNA microarray analysis. Two mutagenesis-based screens were also utilized. Several candidate copper homeostasis genes have been identified through this work. In addition, the results of each screen were carefully analyzed to identify any factors influencing efficiency and sensitivity. These are discussed here with the aim of maximizing the efficiency of future screens and the most suitable approaches are outlined. Building on this information, there is great potential for the further use of Drosophila for copper homeostasis gene discovery.
Phenome-driven disease genetics prediction toward drug discovery.
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-06-15
Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
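The area-under-the-curve figures quoted above can be read as the probability that a true disease gene is ranked above a non-associated gene. The sketch below computes that quantity directly by pairwise comparison; it is a generic illustration, not the authors' evaluation code, and the function name and scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher.
    Ties count as half a win."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical prediction scores for held-out known disease genes
# (positives) and for genes with no known association (negatives).
known_genes = [0.9, 0.8, 0.4]
other_genes = [0.7, 0.3, 0.2, 0.1]

print(f"AUC = {auc(known_genes, other_genes):.3f}")
```

In a leave-one-out setting, each known disease gene would be held out in turn, scored by a model built without it, and its score added to the positive list before the AUC is computed.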
Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow
2018-06-07
Recurrent oncogenic fusion genes play a critical role in the development of various cancers and diseases and provide, in some cases, excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this deficiency, we developed Co-occurrence Fusion (Co-fuse), a new and easy to use software tool that enables biologists to merge RNA-seq information, allowing them to identify recurrent fusion genes, without the need for exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis which enables the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can be used to identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a multiple myeloma (MM) samples-enriched cluster and an acute myeloid leukemia (AML) samples-enriched cluster. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM, when compared to AML samples, indicating the potential of Co-fuse to aid the discovery of yet unknown driver fusion genes through cohort comparisons. Additionally, using a 272 primary glioma sample RNA-seq dataset, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets, and may lead to the discovery of new disease subgroups and potentially new driver genes, for which, targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .
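The notion of a recurrent fusion gene can be made concrete with a simple count across samples. This sketch is not the Co-fuse algorithm, which the abstract describes only as pattern mining plus statistical analysis; it just tallies how many samples carry each fusion. The function name, the sample identifiers, and the placeholder fusion "GENE_A-GENE_B" are hypothetical; IGH-MYC and IGH-WHSC1 are taken from the abstract.

```python
from collections import Counter


def recurrent_fusions(samples, min_samples=2):
    """Return fusions seen in at least min_samples samples,
    mapped to the number of samples that carry them."""
    counts = Counter()
    for fusions in samples.values():
        counts.update(set(fusions))  # count each fusion once per sample
    return {fusion: n for fusion, n in counts.items() if n >= min_samples}


# Hypothetical per-sample fusion calls.
samples = {
    "MM_1": ["IGH-MYC", "IGH-WHSC1"],
    "MM_2": ["IGH-MYC", "IGH-MYC"],  # duplicate call within one sample
    "AML_1": ["GENE_A-GENE_B"],
}

print(recurrent_fusions(samples))
```

Comparing such counts between cohorts (for example MM versus AML samples) is the kind of contrast the abstract refers to when it mentions cohort comparisons.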
Biomedical Information Extraction: Mining Disease Associated Genes from Literature
ERIC Educational Resources Information Center
Huang, Zhong
2014-01-01
Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…
Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying
2011-08-01
The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
2010-01-01
Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. 
The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots. Conclusions We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays. Annotation of the sequence information and collaboration was further enhanced through a web-based SSHdb database, and we illustrated this through identification of drought responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen. PMID:20359330
Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid
2017-02-02
Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks, relevant for rheumatoid arthritis (RA). RNA-sequencing-(RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53, THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis implicate new candidate genes, ERBB2, TP53 and THOP1 in the pathogenesis of RA.
Sumner, Lloyd W.; Lei, Zhentian; Nikolau, Basil J.; ...
2014-10-24
Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This study highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and x-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.
A machine-learned computational functional genomics-based approach to drug classification.
Lötsch, Jörn; Ultsch, Alfred
2016-12-01
The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion to the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance and cluster structure of the high-dimensional feature space of the matrices was visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics and the traditional target-based criterion. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for non-opioid analgesics that comprised many NSAIDs and actions on neuronal signal transmission for opioid analgesics. Using machine-learned techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be similarly suitable for drug classification as the traditional target-based criterion.
This supports a utility of functional genomics-based approaches to computational system pharmacology for drug discovery and repurposing.
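The combination of a "drug versus genes" matrix with a "genes versus biological processes" matrix described in this abstract can be illustrated with a small sketch. The abstract does not specify how the two matrices were combined, so the Boolean matrix product used here is an assumption, and all drug, gene and process names are made-up placeholders rather than DrugBank or Gene Ontology entries.

```python
import numpy as np

# Hypothetical toy labels; not taken from DrugBank or Gene Ontology.
drugs = ["drug_A", "drug_B"]
genes = ["gene_1", "gene_2", "gene_3"]
processes = ["process_X", "process_Y"]

# drug_gene[i, j] is True when drug i targets gene j (toy data).
drug_gene = np.array([[1, 1, 0],
                      [0, 0, 1]], dtype=bool)

# gene_process[j, k] is True when gene j is annotated to process k (toy data).
gene_process = np.array([[1, 0],
                         [1, 0],
                         [0, 1]], dtype=bool)

# Assumed combination rule: a drug is linked to a process when at least one
# of its target genes is annotated to that process (Boolean matrix product).
drug_process = (drug_gene.astype(int) @ gene_process.astype(int)) > 0

for drug, row in zip(drugs, drug_process):
    linked = [p for p, flag in zip(processes, row) if flag]
    print(drug, "->", linked)
```

A matrix of this kind, with one row per drug, is the sort of input that could then be passed to a clustering method such as the SOM mentioned in the abstract.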
Culture-independent discovery of natural products from soil metagenomes.
Katz, Micah; Hover, Bradley M; Brady, Sean F
2016-03-01
Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.
Faggionato, Davide; Serb, Jeanne M
2017-08-01
The rise of high-throughput RNA sequencing (RNA-seq) and de novo transcriptome assembly has had a transformative impact on how we identify and study genes in the phototransduction cascade of non-model organisms. But the advantage provided by the nearly automated annotation of RNA-seq transcriptomes may at the same time hinder the possibility for gene discovery and the discovery of new gene functions. For example, standard functional annotation based on domain homology to known protein families can only confirm group membership, not identify the emergence of new biochemical function. In this study, we show the importance of developing a strategy that circumvents the limitations of semiautomated annotation and apply this workflow to photosensitivity as a means to discover non-opsin photoreceptors. We hypothesize that non-opsin G-protein-coupled receptor (GPCR) proteins may have chromophore-binding lysines in locations that differ from opsin. Here, we provide the first case study describing non-opsin light-sensitive GPCRs based on tissue-specific RNA-seq data of the common bay scallop Argopecten irradians (Lamarck, 1819). Using a combination of sequence analysis and three-dimensional protein modeling, we identified two candidate proteins. We tested their photochemical properties and provide evidence showing that these two proteins incorporate 11-cis and/or all-trans retinal and react to light photochemically. Based on this case study, we demonstrate that there is potential for the discovery of new light-sensitive GPCRs, and we have developed a workflow that starts from RNA-seq assemblies to the discovery of new non-opsin, GPCR-based photopigments.
Model-driven discovery of underground metabolic functions in Escherichia coli.
Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M
2015-01-20
Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with KO strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three cases of genes that are incorrectly predicted as essential in Escherichia coli--aspC, argD, and gltA--are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.
The Influence of Metabolic Syndrome and Sex on the DNA Methylome in Schizophrenia
Lines, Brittany N.
2018-01-01
Introduction The mechanism by which metabolic syndrome occurs in schizophrenia is not completely known; however, previous work suggests that changes in DNA methylation may be involved which is further influenced by sex. Within this study, the DNA methylome was profiled to identify altered methylation associated with metabolic syndrome in a schizophrenia population on atypical antipsychotics. Methods Peripheral blood from schizophrenia subjects was utilized for DNA methylation analyses. Discovery analyses (n = 96) were performed using an epigenome-wide analysis on the Illumina HumanMethylation450K BeadChip based on metabolic syndrome diagnosis. A secondary discovery analysis was conducted based on sex. The top hits from the discovery analyses were assessed in an additional validation set (n = 166) using site-specific methylation pyrosequencing. Results A significant increase in CDH22 gene methylation in subjects with metabolic syndrome was identified in the overall sample. Additionally, differential methylation was found within the MAP3K13 gene in females and the CCDC8 gene within males. Significant differences in methylation were again observed for the CDH22 and MAP3K13 genes, but not CCDC8, in the validation sample set. Conclusions This study provides preliminary evidence that DNA methylation may be associated with metabolic syndrome and sex in schizophrenia. PMID:29850476
Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of human biomedical science. Many such classifiers discovered thus far lack vigorous statistical and experimental validations, with their stability and rel...
Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren
2018-01-01
Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2014-01-01
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
Phenome-driven disease genetics prediction toward drug discovery
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-01-01
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben
2015-01-01
The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027
A roadmap for natural product discovery based on large-scale genomics and metabolomics
USDA-ARS's Scientific Manuscript database
Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic ca...
Fu, Shuyue; Liu, Xiang; Luo, Maochao; Xie, Ke; Nice, Edouard C; Zhang, Haiyuan; Huang, Canhua
2017-04-01
Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Ray, Pritha
2011-04-01
Development and marketing of new drugs require stringent validations that are expensive and time consuming. Non-invasive multimodality molecular imaging using reporter genes holds great potential to expedite these processes at reduced cost. New generations of smarter molecular imaging strategies, such as split reporter, bioluminescence resonance energy transfer, and multimodality fusion reporter technologies, will further help to streamline and shorten the drug discovery and development process. This review illustrates the importance and potential of molecular imaging using multimodality reporter genes in drug development at preclinical phases.
Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt
2014-06-01
In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.
A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus
Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong
2015-01-01
Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180
ERIC Educational Resources Information Center
Grados, Marco A.
2010-01-01
Objective: To provide a contemporary perspective on genetic discovery methods applied to obsessive-compulsive disorder (OCD) and Tourette syndrome (TS). Method: A review of research trends in genetics research in OCD and TS is conducted, with emphasis on novel approaches. Results: Genome-wide association studies (GWAS) are now in progress in OCD…
Automated Discovery of Long Intergenic RNAs Associated with Breast Cancer Progression
2012-02-01
manuscript in preparation), (2) development and publication of an algorithm for detecting gene fusions in RNA-Seq data [1], and (3) discovery of outlier long...subjected to de novo assembly algorithms to discover novel transcripts representing either unannotated genes or novel somatic mutations such as gene...fusions. To this end the P.I. developed and published a novel algorithm called ChimeraScan to facilitate the discovery and validation of gene
Identifying candidate driver genes by integrative ovarian cancer genomics data
NASA Astrophysics Data System (ADS)
Lu, Xinguo; Lu, Jibo
2017-08-01
Integrative analysis of the molecular mechanisms underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, copy number variations (CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumors from the heterogeneous data. The modulators in the final list, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1 and MED1, are well known in biological processes associated with ovarian cancer, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.
Gregori, Josep; Villarreal, Laura; Sánchez, Alex; Baselga, José; Villanueva, Josep
2013-12-16
The microarray community has shown that the low reproducibility observed in gene expression-based biomarker discovery studies is partially due to relying solely on p-values to get the lists of differentially expressed genes. Their conclusions recommended complementing the p-value cutoff with the use of effect-size criteria. The aim of this work was to evaluate the influence of such an effect-size filter on spectral counting-based comparative proteomic analysis. The results proved that the filter increased the number of true positives and decreased the number of false positives and the false discovery rate of the dataset. These results were confirmed by simulation experiments where the effect size filter was used to evaluate systematically variable fractions of differentially expressed proteins. Our results suggest that relaxing the p-value cut-off followed by a post-test filter based on effect size and signal level thresholds can increase the reproducibility of statistical results obtained in comparative proteomic analysis. Based on our work, we recommend using a filter consisting of a minimum absolute log2 fold change of 0.8 and a minimum signal of 2-4 SpC on the most abundant condition for the general practice of comparative proteomics. The implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of the results obtained among independent laboratories and MS platforms. Quality control analysis of microarray-based gene expression studies pointed out that the low reproducibility observed in the lists of differentially expressed genes could be partially attributed to the fact that these lists are generated relying solely on p-values. Our study has established that the implementation of an effect size post-test filter improves the statistical results of spectral count-based quantitative proteomics. 
The results proved that the filter increased the number of true positives while decreasing the false positives and the false discovery rate of the datasets. The results presented here prove that a post-test filter applying reasonable effect size and signal level thresholds helps to increase the reproducibility of statistical results in comparative proteomic analysis. Furthermore, the implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of results obtained among independent laboratories and MS platforms. This article is part of a Special Issue entitled: Standardization and Quality Control in Proteomics. Copyright © 2013 Elsevier B.V. All rights reserved.
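The post-test filter recommended in this abstract (a minimum absolute log2 fold change of 0.8 and a minimum signal of 2-4 SpC in the most abundant condition, applied after a p-value cut-off) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ProteinResult` record, the `passes_filter` function, the p-value cut-off of 0.05 and the example values are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    """Hypothetical per-protein test result (field names are illustrative)."""
    name: str
    p_value: float
    log2_fold_change: float
    mean_spc_a: float  # mean spectral counts in condition A
    mean_spc_b: float  # mean spectral counts in condition B


def passes_filter(r, p_cutoff=0.05, min_abs_log2fc=0.8, min_spc=2.0):
    """Apply the p-value cut-off, then the effect-size and signal filters.

    The 0.8 and 2.0 defaults follow the thresholds quoted in the abstract;
    the p-value cut-off of 0.05 is an assumed placeholder.
    """
    # Signal is measured in the most abundant of the two conditions.
    signal = max(r.mean_spc_a, r.mean_spc_b)
    return (
        r.p_value <= p_cutoff
        and abs(r.log2_fold_change) >= min_abs_log2fc
        and signal >= min_spc
    )


# Made-up example values, for illustration only.
results = [
    ProteinResult("P1", 0.01, 1.2, 5.0, 2.0),    # passes all three criteria
    ProteinResult("P2", 0.01, 0.3, 50.0, 40.0),  # effect size too small
    ProteinResult("P3", 0.04, -2.0, 0.5, 1.5),   # signal too low
    ProteinResult("P4", 0.20, 1.5, 10.0, 3.0),   # p-value too high
]

kept = [r.name for r in results if passes_filter(r)]
print(kept)
```

Keeping the three thresholds as separate parameters makes it easy to relax the p-value cut-off while tightening the effect-size and signal criteria, which is the trade-off the abstract describes.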
Retinoblastoma-like RRB gene of Arabidopsis thaliana
Durfee, Tim; Feiler, Heidi; Gruissem, Wilhelm; Jenkins, Susan; Roe, Judith; Zambryski, Patricia
2004-02-24
This invention provides methods and compositions for altering the growth, organization, and differentiation of plant tissues. The invention is based on the discovery that, in plants, genetically altering the levels of Retinoblastoma-related gene (RRB) activity produces dramatic effects on the growth, proliferation, organization, and differentiation of plant meristem.
Context-sensitive network-based disease genetics prediction and its implications in drug discovery
Chen, Yang; Xu, Rong
2017-01-01
Abstract Motivation: Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities, therefore ignore the specific context on how two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. Results: We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBN) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly increased the average rank from top 12.6 to top 8.8% for all tested genes comparing with the SBN-based approach (p
Hwang, Sohyun; Rhee, Seung Y; Marcotte, Edward M; Lee, Insuk
2012-01-01
AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests. PMID:21886106
How rare bone diseases have informed our knowledge of complex diseases.
Johnson, Mark L
2016-01-01
Rare bone diseases, generally defined as monogenic traits with either autosomal recessive or dominant patterns of inheritance, have provided a rich database of genes and associated pathways over the past 2-3 decades. The molecular genetic dissection of these bone diseases has yielded some major surprises in terms of the causal genes and/or involved pathways. The discovery of genes/pathways involved in diseases such as osteopetrosis, osteosclerosis, osteogenesis imperfecta and many other rare bone diseases has accelerated our understanding of complex traits. Importantly, these discoveries have provided either direct validation for a specific gene embedded in a group of genes within an interval identified through a complex trait genome-wide association study (GWAS) or, based upon the pathway associated with a monogenic trait gene, a means to prioritize a large number of genes for functional validation studies. In some instances GWAS studies have yielded candidate genes that fall within linkage intervals associated with monogenic traits and resulted in the identification of causal mutations in those rare diseases. Driving all of this discovery is a complement of technologies such as genome sequencing, bioinformatics and advanced statistical analysis methods that have accelerated genetic dissection and greatly reduced the cost. Thus, rare bone disorders in partnership with GWAS have brought us to the brink of a new era of personalized genomic medicine in which the prevention and management of complex diseases will be driven by the molecular understanding of each individual's contributing genetic risks for disease.
Antisense oligonucleotide technologies in drug discovery.
Aboul-Fadl, Tarek
2006-09-01
The principle of antisense oligonucleotide (AS-OD) technologies is based on the specific inhibition of unwanted gene expression by blocking mRNA activity. It has long appeared to be an ideal strategy to leverage new genomic knowledge for drug discovery and development. In recent years, AS-OD technologies have been widely used as potent and promising tools for this purpose. There is a rapid increase in the number of antisense molecules progressing in clinical trials. AS-OD technologies provide a simple and efficient approach for drug discovery and development and are expected to become a reality in the near future. This editorial describes the established and emerging AS-OD technologies in drug discovery.
Xi, Jianing; Wang, Minghui; Li, Ao
2018-06-05
Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover driver genes that are mutated at relatively low frequency from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, these existing network-based methods do not exploit prior information from mRNA expression patterns, which has also been shown to be highly informative of cancer progression. To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework applies Frobenius norm regularization to overcome overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, from which the top-scoring genes are selected as driver candidates. Evaluation experiments with known benchmarking genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from the interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
Emerging techniques for the discovery and validation of therapeutic targets for skeletal diseases.
Cho, Christine H; Nuttall, Mark E
2002-12-01
Advances in genomics and proteomics have revolutionised the drug discovery process and target validation. Identification of novel therapeutic targets for chronic skeletal diseases is an extremely challenging process based on the difficulty of obtaining high-quality human diseased versus normal tissue samples. The quality of tissue and genomic information obtained from the sample is critical to identifying disease-related genes. Using a genomics-based approach, novel genes or genes with similar homology to existing genes can be identified from cDNA libraries generated from normal versus diseased tissue. High-quality cDNA libraries are prepared from uncontaminated homogeneous cell populations harvested from tissue sections of interest. Localised gene expression analysis and confirmation are obtained through in situ hybridisation or immunohistochemical studies. Cells overexpressing the recombinant protein are subsequently designed for primary cell-based high-throughput assays that are capable of screening large compound banks for potential hits. Afterwards, secondary functional assays are used to test promising compounds. The same overexpressing cells are used in the secondary assay to test protein activity and functionality as well as screen for small-molecule agonists or antagonists. Once a hit is generated, a structure-activity relationship of the compound is optimised for better oral bioavailability and pharmacokinetics allowing the compound to progress into development. Parallel efforts from proteomics, as well as genetics/transgenics, bioinformatics and combinatorial chemistry, and improvements in high-throughput automation technologies, allow the drug discovery process to meet the demands of the medicinal market. This review discusses and illustrates how different approaches are incorporated into the discovery and validation of novel targets and, consequently, the development of potentially therapeutic agents in the areas of osteoporosis and osteoarthritis. 
While current treatments exist in the form of hormone replacement therapy, antiresorptive and anabolic agents for osteoporosis, there are no disease-modifying therapies for the treatment of the most common human joint disease, osteoarthritis. A massive market potential for improved options with better safety and efficacy still remains. Therefore, the application of genomics and proteomics for both diseases should provide much needed novel therapeutic approaches to treating these major world health problems.
Alteration of plant meristem function by manipulation of the Retinoblastoma-like plant RRB gene
Durfee, Tim (Madison, WI); Feiler, Heidi (Albany, CA); Gruissem, Wilhelm (Forch, CH); Jenkins, Susan (Martinez, CA); Roe, Judith (Manhattan, KS); Zambryski, Patricia (Berkeley, CA)
2007-01-16
This invention provides methods and compositions for altering the growth, organization, and differentiation of plant tissues. The invention is based on the discovery that, in plants, genetically altering the levels of Retinoblastoma-related gene (RRB) activity produces dramatic effects on the growth, proliferation, organization, and differentiation of plant meristem.
Drug discovery based on genetic and metabolic findings in schizophrenia.
Dwyer, Donard S; Weeks, Kathrine; Aamodt, Eric J
2008-11-01
Recent progress in the genetics of schizophrenia provides the rationale for re-evaluating causative factors and therapeutic strategies for this disease. Here, we review the major candidate susceptibility genes and relate the aberrant function of these genes to defective regulation of energy metabolism in the schizophrenic brain. Disturbances in energy metabolism potentially lead to neurodevelopmental deficits, impaired function of the mature nervous system and failure to maintain neurites/dendrites and synaptic connections. Current antipsychotic drugs do not specifically address these underlying deficits; therefore, a new generation of more effective medications is urgently needed. Novel targets for future drug discovery are identified in this review. The coordinated application of structure-based drug design, systems biology and research on model organisms may greatly facilitate the search for next-generation antipsychotic drugs.
Xu, Li; Han, Ting; Ge, Mei; Zhu, Li; Qian, XiuPing
2016-09-01
Analysis of the Amycolatopsis orientalis HCCB10007 genome revealed new gene clusters involved in natural product biosynthesis that were not associated with the production of known compounds. Halogenases are a type of tailoring enzyme usually found within these secondary gene clusters. In this study, we identified an indole-type halometabolite, 6-chloro-1H-indole-3-carboxamide, named LYXLF2, by whole genome mining and metabolic profiling of a flavin-dependent halogenase mutant. LYXLF2 is a new plant growth-regulating compound that promotes root elongation. The results of this study demonstrate that the gene knock-out/comparative metabolic profiling approach provides a powerful tool for the discovery of novel natural products by genome mining.
Rembrandt: Helping Personalized Medicine Become a Reality Through Integrative Translational Research
Madhavan, Subha; Zenklusen, Jean-Claude; Kotliarov, Yuri; Sahni, Himanso; Fine, Howard A.; Buetow, Kenneth
2009-01-01
Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set and of the ability to integrate biomedical data from disparate sources, which would enable translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that Rembrandt represents a prototype of how high throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic. PMID:19208739
iSyTE 2.0: a database for expression-based gene discovery in the eye
Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan
2018-01-01
Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527
Gene signature critical to cancer phenotype as a paradigm for anti-cancer drug discovery
Sampson, Erik R.; McMurray, Helene R.; Hassane, Duane C.; Newman, Laurel; Salzman, Peter; Jordan, Craig T.; Land, Hartmut
2013-01-01
Malignant cell transformation commonly results in the deregulation of thousands of cellular genes, an observation that suggests a complex biological process and an inherently challenging scenario for the development of effective cancer interventions. To better define the genes/pathways essential to regulating the malignant phenotype, we recently described a novel strategy based on the cooperative nature of carcinogenesis that focuses on genes synergistically deregulated in response to cooperating oncogenic mutations. These so-called “cooperation response genes” (CRGs) are highly enriched for genes critical for the cancer phenotype, thereby suggesting their causal role in the malignant state. Here we show that CRGs play an essential role in drug-mediated anti-cancer activity and that anti-cancer agents can be identified through their ability to antagonize the CRG expression profile. These findings provide proof-of-concept for the use of the CRG signature as a novel means of drug discovery with relevance to underlying anti-cancer drug mechanisms. PMID:22964631
Systematic Evaluation of Molecular Networks for Discovery of Disease Genes.
Huang, Justin K; Carlin, Daniel E; Yu, Michael Ku; Zhang, Wei; Kreisberg, Jason F; Tamayo, Pablo; Ideker, Trey
2018-04-25
Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall. A general tendency is that performance scales with network size, suggesting that new interaction discovery currently outweighs the detrimental effects of false positives. Correcting for size, we find that the DIP network provides the highest efficiency (value per interaction). Based on these results, we create a parsimonious composite network with both high efficiency and performance. This work provides a benchmark for selection of molecular networks in human disease research. Copyright © 2018 Elsevier Inc. All rights reserved.
Stamou, M. I.; Cox, K. H.
2015-01-01
The neuroendocrine regulation of reproduction is an intricate process requiring the exquisite coordination of an assortment of cellular networks, all converging on the GnRH neurons. These neurons have a complex life history, migrating mainly from the olfactory placode into the hypothalamus, where GnRH is secreted and acts as the master regulator of the hypothalamic-pituitary-gonadal axis. Much of what we know about the biology of the GnRH neurons has been aided by discoveries made using the human disease model of isolated GnRH deficiency (IGD), a family of rare Mendelian disorders that share a common failure of secretion and/or action of GnRH causing hypogonadotropic hypogonadism. Over the last 30 years, research groups around the world have been investigating the genetic basis of IGD using different strategies based on complex cases that harbor structural abnormalities or single pleiotropic genes, endogamous pedigrees, candidate gene approaches as well as pathway gene analyses. Although such traditional approaches, based on well-validated tools, have been critical to establish the field, new strategies, such as next-generation sequencing, are now providing speed and robustness, but also revealing a surprising number of variants in known IGD genes in both patients and healthy controls. Thus, before the field moves forward with new genetic tools and continues discovery efforts, we must reassess what we know about IGD genetics and prepare to hold our work to a different standard. The purpose of this review is to: 1) look back at the strategies used to discover the “known” genes implicated in the rare forms of IGD; 2) examine the strengths and weaknesses of the methodologies used to validate genetic variation; 3) substantiate the role of known genes in the pathophysiology of the disease; and 4) project forward as we embark upon a widening use of these new and powerful technologies for gene discovery. PMID:26394276
Stamou, M I; Cox, K H; Crowley, William F
2015-12-01
The neuroendocrine regulation of reproduction is an intricate process requiring the exquisite coordination of an assortment of cellular networks, all converging on the GnRH neurons. These neurons have a complex life history, migrating mainly from the olfactory placode into the hypothalamus, where GnRH is secreted and acts as the master regulator of the hypothalamic-pituitary-gonadal axis. Much of what we know about the biology of the GnRH neurons has been aided by discoveries made using the human disease model of isolated GnRH deficiency (IGD), a family of rare Mendelian disorders that share a common failure of secretion and/or action of GnRH causing hypogonadotropic hypogonadism. Over the last 30 years, research groups around the world have been investigating the genetic basis of IGD using different strategies based on complex cases that harbor structural abnormalities or single pleiotropic genes, endogamous pedigrees, candidate gene approaches as well as pathway gene analyses. Although such traditional approaches, based on well-validated tools, have been critical to establish the field, new strategies, such as next-generation sequencing, are now providing speed and robustness, but also revealing a surprising number of variants in known IGD genes in both patients and healthy controls. Thus, before the field moves forward with new genetic tools and continues discovery efforts, we must reassess what we know about IGD genetics and prepare to hold our work to a different standard. The purpose of this review is to: 1) look back at the strategies used to discover the "known" genes implicated in the rare forms of IGD; 2) examine the strengths and weaknesses of the methodologies used to validate genetic variation; 3) substantiate the role of known genes in the pathophysiology of the disease; and 4) project forward as we embark upon a widening use of these new and powerful technologies for gene discovery.
A hybrid network-based method for the detection of disease-related genes
NASA Astrophysics Data System (ADS)
Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene
2018-02-01
Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.
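The abstract above builds a classifier on topological properties of genes in a PPI network but does not list the features here. As a minimal sketch, assuming two commonly used properties (node degree and local clustering coefficient), the feature-extraction step could look like the following. The function names and the toy edge list are hypothetical, and the classifier itself (a random forest in the abstract) is deliberately left out.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def topological_features(adjacency):
    """Return degree and local clustering coefficient for every node."""
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        # Count edges among the node's neighbors, each unordered pair once.
        links = sum(
            1
            for u in neighbors
            for v in adjacency[u]
            if v in neighbors and u < v
        )
        possible = degree * (degree - 1) / 2
        clustering = links / possible if possible else 0.0
        features[node] = {"degree": degree, "clustering": clustering}
    return features


# Hypothetical toy PPI network, not real interaction data.
toy_ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
features = topological_features(build_adjacency(toy_ppi))
for protein in sorted(features):
    print(protein, features[protein])
```

Each per-gene feature dictionary would become one row of the training matrix, with the label indicating whether the gene is a known disease gene.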
Recently, the landscape of single base mutations in diffuse large B-cell lymphoma (DLBCL) was described. Here we report the discovery of a gene fusion between TBL1XR1 and TP63, the only recurrent somatic novel gene fusion identified in our analysis of transcriptome data from 96 DLBCL cases. Based on this cohort and a further 157 DLBCL cases analyzed by FISH, the incidence in de novo germinal center B cell-like (GCB) DLBCL is 5% (6 of 115).
Penrod, Nadia M; Moore, Jason H
2014-02-05
The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. We use this approach to prioritize genes as drug target candidates in a set of ER⁺ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER⁺ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use.
2014-01-01
Background: The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. Results: We use this approach to prioritize genes as drug target candidates in a set of ER+ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER+ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Conclusions: Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use. PMID:24495353
Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G
2013-01-01
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledgebase known as ADEpedia using a semantic web based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network and a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
2013-03-14
Comprehensive Clinical Phenotyping and Genetic Mapping for the Discovery of Autism Susceptibility Genes (report AFRL-SA-WP-TR-2013-0013). Autism is an extremely common and heterogeneous neurodevelopmental disorder. While genetic factors are known to play...
A current view of Alzheimer's disease.
Hooli, Basavaraj V; Tanzi, Rudolph E
2009-07-08
Several genes that influence susceptibility to Alzheimer's disease (AD) have been known for over two decades. Recent advances have elucidated novel candidate genes and the pathogenetic mechanisms underlying neurodegeneration in AD. Here, we summarize what we have learned from studies of the known AD genes with regard to the causes of AD and emerging therapies. We also review key recent discoveries that have enhanced our understanding of the etiology and pathogenesis of this devastating disease, based on new investigations into the genes and molecular mechanisms underlying AD.
Jin, Yulan; Sharma, Ashok; Bai, Shan; Davis, Colleen; Liu, Haitao; Hopkins, Diane; Barriga, Kathy; Rewers, Marian; She, Jin-Xiong
2014-07-01
There is tremendous scientific and clinical value to further improving the predictive power of autoantibodies because autoantibody-positive (AbP) children have heterogeneous rates of progression to clinical diabetes. This study explored the potential of gene expression profiles as biomarkers for risk stratification among 104 AbP subjects from the Diabetes Autoimmunity Study in the Young (DAISY) using a discovery data set based on microarray and a validation data set based on real-time RT-PCR. The microarray data identified 454 candidate genes with expression levels associated with various type 1 diabetes (T1D) progression rates. RT-PCR analyses of the top-27 candidate genes confirmed 5 genes (BACH2, IGLL3, EIF3A, CDC20, and TXNDC5) associated with differential progression and implicated in lymphocyte activation and function. Multivariate analyses of these five genes in the discovery and validation data sets identified and confirmed four multigene models (BI, ICE, BICE, and BITE, with each letter representing a gene) that consistently stratify high- and low-risk subsets of AbP subjects with hazard ratios >6 (P < 0.01). The results suggest that these genes may be involved in T1D pathogenesis and potentially serve as excellent gene expression biomarkers to predict the risk of progression to clinical diabetes for AbP subjects. © 2014 by the American Diabetes Association.
WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data
Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M
2006-01-01
Background: Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Results: WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion: This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281
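The abstract does not state which statistic WPS uses for functional category enrichment. A common choice for this kind of question is the upper tail of the hypergeometric distribution (equivalent to a one-sided Fisher's exact test), and the sketch below assumes that choice purely for illustration. The function name `enrichment_p_value` and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, category_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a population in which category_size genes belong to the category."""
    upper = min(category_size, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 annotated to a pathway,
# 4 genes in the list of interest, 3 of which fall in the pathway.
p_value = enrichment_p_value(population=20, category_size=5, list_size=4, overlap=3)
print(round(p_value, 4))
```

In practice one p-value is computed per category, so a multiple-testing correction would normally follow; that step is omitted here.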
Zhang, Min; Zhang, Lin; Zou, Jinfeng; Yao, Chen; Xiao, Hui; Liu, Qing; Wang, Jing; Wang, Dong; Wang, Chenguang; Guo, Zheng
2009-07-01
According to current consistency metrics such as percentage of overlapping genes (POG), lists of differentially expressed genes (DEGs) detected from different microarray studies for a complex disease are often highly inconsistent. This irreproducibility problem also exists in other high-throughput post-genomic areas such as proteomics and metabolism. A complex disease is often characterized with many coordinated molecular changes, which should be considered when evaluating the reproducibility of discovery lists from different studies. We proposed metrics percentage of overlapping genes-related (POGR) and normalized POGR (nPOGR) to evaluate the consistency between two DEG lists for a complex disease, considering correlated molecular changes rather than only counting gene overlaps between the lists. Based on microarray datasets of three diseases, we showed that though the POG scores for DEG lists from different studies for each disease are extremely low, the POGR and nPOGR scores can be rather high, suggesting that the apparently inconsistent DEG lists may be highly reproducible in the sense that they are actually significantly correlated. Observing different discovery results for a disease by the POGR and nPOGR scores will obviously reduce the uncertainty of the microarray studies. The proposed metrics could also be applicable in many other high-throughput post-genomic areas.
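To make the contrast between the two kinds of metric concrete, here is a minimal sketch. It assumes that POG is the fraction of genes in one list that also appear in the other, and that the "related" variant additionally counts non-shared genes that are correlated with at least one gene in the other list. The exact definitions, the handling of regulation direction, and the normalization used for nPOGR are in the paper and are not reproduced here; the function names, gene labels, and correlated pairs below are hypothetical.

```python
def pog(list_a, list_b):
    """Fraction of genes in list_a that also appear in list_b."""
    set_a, set_b = set(list_a), set(list_b)
    return len(set_a & set_b) / len(set_a)


def pog_related(list_a, list_b, correlated_pairs):
    """Like pog, but a non-shared gene in list_a also counts when it is
    correlated with at least one gene in list_b."""
    set_a, set_b = set(list_a), set(list_b)
    shared = set_a & set_b
    related = {frozenset(pair) for pair in correlated_pairs}
    rescued = {
        gene
        for gene in set_a - shared
        if any(frozenset((gene, other)) in related for other in set_b)
    }
    return (len(shared) + len(rescued)) / len(set_a)


# Hypothetical DEG lists from two studies and hypothetical correlated pairs.
study_1 = ["g1", "g2", "g3", "g4"]
study_2 = ["g1", "g5", "g6", "g7"]
pairs = [("g2", "g5"), ("g3", "g6")]

pog_score = pog(study_1, study_2)
pogr_score = pog_related(study_1, study_2, pairs)
print(pog_score, pogr_score)
```

In this toy case only one of four genes overlaps directly, yet three of four are either shared or correlated with a gene in the other list, which is the kind of gap between low POG and high POGR scores that the abstract reports.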
Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo
2011-02-10
Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. 
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.
Context-sensitive network-based disease genetics prediction and its implications in drug discovery.
Chen, Yang; Xu, Rong
2017-04-01
Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities, and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBN) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly improved the average rank from the top 12.6% to the top 8.8% for all tested genes compared with the SBN-based approach ( p
Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.
Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C
2018-08-01
Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
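The gene-selection step described in this abstract (taking the intersection of two text-mined gene sets before enrichment analysis) can be illustrated with a minimal sketch. The function name `shared_genes` and the example symbols are hypothetical and are not taken from the study; the normalization of symbols to upper case is likewise an assumption made for illustration, not the authors' procedure.

```python
def shared_genes(om_genes, wound_genes):
    """Return the sorted intersection of two gene-symbol lists.

    Symbols are stripped and upper-cased first (an assumption), so that
    'tnf ' and 'TNF' count as the same gene.
    """
    def normalize(genes):
        return {g.strip().upper() for g in genes if g.strip()}

    return sorted(normalize(om_genes) & normalize(wound_genes))


# Toy input lists, invented for illustration only.
om_hits = ["PTGS2", "tnf ", "IL6"]
wound_hits = ["TNF", "EGF", "ptgs2"]
common = shared_genes(om_hits, wound_hits)
```

In such a workflow the resulting list would be the input to the ontology and pathway enrichment step.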
Huang, Chen; Leung, Ross Ka-Kit; Guo, Min; Tuo, Li; Guo, Lin; Yew, Wing Wai; Lou, Inchio; Lee, Simon Ming Yuen; Sun, Chenghang
2016-01-01
Microbial secondary metabolites are valuable resources for novel drug discovery. In particular, actinomycetes express a range of antibiotics against a spectrum of bacteria. At the genus level, strain Allosalinactinospora lopnorensis CA15-2T is the first new actinomycete isolated from the Lop Nor region, China. Antimicrobial assays revealed that the strain could inhibit the growth of certain types of bacteria, including Acinetobacter baumannii and Staphylococcus aureus, highlighting its clinical significance. Here we report the 5,894,259-base-pair genome of the strain, containing 5,662 predicted genes, of which 832 cannot be detected by sequence similarity-based methods, suggesting the new species may carry a novel gene pool. Furthermore, our genome-mining investigation reveals that A. lopnorensis CA15-2T contains 17 gene clusters coding for known or novel secondary metabolites. Meanwhile, at least six secondary metabolites were identified in the ethyl acetate (EA) extract of the fermentation broth of the strain by high-resolution UPLC-MS. Compared with reported clusters of other species, many new genes were found in clusters, and the physical chromosomal location and order of genes in the clusters are distinct. This study presents evidence in support of A. lopnorensis CA15-2T as a potent natural products source for drug discovery. PMID:26864220
Genetics of rheumatoid arthritis contributes to biology and drug discovery.
Okada, Yukinori; Wu, Di; Trynka, Gosia; Raj, Towfique; Terao, Chikashi; Ikari, Katsunori; Kochi, Yuta; Ohmura, Koichiro; Suzuki, Akari; Yoshida, Shinji; Graham, Robert R; Manoharan, Arun; Ortmann, Ward; Bhangale, Tushar; Denny, Joshua C; Carroll, Robert J; Eyler, Anne E; Greenberg, Jeffrey D; Kremer, Joel M; Pappas, Dimitrios A; Jiang, Lei; Yin, Jian; Ye, Lingying; Su, Ding-Feng; Yang, Jian; Xie, Gang; Keystone, Ed; Westra, Harm-Jan; Esko, Tõnu; Metspalu, Andres; Zhou, Xuezhong; Gupta, Namrata; Mirel, Daniel; Stahl, Eli A; Diogo, Dorothée; Cui, Jing; Liao, Katherine; Guo, Michael H; Myouzen, Keiko; Kawaguchi, Takahisa; Coenen, Marieke J H; van Riel, Piet L C M; van de Laar, Mart A F J; Guchelaar, Henk-Jan; Huizinga, Tom W J; Dieudé, Philippe; Mariette, Xavier; Bridges, S Louis; Zhernakova, Alexandra; Toes, Rene E M; Tak, Paul P; Miceli-Richard, Corinne; Bang, So-Young; Lee, Hye-Soon; Martin, Javier; Gonzalez-Gay, Miguel A; Rodriguez-Rodriguez, Luis; Rantapää-Dahlqvist, Solbritt; Arlestig, Lisbeth; Choi, Hyon K; Kamatani, Yoichiro; Galan, Pilar; Lathrop, Mark; Eyre, Steve; Bowes, John; Barton, Anne; de Vries, Niek; Moreland, Larry W; Criswell, Lindsey A; Karlson, Elizabeth W; Taniguchi, Atsuo; Yamada, Ryo; Kubo, Michiaki; Liu, Jun S; Bae, Sang-Cheol; Worthington, Jane; Padyukov, Leonid; Klareskog, Lars; Gregersen, Peter K; Raychaudhuri, Soumya; Stranger, Barbara E; De Jager, Philip L; Franke, Lude; Visscher, Peter M; Brown, Matthew A; Yamanaka, Hisashi; Mimori, Tsuneyo; Takahashi, Atsushi; Xu, Huji; Behrens, Timothy W; Siminovitch, Katherine A; Momohara, Shigeki; Matsuda, Fumihiko; Yamamoto, Kazuhiko; Plenge, Robert M
2014-02-20
A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological data sets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA). Here we performed a genome-wide association study meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ∼10 million single-nucleotide polymorphisms. We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 (refs 2 - 4). We devised an in silico pipeline using established bioinformatics methods based on functional annotation, cis-acting expression quantitative trait loci and pathway analyses--as well as novel methods based on genetic overlap with human primary immunodeficiency, haematological cancer somatic mutations and knockout mouse phenotypes--to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
Activity-based protein profiling for biochemical pathway discovery in cancer
Nomura, Daniel K.; Dix, Melissa M.; Cravatt, Benjamin F.
2011-01-01
Large-scale profiling methods have uncovered numerous gene and protein expression changes that correlate with tumorigenesis. However, determining the relevance of these expression changes and which biochemical pathways they affect has been hindered by our incomplete understanding of the proteome and its myriad functions and modes of regulation. Activity-based profiling platforms enable both the discovery of cancer-relevant enzymes and selective pharmacological probes to perturb and characterize these proteins in tumour cells. When integrated with other large-scale profiling methods, activity-based proteomics can provide insight into the metabolic and signalling pathways that support cancer pathogenesis and illuminate new strategies for disease diagnosis and treatment. PMID:20703252
2012-01-01
Background: Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Results: Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. 
Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since multiple TLRs were found in the generic fever network, it is reasonable to hypothesize that vaccine-TLR interactions may play an important role in inducing fever response, which deserves a further investigation. Conclusions: This study demonstrated that ontology-based literature mining is a powerful method for analyzing gene interaction networks and generating new scientific hypotheses. PMID:23256563
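The abstract above states that genes were prioritized using network centrality metrics but does not say which ones. As a minimal sketch, assuming plain degree centrality on an undirected interaction network, the ranking could look like the following. The edge list is a toy example invented for illustration, and the function names are hypothetical; none of it is taken from the CONDL study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def rank_genes(edges):
    """Order genes by decreasing centrality, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))


# Toy gene-gene interaction edges (hypothetical, for illustration only).
toy_edges = [("IL6", "TNF"), ("IL6", "IL1B"), ("IL6", "PTGS2"), ("TNF", "IL1B")]
ranking = rank_genes(toy_edges)
```

Degree centrality is only one possible choice; betweenness or eigenvector centrality would slot into the same ranking function.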
Liu, Rong; Guo, Cheng-Xian; Zhou, Hong-Hao
2015-01-01
This study aims to identify effective gene networks and prognostic biomarkers associated with estrogen receptor positive (ER+) breast cancer using human mRNA studies. Weighted gene coexpression network analysis was performed with a complex ER+ breast cancer transcriptome to investigate the function of networks and key genes in the prognosis of breast cancer. We found a significant correlation of an expression module with distant metastasis-free survival (HR = 2.25; 95% CI = 1.03-4.88 in discovery set; HR = 1.78; 95% CI = 1.07-2.93 in validation set). This module contained genes enriched in the biological process of the M phase. From this module, we further identified and validated 5 hub genes (CDK1, DLGAP5, MELK, NUSAP1, and RRM2), the expression levels of which were strongly associated with poor survival. Highly expressed MELK indicated poor survival in luminal A and luminal B breast cancer molecular subtypes. This gene was also found to be associated with tamoxifen resistance. Results indicated that a network-based approach may facilitate the discovery of biomarkers for the prognosis of ER+ breast cancer and may also be used as a basis for establishing personalized therapies. Nevertheless, before the application of this approach in clinical settings, in vivo and in vitro experiments and multi-center randomized controlled clinical trials are still needed.
Overview Article: Identifying transcriptional cis-regulatory modules in animal genomes
Suryamohan, Kushal; Halfon, Marc S.
2014-01-01
Gene expression is regulated through the activity of transcription factors and chromatin modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily-identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods has led to an explosion of both computational and empirical methods for CRM discovery in model and non-model organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against transcription factors or histone post-translational modifications, identification of nucleosome-depleted “open” chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted transcription factor binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed. PMID:25704908
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.
2013-01-01
The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities both for purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We will describe our recommendations of possible approaches to this problem including metrics for the evaluation of biomarkers.
Gene-Specific Demethylation as Targeted Therapy in MDS
2016-07-01
methylation remain elusive. This proposal builds on our recent discovery of a novel class of RNAs, the DiRs or DNMT1-interacting RNAs, involved in...cell type-specific DNA methylation patterns. Based on these findings, we hypothesize that DNA methylation changes can be corrected by RNAs. We aim to...aberrant DNA methylation remain elusive. This proposal builds on our recent discovery of a novel class of RNAs, the DiRs or DNMT1-interacting RNAs
VarDetect: a nucleotide sequence variation exploratory tool
Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades
2008-01-01
Background: Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results: We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability: VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032
2003-12-09
KENNEDY SPACE CENTER, FLA. - In the Orbiter Processing Facility, KSC employee Gene Peavler works in the wheel area on the orbiter Discovery. The vehicle has undergone Orbiter Major Modifications in the past year. Discovery is scheduled to fly on mission STS-121 to the International Space Station.
Discovery of the leinamycin family of natural products by mining actinobacterial genomes
Pan, Guohui; Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Yang, Dong; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen; Shen, Ben
2017-12-26
Nature’s ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF–SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF–SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm-type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature’s rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature’s biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity. PMID:29229819
iCOSSY: An Online Tool for Context-Specific Subnetwork Discovery from Gene Expression Data
Saha, Ashis; Jeon, Minji; Tan, Aik Choon; Kang, Jaewoo
2015-01-01
Pathway analyses help reveal underlying molecular mechanisms of complex biological phenotypes. Biologists tend to perform multiple pathway analyses on the same dataset, as there is no single answer. It is often inefficient for them to implement and/or install all the algorithms by themselves. Online tools can help the community in this regard. Here we present an online gene expression analytical tool called iCOSSY which implements a novel pathway-based COntext-specific Subnetwork discoverY (COSSY) algorithm. iCOSSY also includes a few modifications of COSSY to increase its reliability and interpretability. Users can upload their gene expression datasets, and discover important subnetworks of closely interacting molecules to differentiate between two phenotypes (context). They can also interactively visualize the resulting subnetworks. iCOSSY is a web server that finds subnetworks that are differentially expressed in two phenotypes. Users can visualize the subnetworks to understand the biology of the difference. PMID:26147457
Customizing microarrays for neuroscience drug discovery.
Girgenti, Matthew J; Newton, Samuel S
2007-08-01
Microarray-based gene profiling has become the centerpiece of gene expression studies in the biological sciences. The ability to now interrogate the entire genome using a single chip demonstrates the progress in technology and instrumentation that has been made over the last two decades. Although this unbiased approach provides researchers with an immense quantity of data, obtaining meaningful insight is not possible without intensive data analysis and processing. Custom developed arrays have emerged as a viable and attractive alternative that can take advantage of this robust technology and tailor it to suit the needs and requirements of individual investigations. The ability to simplify data analysis, reduce noise and carefully optimize experimental conditions makes it a suitable tool that can be effectively utilized in neuroscience drug discovery efforts. Furthermore, incorporating recent advancements in fine focusing gene profiling to include specific cellular phenotypes can help resolve the complex cellular heterogeneity of the brain. This review surveys the use of microarray technology in neuroscience paying special attention to customized arrays and their potential in drug discovery. Novel applications of microarrays and ancillary techniques, such as laser microdissection, FAC sorting and RNA amplification, have also been discussed. The notion that a hypothesis-driven approach can be integrated into drug development programs is highlighted.
Iversen, Patrick L.; Warren, Travis K.; Wells, Jay B.; Garza, Nicole L.; Mourich, Dan V.; Welch, Lisa S.; Panchal, Rekha G.; Bavari, Sina
2012-11-06
There are no currently approved treatments for filovirus infections. In this study we report the discovery process which led to the development of antisense Phosphorodiamidate Morpholino Oligomers (PMOs) AVI-6002 (composed of AVI-7357 and AVI-7539) and AVI-6003 (composed of AVI-7287 and AVI-7288) targeting Ebola virus and Marburg virus respectively. The discovery process involved identification of optimal transcript binding sites for PMO-based RNA therapeutics, followed by screening for effective viral gene targets in mouse and guinea pig models utilizing adapted viral isolates. An evolution of chemical modifications was tested, beginning with simple Phosphorodiamidate Morpholino Oligomers (PMO), transitioning to cell penetrating peptide conjugated PMOs (PPMO), and ending with PMOplus containing a limited number of positively charged linkages in the PMO structure. The initial lead compounds were combinations of two agents targeting separate genes. In the final analysis, a single agent for treatment of each virus was selected, AVI-7537 targeting the VP24 gene of Ebola virus and AVI-7288 targeting NP of Marburg virus; these are now progressing into late stage clinical development as the optimal therapeutic candidates. PMID:23202506
Wu, Changsheng; Du, Chao; Gubbens, Jacob; Choi, Young Hae; van Wezel, Gilles P
2015-10-23
Actinomycetes are a major source of antimicrobials, anticancer compounds, and other medically important products, and their genomes harbor extensive biosynthetic potential. Major challenges in the screening of these microorganisms are to activate the expression of cryptic biosynthetic gene clusters and the development of technologies for efficient dereplication of known molecules. Here we report the identification of a previously unidentified isatin-type antibiotic produced by Streptomyces sp. MBT28, following a strategy based on NMR-based metabolomics combined with the introduction of streptomycin resistance in the producer strain. NMR-guided isolation by tracking the target proton signal resulted in the characterization of 7-prenylisatin (1) with antimicrobial activity against Bacillus subtilis. The metabolite-guided genome mining of Streptomyces sp. MBT28 combined with proteomics identified a gene cluster with an indole prenyltransferase that catalyzes the conversion of tryptophan into 7-prenylisatin. This study underlines the applicability of NMR-based metabolomics in facilitating the discovery of novel antibiotics.
New strategies in drug discovery.
Ohlstein, Eliot H; Johnson, Anthony G; Elliott, John D; Romanic, Anne M
2006-01-01
Gene identification followed by determination of the expression of genes in a given disease and understanding of the function of the gene products is central to the drug discovery process. The ability to associate a specific gene with a disease can be attributed primarily to the extraordinary progress that has been made in the areas of gene sequencing and information technologies. Selection and validation of novel molecular targets have become of great importance in light of the abundance of new potential therapeutic drug targets that have emerged from human gene sequencing. In response to this revolution within the pharmaceutical industry, the development of high-throughput methods in both biology and chemistry has been necessitated. Further, the successful translation of basic scientific discoveries into clinical experimental medicine and novel therapeutics is an increasing challenge. As such, a new paradigm for drug discovery has emerged. This process involves the integration of clinical, genetic, genomic, and molecular phenotype data partnered with cheminformatics. Central to this process, the data generated are managed, collated, and interpreted with the use of informatics. This review addresses the use of new technologies that have arisen to deal with this new paradigm.
Butler, Merlin G; Rafi, Syed K; McGuire, Austen; Manzardo, Ann M
2016-01-01
To provide an update of currently recognized clinically relevant candidate and known genes for human reproduction and related infertility plotted on high resolution chromosome ideograms (850 band level) and represented alphabetically in tabular form. Descriptive authoritative computer-based website and peer-reviewed medical literature searches used pertinent keywords representing human reproduction and related infertility along with genetics and gene mutations. A master list of genes associated with human reproduction and related infertility was generated with a visual representation of gene locations on high resolution chromosome ideograms. GeneAnalytics pathway analysis was carried out on the resulting list of genes to assess underlying genetic architecture for infertility. Advances in genetic technology have led to the discovery of genes responsible for reproduction and related infertility. Genes identified (N=371) in our search primarily impact ovarian steroidogenesis through sex hormone biology, germ cell production, genito-urinary or gonadal development and function, and related peptide production, receptors and regulatory factors. The location of gene symbols plotted on high resolution chromosome ideograms forms a conceptualized image of the distribution of human reproduction genes. The updated master list can be used to promote better awareness of genetics of reproduction and related infertility and advance discoveries on genetic causes and disease mechanisms. Copyright © 2015 Elsevier B.V. All rights reserved.
Genetic and epigenetic control of gene expression by CRISPR–Cas systems
Lo, Albert; Qi, Lei
2017-01-01
The discovery and adaptation of bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated (Cas) systems has revolutionized the way researchers edit genomes. Engineering of catalytically inactivated Cas variants (nuclease-deficient or nuclease-deactivated [dCas]) combined with transcriptional repressors, activators, or epigenetic modifiers enables sequence-specific regulation of gene expression and chromatin state. These CRISPR–Cas-based technologies have contributed to the rapid development of disease models and functional genomics screening approaches, which can facilitate genetic target identification and drug discovery. In this short review, we will cover recent advances of CRISPR–dCas9 systems and their use for transcriptional repression and activation, epigenome editing, and engineered synthetic circuits for complex control of the mammalian genome. PMID:28649363
Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming
2015-01-01
In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
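The random walk with restart (RWR) step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy network, the gene names, and the restart probability of 0.3 are assumptions chosen for illustration, and a real analysis would run on the integrated PPI and co-regulation networks.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adj:   dict mapping node -> dict of neighbor -> edge weight (symmetric).
    seeds: set of known disease genes (hypothetical names in the example).
    Assumes every node has at least one edge; isolated nodes would leak mass.
    """
    nodes = sorted(adj)
    # Restart vector: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {u: sum(adj[u].values()) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue
            for v, w in adj[u].items():
                # Move from u to v with probability proportional to edge weight.
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network G1 - G2 - G3 - G4 with G1 as the only seed gene.
toy = {
    "G1": {"G2": 1.0},
    "G2": {"G1": 1.0, "G3": 1.0},
    "G3": {"G2": 1.0, "G4": 1.0},
    "G4": {"G3": 1.0},
}
scores = random_walk_with_restart(toy, {"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Candidate genes are then ranked by their steady-state score; non-seed genes closer to the seeds in the network tend to rank higher.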
Vojinovic, Dina; Brison, Nathalie; Ahmad, Shahzad; Noens, Ilse; Pappa, Irene; Karssen, Lennart C; Tiemeier, Henning; van Duijn, Cornelia M; Peeters, Hilde; Amin, Najaf
2017-08-01
Autism spectrum disorder (ASD) is a highly heritable neurodevelopmental disorder with a complex genetic architecture. To identify genetic variants underlying ASD, we performed single-variant and gene-based genome-wide association studies using a dense genotyping array containing over 2.3 million single-nucleotide variants in a discovery sample of 160 families with at least one child affected with non-syndromic ASD using a binary (ASD yes/no) phenotype and a quantitative autistic trait. Replication of the top findings was performed in Psychiatric Genomics Consortium and Erasmus Rucphen Family (ERF) cohort study. Significant association of quantitative autistic trait was observed with the TTC25 gene at 17q21.2 (effect size=10.2, P-value=3.4 × 10^-7) in the gene-based analysis. The gene also showed nominally significant association in the cohort-based ERF study (effect=1.75, P-value=0.05). Meta-analysis of discovery and replication improved the association signal (meta-analysis P-value=1.5 × 10^-8). No genome-wide significant signal was observed in the single-variant analysis of either the binary ASD phenotype or the quantitative autistic trait. Our study has identified a novel gene TTC25 to be associated with quantitative autistic trait in patients with ASD. The replication of association in a cohort-based study and the effect estimate suggest that variants in TTC25 may also be relevant for broader ASD phenotype in the general population. TTC25 is overexpressed in frontal cortex and testis and is known to be involved in cilium movement and thus an interesting candidate gene for autistic trait.
Discovery of cancer common and specific driver gene sets
2017-01-01
Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295
Using the TIGR gene index databases for biological discovery.
Lee, Yuandan; Quackenbush, John
2003-11-01
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
MIPHENO: Data normalization for high throughput metabolic analysis.
High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries from gene function to drug candidate identification. These large-scale experiments are typically carried out over the course...
Discovery of Cationic Polymers for Non-viral Gene Delivery using Combinatorial Approaches
Barua, Sutapa; Ramos, James; Potta, Thrimoorthy; Taylor, David; Huang, Huang-Chiao; Montanez, Gabriela; Rege, Kaushal
2015-01-01
Gene therapy is an attractive treatment option for diseases of genetic origin, including several cancers and cardiovascular diseases. While viruses are effective vectors for delivering exogenous genes to cells, concerns related to insertional mutagenesis, immunogenicity, lack of tropism, decay and high production costs necessitate the discovery of non-viral methods. Significant efforts have been focused on cationic polymers as non-viral alternatives for gene delivery. Recent studies have employed combinatorial syntheses and parallel screening methods for enhancing the efficacy of gene delivery, biocompatibility of the delivery vehicle, and overcoming cellular level barriers as they relate to polymer-mediated transgene uptake, transport, transcription, and expression. This review summarizes and discusses recent advances in combinatorial syntheses and parallel screening of cationic polymer libraries for the discovery of efficient and safe gene delivery systems. PMID:21843141
The developmental transcriptome of Drosophila melanogaster
DOE Office of Scientific and Technical Information (OSTI.GOV)
University of Connecticut; Graveley, Brenton R.; Brooks, Angela N.
Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development. Drosophila melanogaster is an important non-mammalian model system that has had a critical role in basic biological discoveries, such as identifying chromosomes as the carriers of genetic information and uncovering the role of genes in development. Because it shares a substantial genic content with humans, Drosophila is increasingly used as a translational model for human development, homeostasis and disease. High-quality maps are needed for all functional genomic elements. Previous studies demonstrated that a rich collection of genes is deployed during the life cycle of the fly. Although expression profiling using microarrays has revealed the expression of ~13,000 annotated genes, it is difficult to map splice junctions and individual base modifications generated by RNA editing using such approaches. Single-base resolution is essential to define precisely the elements that comprise the Drosophila transcriptome.
Estimates of the number of transcript isoforms are less accurate than estimates of the number of genes. Whereas ~20% of Drosophila genes are annotated as encoding alternatively spliced pre-mRNAs, splice-junction microarray experiments indicate that this number is at least 40% (ref. 7). Determining the diversity of mRNAs generated by alternative promoters, alternative splicing and RNA editing will substantially increase the inferred protein repertoire. Non-coding RNA genes (ncRNAs) including short interfering RNAs (siRNAs) and microRNAs (miRNAs) (reviewed in ref. 10), and longer ncRNAs such as bxd (ref. 11) and rox (ref. 12), have important roles in gene regulation, whereas others such as small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs) are important components of macromolecular machines such as the ribosome and spliceosome. The transcription and processing of these ncRNAs must also be fully documented and mapped. As part of the modENCODE project to annotate the functional elements of the D. melanogaster and Caenorhabditis elegans genomes, we used RNA-Seq and tiling microarrays to sample the Drosophila transcriptome at unprecedented depth throughout development from early embryo to ageing male and female adults. We report on a high-resolution view of the discovery, structure and dynamic expression of the D. melanogaster transcriptome.
Cancer gene discovery: exploiting insertional mutagenesis
Ranzani, Marco; Annunziato, Stefano; Adams, David J.; Montini, Eugenio
2013-01-01
Insertional mutagenesis has been utilized as a functional forward genetics screen for the identification of novel genes involved in the pathogenesis of human cancers. Different insertional mutagens have been successfully used to reveal new cancer genes. For example, retroviruses (RVs) are integrating viruses with the capacity to induce the deregulation of genes in the neighborhood of the insertion site. RVs have been employed for more than 30 years to identify cancer genes in the hematopoietic system and mammary gland. Similarly, another tool that has revolutionized cancer gene discovery is the cut-and-paste transposons. These DNA elements have been engineered to contain strong promoters and stop cassettes that may function to perturb gene expression upon integration proximal to genes. In addition, complex mouse models characterized by tissue-restricted activity of transposons have been developed to identify oncogenes and tumor suppressor genes that control the development of a wide range of solid tumor types, extending beyond those tissues accessible using RV-based approaches. Most recently, lentiviral vectors (LVs) have appeared on the scene for use in cancer gene screens. LVs are replication defective integrating vectors that have the advantage of being able to infect non-dividing cells, in a wide range of cell types and tissues. In this review, we describe the various insertional mutagens focusing on their advantages/limitations and we discuss the new and promising tools that will improve the insertional mutagenesis screens of the future. PMID:23928056
The third annual BRDS on research and development of nucleic acid-based nanomedicines
Chaudhary, Amit Kumar
2017-01-01
The completion of the Human Genome Project, the decrease in sequencing cost, and the correlation of genome sequencing data with specific diseases have led to an exponential rise in nucleic acid-based therapeutic approaches. In the third annual Biopharmaceutical Research and Development Symposium (BRDS), held at the Center for Drug Discovery and Lozier Center for Pharmacy Sciences and Education at the University of Nebraska Medical Center (UNMC), we highlighted the remarkable features of nucleic acid-based nanomedicines, their significance, NIH funding opportunities on nanomedicines and gene therapy research, challenges and opportunities in the clinical translation of nucleic acids into therapeutics, and the role of intellectual property (IP) in drug discovery and development. PMID:27848223
Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D
2013-01-01
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, its fold change criteria are problematic and can critically alter the conclusion of a study as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also insensitive to the fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed.
The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
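The simulation comparison described above relies on four summary quantities: sensitivity, specificity, total rejections, and false discovery rate. The sketch below shows how they could be computed for a single simulated data set. The function name and the toy vectors are assumptions for illustration and are not taken from the paper. Note that one replicate yields a false discovery proportion; the false discovery rate is its average over many replicates.

```python
def rejection_metrics(truth, rejected):
    """Summarize one simulated multiple-testing run.

    truth[i]    is True if gene i is truly differentially expressed.
    rejected[i] is True if the method rejected the null for gene i.
    """
    pairs = list(zip(truth, rejected))
    tp = sum(1 for t, r in pairs if t and r)          # true positives
    fp = sum(1 for t, r in pairs if not t and r)      # false positives
    fn = sum(1 for t, r in pairs if t and not r)      # false negatives
    tn = sum(1 for t, r in pairs if not t and not r)  # true negatives
    total = tp + fp
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "total_rejections": total,
        # False discovery proportion; defined as 0 when nothing is rejected.
        "fdp": fp / total if total else 0.0,
    }


# Toy replicate: 3 truly changed genes out of 8, 3 rejections in total.
truth = [True, True, True, False, False, False, False, False]
rejected = [True, True, False, True, False, False, False, False]
metrics = rejection_metrics(truth, rejected)
```

Averaging `fdp` across replicates gives the empirical false discovery rate that would be compared between methods.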
Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S.; Sinha, Saurabh
2011-01-01
Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. PMID:21821659
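For orientation, the standard hypergeometric test of gene set enrichment that the abstract above says it extends can be sketched as follows. This is the textbook test only, not the authors' extension that accounts for variability in intergenic lengths. The function name and the toy counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """Upper-tail probability P(X >= overlap) for a hypergeometric X.

    population: total number of genes considered
    annotated:  genes in the population with the expected expression pattern
    sample:     genes neighboring the predicted CRMs
    overlap:    neighboring genes that also show the expected pattern
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 with the pattern, 3 sampled, 2 of them with the pattern.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

A small p-value indicates that the genes near the predicted CRMs carry the expected expression pattern more often than random sampling would explain.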
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.
2011-02-18
Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant links between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.
Biomarker discovery for colon cancer using a 761 gene RT-PCR assay.
Clark-Langone, Kim M; Wu, Jenny Y; Sangli, Chithra; Chen, Angela; Snable, James L; Nguyen, Anhthu; Hackett, James R; Baker, Joffre; Yothers, Greg; Kim, Chungyeul; Cronin, Maureen T
2007-08-15
Reverse transcription PCR (RT-PCR) is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE) clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Oncotype DX assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence (prognosis) and the likelihood of tumor response to standard chemotherapy regimens (prediction). We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. RNA was extracted from formalin-fixed paraffin-embedded (FPE) tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene-specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan reactions for each candidate gene. Hierarchical clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery.
We have developed a high throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of genes analyzed while maintaining the high specificity, sensitivity and reproducibility that are characteristics of RT-PCR. Biomarkers discovered using this approach can be transferred to a clinical reference laboratory setting without having to re-validate the assay on a second technology platform.
Shum, David; Bhinder, Bhavneet; Djaballah, Hakim
2013-01-01
MicroRNAs (miRNAs) are small endogenous and conserved non-coding RNA molecules that regulate gene expression. Although the first miRNA was discovered well over sixteen years ago, little is known about their biogenesis and it is only recently that we have begun to understand their scope and diversity. For this purpose, we performed an RNAi screen aimed at identifying genes involved in their biogenesis pathway with a potential use as biomarkers. Using a previously developed miRNA 21 (miR-21) EGFP-based biosensor cell-based assay monitoring green fluorescence enhancements, we performed an arrayed short hairpin RNA (shRNA) screen against a lentiviral particle-ready TRC1 library covering 16,039 genes in 384-well plate format, interrogating the genome one gene at a time and building a panoramic view of endogenous miRNA activity. Using the BDA method for RNAi data analysis, we nominate 497 gene candidates whose knockdown increased the EGFP fluorescence, yielding an initial hit rate of 3.09%; of these, only 22, with reported validated clones, are deemed high-confidence gene candidates. An unexpected result was that only DROSHA was identified as a hit out of the seven core essential miRNA biogenesis genes, suggesting that intracellular shRNA processing into the correct duplex may be cell dependent, with differential outcomes. Biological classification revealed several major control junctions, among them genes involved in transport and vesicular trafficking. In summary, we report on 22 high-confidence gene candidate regulators of miRNA biogenesis with potential use in drug and biomarker discovery. PMID:23977983
Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.
Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W
1996-01-01
Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227
Phenotypic mutant library: potential for gene discovery
USDA-ARS's Scientific Manuscript database
The rapid development of high-throughput and affordable Next-Generation Sequencing (NGS) techniques has renewed interest in gene discovery using forward genetics. The conventional forward genetic approach starts with isolation of mutants with a phenotype of interest, mapping the mutation within a s...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sumner, Lloyd W.; Lei, Zhentian; Nikolau, Basil J.
Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This study highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and x-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.
San Lucas, F Anthony; Fowler, Jerry; Chang, Kyle; Kopetz, Scott; Vilar, Eduardo; Scheet, Paul
2014-12-01
Large-scale cancer datasets such as The Cancer Genome Atlas (TCGA) allow researchers to profile tumors based on a wide range of clinical and molecular characteristics. Subsequently, TCGA-derived gene expression profiles can be analyzed with the Connectivity Map (CMap) to find candidate drugs to target tumors with specific clinical phenotypes or molecular characteristics. This represents a powerful computational approach for candidate drug identification, but due to the complexity of TCGA and technology differences between CMap and TCGA experiments, such analyses are challenging to conduct and reproduce. We present Cancer in silico Drug Discovery (CiDD; scheet.org/software), a computational drug discovery platform that addresses these challenges. CiDD integrates data from TCGA, CMap, and Cancer Cell Line Encyclopedia (CCLE) to perform computational drug discovery experiments, generating hypotheses for the following three general problems: (i) determining whether specific clinical phenotypes or molecular characteristics are associated with unique gene expression signatures; (ii) finding candidate drugs to repress these expression signatures; and (iii) identifying cell lines that resemble the tumors being studied for subsequent in vitro experiments. The primary input to CiDD is a clinical or molecular characteristic. The output is a biologically annotated list of candidate drugs and a list of cell lines for in vitro experimentation. We applied CiDD to identify candidate drugs to treat colorectal cancers harboring mutations in BRAF. CiDD identified EGFR and proteasome inhibitors, while proposing five cell lines for in vitro testing. CiDD facilitates phenotype-driven, systematic drug discovery based on clinical and molecular data from TCGA. ©2014 American Association for Cancer Research.
Thangapandian, Sundarapandian; John, Shalini; Lee, Yuno; Kim, Songmi; Lee, Keun Woo
2011-01-01
Histone deacetylase 8 (HDAC8) is an enzyme involved in deacetylating the amino groups of terminal lysine residues, thereby repressing the transcription of various genes including tumor suppressor genes. The overexpression of HDAC8 was observed in many cancers and thus inhibition of this enzyme has emerged as an efficient cancer therapeutic strategy. In an effort to facilitate the future discovery of HDAC8 inhibitors, we developed two pharmacophore models containing six and five pharmacophoric features, respectively, using the representative structures from two molecular dynamics (MD) simulations performed in the Gromacs 4.0.5 package. Various analyses of trajectories obtained from MD simulations have displayed the changes upon inhibitor binding. Thus utilization of the dynamically-responded protein structures in pharmacophore development has the added advantage of considering the conformational flexibility of the protein. The MD trajectories were clustered based on the single-linkage method and representative structures were taken to be used in the pharmacophore model development. Active site complementing structure-based pharmacophore models were developed using the Discovery Studio 2.5 program and validated using a dataset of known HDAC8 inhibitors. Virtual screening of a chemical database coupled with a drug-like filter has identified drug-like hit compounds that match the pharmacophore models. Molecular docking of these hits reduced the false positives and identified two potential compounds to be used in future HDAC8 inhibitor design. PMID:22272142
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. 
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.
Discovering novel subsystems using comparative genomics
Ferrer, Luciana; Shearer, Alexander G.; Karp, Peter D.
2011-01-01
Motivation: Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways. Results: We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to a same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find missing genes from a known pathway, find new protein complexes or other kinds of functional groups and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method match a known pathway or protein complex closely, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E.coli K-12 corresponds to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings. Availability: Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html. Contact: lferrer@ai.sri.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21775308
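The score-group-filter sequence described in this abstract can be illustrated with a small sketch. The abstract does not say how scores are computed or how genes are grouped, so everything here is an assumption for illustration: the function name `group_genes`, the `threshold` and `min_size` parameters, and the choice of connected components over thresholded pair scores as the grouping rule. It is not the authors' algorithm.

```python
from collections import defaultdict


def group_genes(pair_scores, threshold, min_size=2):
    """Group genes whose pairwise score meets a threshold (hypothetical rule).

    pair_scores maps (gene_a, gene_b) tuples to a score. Genes linked by any
    chain of pairs scoring >= threshold end up in the same group. Groups with
    fewer than min_size members are dropped (the "filter" step).
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for (gene_a, gene_b), score in pair_scores.items():
        root_a, root_b = find(gene_a), find(gene_b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)

    # Keep only groups large enough to be of interest, in a stable order.
    return sorted(sorted(members) for members in groups.values()
                  if len(members) >= min_size)


# Toy scores, invented for the example.
scores = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("c", "d"): 0.1,
    ("d", "e"): 0.95,
    ("f", "a"): 0.2,
}
candidate_groups = group_genes(scores, threshold=0.5)
```

A stricter `min_size` or `threshold` plays the role of the high-confidence filter mentioned in the abstract; the real method may use a different grouping and filtering scheme.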
Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan
2017-03-15
The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation. Copyright © 2017 Elsevier Inc. All rights reserved.
Fang, J; Cai, C; Wang, Q; Lin, P; Zhao, Z; Cheng, F
2017-03-01
Massive cancer genomics data have facilitated the rapid revolution of a novel oncology drug discovery paradigm through targeting clinically relevant driver genes or mutations for the development of precision oncology. Natural products with polypharmacological profiles have been demonstrated as promising agents for the development of novel cancer therapies. In this study, we developed an integrated systems pharmacology framework that facilitated identifying potential natural products that target mutated genes across 15 cancer types or subtypes in the realm of precision medicine. High performance was achieved for our systems pharmacology framework. In case studies, we computationally identified novel anticancer indications for several US Food and Drug Administration-approved or clinically investigational natural products (e.g., resveratrol, quercetin, genistein, and fisetin) through targeting significantly mutated genes in multiple cancer types. In summary, this study provides a powerful tool for the development of molecularly targeted cancer therapies through targeting the clinically actionable alterations by exploiting the systems pharmacology of natural products. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent
2009-01-01
Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. 
Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes to use for the inference of causal networks. The tool is freely available and does not require user registration. PMID:19828039
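The core computation described here, pairwise Pearson correlation of expression profiles followed by a query centered on one gene, can be sketched in a few lines. This is a minimal illustration and not STARNET 2 code: the function names `pearson` and `correlated_neighbors`, the `min_r` cutoff, and the toy profiles are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / (sd_x * sd_y)


def correlated_neighbors(profiles, gene, min_r=0.8):
    """Return (gene, r) pairs correlated with `gene` at r >= min_r,
    strongest first. `profiles` maps gene name to a list of expression
    values measured across the same experiments."""
    target = profiles[gene]
    hits = [(other, pearson(target, values))
            for other, values in profiles.items() if other != gene]
    return sorted(((g, r) for g, r in hits if r >= min_r),
                  key=lambda pair: -pair[1])


# Toy expression profiles, invented for the example.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # falls as g1 rises
    "g4": [1.0, 1.0, 1.0, 1.0],   # flat
}
neighbors = correlated_neighbors(profiles, "g1")
```

In a tool of the kind described, the coefficients would be precomputed and stored rather than recalculated per query; the sketch computes them on demand only to stay short.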
2011-01-01
Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. 
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
Dual transcriptional-translational cascade permits cellular level tuneable expression control
Morra, Rosa; Shankar, Jayendra; Robinson, Christopher J.; Halliwell, Samantha; Butler, Lisa; Upton, Mathew; Hay, Sam; Micklefield, Jason; Dixon, Neil
2016-01-01
The ability to induce gene expression in a small molecule dependent manner has led to many applications in target discovery, functional elucidation and bio-production. To date these applications have relied on a limited set of protein-based control mechanisms operating at the level of transcription initiation. The discovery, design and reengineering of riboswitches offer an alternative means by which to control gene expression. Here we report the development and characterization of a novel tunable recombinant expression system, termed RiboTite, which operates at both the transcriptional and translational level. Using standard inducible promoters and orthogonal riboswitches, a multi-layered modular genetic control circuit was developed to control the expression of both bacteriophage T7 RNA polymerase and recombinant gene(s) of interest. The system was benchmarked against a number of commonly used E. coli expression systems, and shows tight basal control, precise analogue tunability of gene expression at the cellular level, dose-dependent regulation of protein production rates over extended growth periods and enhanced cell viability. This novel system expands the number of E. coli expression systems for use in recombinant protein production and represents a major performance enhancement over and above the most widely used expression systems. PMID:26405200
Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S
2017-04-01
Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. The objective was to identify transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2 years were estimated for each participant's residential address using spatiotemporal interpolation in combination with a dispersion model. Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m3 in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects.
High-Density Real-Time PCR-Based in Vivo Toxicogenomic Screen to Predict Organ-Specific Toxicity
Fabian, Gabriella; Farago, Nora; Feher, Liliana Z.; Nagy, Lajos I.; Kulin, Sandor; Kitajka, Klara; Bito, Tamas; Tubak, Vilmos; Katona, Robert L.; Tiszlavicz, Laszlo; Puskas, Laszlo G.
2011-01-01
Toxicogenomics, based on the temporal effects of drugs on gene expression, is able to predict toxic effects earlier than traditional technologies by analyzing changes in genomic biomarkers that could precede subsequent protein translation and initiation of histological organ damage. In the present study our objective was to extend in vivo toxicogenomic screening from analyzing one or a few tissues to multiple organs, including heart, kidney, brain, liver and spleen. Nanocapillary quantitative real-time PCR (QRT-PCR) was used in the study, due to its higher throughput, sensitivity and reproducibility, and larger dynamic range compared to DNA microarray technologies. Based on previous data, 56 gene markers were selected coding for proteins with different functions, such as proteins for acute phase response, inflammation, oxidative stress, metabolic processes, heat-shock response, cell cycle/apoptosis regulation and enzymes which are involved in detoxification. Some of the marker genes are specific to certain organs, and some of them are general indicators of toxicity in multiple organs. Utility of the nanocapillary QRT-PCR platform was demonstrated by screening different references, as well as discovery of drug-like compounds for their gene expression profiles in different organs of treated mice in an acute experiment. For each compound, 896 QRT-PCR were done: four organs were used from each of the treated four animals to monitor the relative expression of 56 genes. Based on expression data of the discovery gene set of toxicology biomarkers the cardio- and nephrotoxicity of doxorubicin and sulfasalazin, the hepato- and nephrotoxicity of rotenone, dihydrocoumarin and aniline, and the liver toxicity of 2,4-diaminotoluene could be confirmed. 
The acute heart and kidney toxicity of the active metabolite SN-38 from its less toxic prodrug, irinotecan could be differentiated, and two novel gene markers for hormone replacement therapy were identified, namely fabp4 and pparg, which were down-regulated by estradiol treatment. PMID:22016648
Molecular genetics of early-onset Alzheimer's disease revisited.
Cacace, Rita; Sleegers, Kristel; Van Broeckhoven, Christine
2016-06-01
Since the discovery of the Alzheimer's disease (AD) genes APP, PSEN1, and PSEN2 in families with autosomal dominant early-onset AD (EOAD), gene discovery in familial EOAD has come more or less to a standstill. Only 5% of EOAD patients carry a pathogenic mutation in one of the AD genes or an apolipoprotein E (APOE) risk allele ε4; most EOAD patients remain unexplained. Here, we aimed at summarizing the current knowledge of EOAD genetics and its role in ongoing approaches to understand the biology of AD and disease symptomatology as well as developing new therapeutics. Next, we explored the possible molecular mechanisms that might underlie the missing genetic etiology of EOAD and discussed how the use of massive parallel sequencing technologies triggered novel gene discoveries. To conclude, we commented on the relevance of reinvestigating EOAD patients as a means to explore potential new avenues for translational research and therapeutic discoveries. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Agarwal, Gaurav; Clevenger, Josh; Pandey, Manish K; Wang, Hui; Shasidhar, Yaduru; Chu, Ye; Fountain, Jake C; Choudhary, Divya; Culbreath, Albert K; Liu, Xin; Huang, Guodong; Wang, Xingjun; Deshmukh, Rupesh; Holbrook, C Corley; Bertioli, David J; Ozias-Akins, Peggy; Jackson, Scott A; Varshney, Rajeev K; Guo, Baozhu
2018-04-10
Whole-genome resequencing (WGRS) of mapping populations has facilitated development of high-density genetic maps essential for fine mapping and candidate gene discovery for traits of interest in crop species. Leaf spots, including early leaf spot (ELS) and late leaf spot (LLS), and Tomato spotted wilt virus (TSWV) are devastating diseases in peanut causing significant yield loss. We generated WGRS data on a recombinant inbred line population, developed a SNP-based high-density genetic map, and conducted fine mapping, candidate gene discovery and marker validation for ELS, LLS and TSWV. The first sequence-based high-density map was constructed with 8869 SNPs assigned to 20 linkage groups, representing 20 chromosomes, for the 'T' population (Tifrunner × GT-C20) with a map length of 3120 cM and an average distance of 1.45 cM. The quantitative trait locus (QTL) analysis using high-density genetic map and multiple season phenotyping data identified 35 main-effect QTLs with phenotypic variation explained (PVE) from 6.32% to 47.63%. Among major-effect QTLs mapped, there were two QTLs for ELS on B05 with 47.42% PVE and B03 with 47.38% PVE, two QTLs for LLS on A05 with 47.63% and B03 with 34.03% PVE and one QTL for TSWV on B09 with 40.71% PVE. The epistasis and environment interaction analyses identified significant environmental effects on these traits. The identified QTL regions had disease resistance genes including R-genes and transcription factors. KASP markers were developed for major QTLs and validated in the population and are ready for further deployment in genomics-assisted breeding in peanut. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Cross-organism learning method to discover new gene functionalities.
Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro
2016-04-01
Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). 
Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Chen, Ruibing; Li, Qing; Tan, Hexin; Chen, Junfeng; Xiao, Ying; Ma, Ruifang; Gao, Shouhong; Zerbe, Philipp; Chen, Wansheng; Zhang, Lei
2015-01-01
Root and leaf tissues of Isatis indigotica show notable anti-viral efficacy, and are widely used as “Banlangen” and “Daqingye” in traditional Chinese medicine. The plants' pharmacological activity is attributed to phenylpropanoids, especially a group of lignan metabolites. However, the biosynthesis of lignans in I. indigotica remains opaque. This study describes the discovery and analysis of biosynthetic genes and AP2/ERF-type transcription factors involved in lignan biosynthesis in I. indigotica. MeJA treatment revealed differential expression of three genes involved in phenylpropanoid backbone biosynthesis (IiPAL, IiC4H, Ii4CL), five genes involved in lignan biosynthesis (IiCAD, IiC3H, IiCCR, IiDIR, and IiPLR), and 112 putative AP2/ERF transcription factors. In addition, four intermediates of lariciresinol biosynthesis were found to be induced. Based on these results, a canonical correlation analysis using Pearson's correlation coefficient was performed to construct gene-to-metabolite networks and identify putative key genes and rate-limiting reactions in lignan biosynthesis. Over-expression of IiC3H, identified as a key pathway gene, was used for metabolic engineering of I. indigotica hairy roots, and resulted in an increase in lariciresinol production. These findings illustrate the utility of canonical correlation analysis for the discovery and metabolic engineering of key metabolic genes in plants. PMID:26579184
Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M.; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S.
2016-01-01
Background: Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. Objectives: Identification of transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Methods: Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2-years were estimated for each participant’s residential address using spatiotemporal interpolation in combination with a dispersion model. Results: Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m3 in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Conclusions: Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. 
Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects. Citation: Vrijens K, Winckelmans E, Tsamou M, Baeyens W, De Boever P, Jennen D, de Kok TM, Den Hond E, Lefebvre W, Plusquin M, Reynders H, Schoeters G, Van Larebeke N, Vanpoucke C, Kleinjans J, Nawrot TS. 2017. Sex-specific associations between particulate matter exposure and gene expression in independent discovery and validation cohorts of middle-aged men and women. Environ Health Perspect 125:660–669; http://dx.doi.org/10.1289/EHP370 PMID:27740511
High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.
Zhou, Yuexin; Zhu, Shiyou; Cai, Changzu; Yuan, Pengfei; Li, Chunmei; Huang, Yanyi; Wei, Wensheng
2014-05-22
Targeted genome editing technologies are powerful tools for studying biology and disease, and have a broad range of research applications. In contrast to the rapid development of toolkits to manipulate individual genes, large-scale screening methods based on the complete loss of gene expression are only now beginning to be developed. Here we report the development of a focused CRISPR/Cas-based (clustered regularly interspaced short palindromic repeats/CRISPR-associated) lentiviral library in human cells and a method of gene identification based on functional screening and high-throughput sequencing analysis. Using knockout library screens, we successfully identified the host genes essential for the intoxication of cells by anthrax and diphtheria toxins, which were confirmed by functional validation. The broad application of this powerful genetic screening strategy will not only facilitate the rapid identification of genes important for bacterial toxicity but will also enable the discovery of genes that participate in other biological processes.
iPcc: a novel feature extraction method for accurate disease class discovery and prediction
Ren, Xianwen; Wang, Yong; Zhang, Xiang-Sun; Jin, Qi
2013-01-01
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles. PMID:23761440
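One plausible reading of "iterative employment of Pearson's correlation coefficient" is sketched below: each sample is re-described by its correlations with every sample, and the step is repeated. The abstract gives no iteration count, stopping rule, or normalization, so the function `correlation_features`, its `iterations` parameter, and the toy data are assumptions for illustration and should not be taken as the published iPcc procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # Undefined for a constant vector; treated as 0 in this sketch.
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_features(samples, iterations=2):
    """Map samples into a sample-by-sample correlation space.

    samples is a list of expression vectors, one per sample. On each pass,
    every sample is replaced by the vector of its Pearson correlations with
    all samples, so the output is always an n-by-n matrix.
    """
    features = [list(row) for row in samples]
    for _ in range(iterations):
        features = [[pearson(a, b) for b in features] for a in features]
    return features


# Toy data, invented for the example: samples 0 and 1 share one noisy
# pattern, samples 2 and 3 share the opposite one.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.2, 2.9, 4.1, 5.2],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.2, 0.8],
]
features = correlation_features(samples, iterations=3)
```

On data like this, repeated passes push within-group entries toward 1 and between-group entries toward -1, which is the kind of sharpening of latent patterns the abstract describes. A clustering or classification method would then be run on `features` instead of the raw profiles.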
Edmondson, Rasheena; Broglie, Jessica Jenkins; Adcock, Audrey F.
2014-01-01
Three-dimensional (3D) cell culture systems have gained increasing interest in drug discovery and tissue engineering due to their evident advantages in providing more physiologically relevant information and more predictive data for in vivo tests. In this review, we discuss the characteristics of 3D cell culture systems in comparison to the two-dimensional (2D) monolayer culture, focusing on cell growth conditions, cell proliferation, population, and gene and protein expression profiles. The innovations and development in 3D culture systems for drug discovery over the past 5 years are also reviewed in the article, emphasizing the cellular response to different classes of anticancer drugs, focusing particularly on similarities and differences between 3D and 2D models across the field. The progression and advancement in the application of 3D cell cultures in cell-based biosensors is another focal point of this review. PMID:24831787
Materials Informatics: The Materials "Gene" and Big Data
NASA Astrophysics Data System (ADS)
Rajan, Krishna
2015-07-01
Materials informatics provides the foundations for a new paradigm of materials discovery. It shifts our emphasis from one of solely searching among large volumes of data that may be generated by experiment or computation to one of targeted materials discovery via high-throughput identification of the key factors (i.e., “genes”) and via showing how these factors can be quantitatively integrated by statistical learning methods into design rules (i.e., “gene sequencing”) governing targeted materials functionality. However, a critical challenge in discovering these materials genes is the difficulty in unraveling the complexity of the data associated with numerous factors including noise, uncertainty, and the complex diversity of data that one needs to consider (i.e., Big Data). In this article, we explore one aspect of materials informatics, namely how one can efficiently explore for new knowledge in regimes of structure-property space, especially when no reasonable selection pathways based on theory or clear trends in observations exist among an almost infinite set of possibilities.
Lee, Nam-Kyung; Bidlingmaier, Scott; Su, Yang; Liu, Bin
2018-01-01
Monoclonal antibodies and antibody-derived therapeutics have emerged as a rapidly growing class of biological drugs for the treatment of cancer, autoimmunity, infection, and neurological diseases. To support the development of human antibodies, various display techniques based on antibody gene repertoires have been constructed over the last two decades. In particular, scFv-antibody phage display has been extensively utilized to select lead antibodies against a variety of target antigens. To construct an scFv phage display that enables efficient antibody discovery and optimization, it is desirable to develop a system that allows modular assembly of highly diverse variable heavy chain and light chain (Vκ and Vλ) repertoires. Here, we describe modular construction of large non-immune human antibody phage-display libraries built on variable gene cassettes from heavy chain and light chain repertoires (Vκ and Vλ light chains can be made into independent cassettes). We describe the utility of such libraries in antibody discovery and optimization through chain shuffling.
Hassane, Duane C.; Guzman, Monica L.; Corbett, Cheryl; Li, Xiaojie; Abboud, Ramzi; Young, Fay; Liesveld, Jane L.; Carroll, Martin
2008-01-01
Increasing evidence indicates that malignant stem cells are important for the pathogenesis of acute myelogenous leukemia (AML) and represent a reservoir of cells that drive the development of AML and relapse. Therefore, new treatment regimens are necessary to prevent relapse and improve therapeutic outcomes. Previous studies have shown that the sesquiterpene lactone, parthenolide (PTL), ablates bulk, progenitor, and stem AML cells while causing no appreciable toxicity to normal hematopoietic cells. Thus, PTL must evoke cellular responses capable of mediating AML selective cell death. Given recent advances in chemical genomics such as gene expression-based high-throughput screening (GE-HTS) and the Connectivity Map, we hypothesized that the gene expression signature resulting from treatment of primary AML with PTL could be used to search for similar signatures in publicly available gene expression profiles deposited into the Gene Expression Omnibus (GEO). We therefore devised a broad in silico screen of the GEO database using the PTL gene expression signature as a template and discovered 2 new agents, celastrol and 4-hydroxy-2-nonenal, that effectively eradicate AML at the bulk, progenitor, and stem cell level. These findings suggest the use of multicenter collections of high-throughput data to facilitate discovery of leukemia drugs and drug targets. PMID:18305216
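Using an expression signature as a template to screen public profiles can be illustrated with a toy scoring function. The sketch below is not the statistic used by GE-HTS, the Connectivity Map, or the study itself; it assumes each profile is a hypothetical mapping from gene name to log fold change and simply rewards profiles in which the template's up-regulated genes are high and its down-regulated genes are low.

```python
def signature_score(profile, up_genes, down_genes):
    """Mean change of template 'up' genes minus mean change of 'down' genes.

    profile maps gene name -> log fold change (hypothetical input format).
    Genes missing from the profile are ignored.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        return 0.0  # not enough overlap to score this profile
    return sum(up) / len(up) - sum(down) / len(down)

def rank_profiles(profiles, up_genes, down_genes):
    """Return (name, score) pairs, most template-like profile first."""
    scored = [(name, signature_score(p, up_genes, down_genes))
              for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Profiles at the top of the ranking would be candidates for follow-up; a real screen would also need normalization across platforms and a significance estimate, which this sketch omits.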
Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases
Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand
2010-01-01
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778
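The ABC principle described above can be sketched in a few lines. This is a minimal illustration, not the CoPub Discovery implementation: the input format (a list of co-occurring term pairs), the function name `abc_candidates`, and ranking by the number of shared B-intermediates are all assumptions made for the example.

```python
from collections import defaultdict

def abc_candidates(cooccurrence_pairs, a_term):
    """Find C terms connected to a_term only through shared B intermediates.

    cooccurrence_pairs is an iterable of (term, term) tuples, each meaning
    the two terms co-occur in the literature (hypothetical input format).
    """
    neighbors = defaultdict(set)
    for x, y in cooccurrence_pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)

    direct = neighbors[a_term]          # the B terms
    candidates = defaultdict(set)       # C term -> set of linking B terms
    for b in direct:
        for c in neighbors[b]:
            # keep C only if it has no direct relationship with A
            if c != a_term and c not in direct:
                candidates[c].add(b)

    # more shared intermediates first, ties broken alphabetically
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

A production tool would weight each A-B and B-C link by a co-occurrence statistic instead of counting intermediates, which is why the ranking here should be read as a placeholder.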
Liu, Ling; Huang, Jin-Sha; Han, Chao; Zhang, Guo-Xin; Xu, Xiao-Yun; Shen, Yan; Li, Jie; Jiang, Hai-Yang; Lin, Zhi-Cheng; Xiong, Nian; Wang, Tao
2016-12-01
Huntington's disease (HD) is an incurable neurodegenerative disorder that is characterized by motor dysfunction, cognitive impairment, and behavioral abnormalities. It is an autosomal dominant disorder caused by a CAG repeat expansion in the huntingtin gene, resulting in progressive neuronal loss predominantly in the striatum and cortex. Despite the discovery of the causative gene in 1993, the exact mechanisms underlying HD pathogenesis have yet to be elucidated. Treatments that slow or halt the disease process are currently unavailable. Recent advances in induced pluripotent stem cell (iPSC) technologies have transformed our ability to study disease in human neural cells. Here, we first review the progress made in modeling HD in vitro using patient-derived iPSCs, which offer unique insights into molecular mechanisms and provide a novel human cell-based platform for drug discovery. We then highlight the promises and challenges for pluripotent stem cells that might be used as a therapeutic source for cell replacement therapy of the lost neurons in HD brains.
Challenges of the information age: the impact of false discovery on pathway identification.
Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E
2012-11-21
Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five-gene and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening steps between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.
Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N
2013-03-15
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
Hastie, Eric; Samulski, R Jude
2015-05-01
Fifty years after the discovery of adeno-associated virus (AAV) and more than 30 years after the first gene transfer experiment was conducted, dozens of gene therapy clinical trials are in progress, one vector is approved for use in Europe, and breakthroughs in virus modification and disease modeling are paving the way for a revolution in the treatment of rare diseases, cancer, as well as HIV. This review will provide a historical perspective on the progression of AAV for gene therapy from discovery to the clinic, focusing on contributions from the Samulski lab regarding basic science and cloning of AAV, optimized large-scale production of vectors, preclinical large animal studies and safety data, vector modifications for improved efficacy, and successful clinical applications.
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.
2012-01-01
Introduction The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946
USDA-ARS's Scientific Manuscript database
The next generation sequencing (NGS) technologies have opened a wealth of opportunities for plant breeding and genomics research, and changed the paradigms of marker detection, genotyping, and gene discovery. Abundant genomic resources have been generated using a whole genome resequencing (WGR) str...
A brief history of Alzheimer's disease gene discovery.
Tanzi, Rudolph E
2013-01-01
The rich and colorful history of gene discovery in Alzheimer's disease (AD) over the past three decades is as complex and heterogeneous as the disease itself. Twin and family studies indicate that genetic factors are estimated to play a role in at least 80% of AD cases. The inheritance of AD exhibits a dichotomous pattern. On one hand, rare mutations in APP, PSEN1, and PSEN2 are fully penetrant for early-onset (<60 years) familial AD, which represents <5% of AD. On the other hand, common gene polymorphisms, such as the ε4 and ε2 variants of the APOE gene, influence susceptibility for common (>95%) late-onset AD. These four genes account for 30-50% of the inheritability of AD. Genome-wide association studies have recently led to the identification of additional highly confirmed AD candidate genes. Here, I review the past, present, and future of attempts to elucidate the complex and heterogeneous genetic underpinnings of AD along with some of the unique events that made these discoveries possible.
Kao, Chung-Feng; Chen, Hui-Wen; Chen, Hsi-Chung; Yang, Jenn-Hwai; Huang, Ming-Chyi; Chiu, Yi-Hang; Lin, Shih-Ku; Lee, Ya-Chin; Liu, Chih-Min; Chuang, Li-Chung; Chen, Chien-Hsiun; Wu, Jer-Yuarn; Lu, Ru-Band; Kuo, Po-Hsiu
2016-12-01
This study aimed to identify susceptibility loci and enriched pathways for bipolar disorder subtype II. We conducted a genome-wide association scan in discovery samples with 189 bipolar disorder subtype II patients and 1773 controls, and replication samples with 283 bipolar disorder subtype II patients and 500 controls in a Taiwanese Han population using Affymetrix Axiom Genome-Wide CHB1 Array. We performed single-marker and gene-based association analyses, and calculated polygenic risk scores for bipolar disorder subtype II. Pathway enrichment analyses were employed to reveal significant biological pathways. Seven markers were found to be associated with bipolar disorder subtype II in meta-analysis combining both discovery and replication samples (P<5.0×10⁻⁶), including markers in or close to MYO16, HSP90AB3P, noncoding gene LOC100507632, and markers in chromosomes 4 and 10. A novel locus, ETF1, was associated with bipolar disorder subtype II (P<6.0×10⁻³) in gene-based association tests. Results of risk evaluation demonstrated that higher genetic risk scores were able to distinguish bipolar disorder subtype II patients from healthy controls in both discovery (P=3.9×10⁻⁴–1.0×10⁻³) and replication samples (P=2.8×10⁻⁴–1.7×10⁻³). Genetic variance explained by chip markers for bipolar disorder subtype II was substantial in the discovery (55.1%) and replication (60.5%) samples. Moreover, pathways related to neurodevelopmental function, signal transduction, neuronal system, and cell adhesion molecules were significantly associated with bipolar disorder subtype II. We reported novel susceptibility loci for pure bipolar disorder subtype II, which is less often addressed in the literature. Future studies are needed to confirm the roles of these loci for bipolar disorder subtype II. © The Author 2016. Published by Oxford University Press on behalf of CINP.
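For readers unfamiliar with the risk scores mentioned in the abstract above, a polygenic risk score is conventionally a weighted sum of risk-allele counts. The sketch below shows that arithmetic only; the marker names, weights, and dictionary-based input format are hypothetical, and the study's own marker selection and weighting are not reproduced.

```python
def polygenic_risk_score(genotypes, weights):
    """Weighted sum of allele dosages.

    genotypes maps marker ID -> risk-allele count (0, 1, or 2).
    weights maps marker ID -> effect size, e.g. a log odds ratio.
    Markers without a weight are skipped. (Hypothetical input format.)
    """
    return sum(weights[snp] * dosage
               for snp, dosage in genotypes.items()
               if snp in weights)
```

For example, a person carrying two risk alleles at a marker with weight 0.5 and one at a marker with weight 0.25 would score 1.25 under this definition.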
Limitations and potentials of current motif discovery algorithms
Hu, Jianjun; Li, Bin; Kihara, Daisuke
2005-01-01
Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194
A comparative review of estimates of the proportion unchanged genes and the false discovery rate
Broberg, Per
2005-01-01
Background In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available. Results A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error. Conclusion The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. 
The article advocates the use of the conjoint information regarding false positive and negative rates as well as the proportion unchanged when identifying changed genes. PMID:16086831
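The two quantities discussed in the abstract above, the proportion of unchanged genes and FDR-based q-values, can be illustrated with a short sketch. It uses a simple threshold-based estimate of the proportion unchanged and Benjamini-Hochberg style adjusted p-values; these are standard textbook constructions chosen for illustration, and are not claimed to be the article's new estimators. The function names and the default `lam=0.5` are assumptions.

```python
import numpy as np

def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of unchanged genes.

    P-values of unchanged genes are uniform on [0, 1], so the fraction
    above lam, rescaled by 1 / (1 - lam), approximates their proportion.
    """
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, float(np.mean(p > lam) / (1.0 - lam)))

def bh_qvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by pi0."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m * pi0 / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

Genes with a q-value below a chosen level, say 0.05, would be reported as changed. Passing an estimated `pi0` below 1 makes the procedure less conservative, which is one reason the quality of that estimate matters.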
Generating mouse lines for lineage tracing and knockout studies.
Kraus, Petra; Sivakamasundari, V; Xing, Xing; Lufkin, Thomas
2014-01-01
In 2007 Capecchi, Evans, and Smithies received the Nobel Prize in recognition of their discovery of the principles for introducing specific gene modifications in mice via embryonic stem cells, a technology that has revolutionized the field of biomedical science by allowing the generation of genetically engineered animals. Here we describe detailed protocols based on and developed from these ground-breaking discoveries, allowing for the modification of genes not only to create mutations to study gene function but additionally to modify genes with fluorescent markers, thus permitting the isolation of specific rare wild-type and mutant cell types for further detailed analysis at the biochemical, pathological, and genomic levels.
Deng, Yan; Wang, Chi Chiu; Choy, Kwong Wai; Du, Quan; Chen, Jiao; Wang, Qin; Li, Lu; Chung, Tony Kwok Hung; Tang, Tao
2014-04-01
During recent decades there have been remarkable advances in biology, among which one of the most important discoveries is RNA interference (RNAi). RNAi is a specific post-transcriptional regulatory pathway that can result in silencing gene functions. Efforts have been made to translate this new discovery into clinical applications for disease treatment. However, technical difficulties restrict the development of RNAi, including stability, off-target effects, immunostimulation and delivery problems. Researchers have attempted to surmount these barriers and improve the bioavailability and safety of RNAi-based therapeutics by optimizing the chemistry and structure of these molecules. This paper aimed to describe the principles of RNA interference, review the therapeutic potential in various diseases and discuss the new strategies for in vivo delivery of RNAi to overcome the challenges. Copyright © 2013 Elsevier B.V. All rights reserved.
Kalaitzis, John A
2013-01-01
The marine actinomycete Streptomyces maritimus produces a structurally diverse set of unusual polyketide natural products including the major metabolite enterocin. Investigations of enterocin biosynthesis revealed that the unique carbon skeleton is derived from an aromatic polyketide pathway which is genetically coded by the 21.3 kb enc gene cluster in S. maritimus. Characterization of the enc biosynthesis gene cluster and subsequent manipulation of it via heterologous expression and/or mutagenesis enabled the discovery of other enc-based metabolites that were produced in only very minor amounts in the wild type. Also described are techniques used to harness the enterocin biosynthetic machinery in order to generate unnatural enc-derived polyketide analogues. This review focuses upon the molecular methods used in combination with classical natural products detection and isolation techniques to access minor metabolites of the S. maritimus secondary metabolome.
Five critical elements to ensure the precision medicine.
Chen, Chengshui; He, Mingyan; Zhu, Yichun; Shi, Lin; Wang, Xiangdong
2015-06-01
Precision medicine, an emerging area and therapeutic strategy, has been practiced in individual patients with unexpected successes, and has attracted considerable professional and public attention as a new path to improving the treatment and prognosis of patients. A number of new components will appear or be discovered, among them clinical bioinformatics, which integrates clinical phenotypes and informatics with bioinformatics, computational science, mathematics, and systems biology. In addition to those tools, precision medicine calls for more accurate and repeatable methodologies for the identification and validation of gene discovery. Precision medicine will bring new therapeutic strategies, drug discovery and development, and gene-oriented treatment. There is an urgent need to identify and validate disease-specific, mechanism-based, or epigenetics-dependent biomarkers to monitor precision medicine, and to develop "precision" regulations to guard the application of precision medicine.
Modeling Emergence in Neuroprotective Regulatory Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; Haack, Jereme N.; McDermott, Jason E.
2013-01-05
The use of predictive modeling in the analysis of gene expression data can greatly accelerate the pace of scientific discovery in biomedical research by enabling in silico experimentation to test disease triggers and potential drug therapies. Techniques that focus on modeling emergence, such as agent-based modeling and multi-agent simulations, are of particular interest as they support the discovery of pathways that may have never been observed in the past. Thus far, these techniques have been primarily applied at the multi-cellular level, or have focused on signaling and metabolic networks. We present an approach where emergence modeling is extended to regulatory networks and demonstrate its application to the discovery of neuroprotective pathways. An initial evaluation of the approach indicates that emergence modeling provides novel insights for the analysis of regulatory networks that can advance the discovery of acute treatments for stroke and other diseases.
Xu, Rong; Li, Li; Wang, QuanQiu
2013-01-01
Motivation: Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease–phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease–manifestation (D-M) pairs (one specific type of disease–phenotype relationship) from the wide body of published biomedical literature. Data and Methods: Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M–specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. Results: In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. Conclusions: The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. Availability: http://nlp.case.edu/public/data/DMPatternUMLS/ Contact: rxx@case.edu PMID:23828786
Song, Min
2016-01-01
In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based literature mining can serve to assist clinical applications. PMID:27195695
TOXICOGENOMICS DRUG DISCOVERY AND THE PATHOLOGIST
Toxicogenomics, drug discovery, and pathologist.
The field of toxicogenomics, which currently focuses on the application of large-scale differential gene expression (DGE) data to toxicology, is starting to influence drug discovery and development in the pharmaceutical indu...
Computational biology for cardiovascular biomarker discovery.
Azuaje, Francisco; Devaux, Yvan; Wagner, Daniel
2009-07-01
Computational biology is essential in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena based on the resources and technologies originating from the clinical environment. One such key contribution of computational biology is the discovery of biomarkers for predicting clinical outcomes using 'omic' information. This process involves the predictive modelling and integration of different types of data and knowledge for screening, diagnostic or prognostic purposes. Moreover, this requires the design and combination of different methodologies based on statistical analysis and machine learning. This article introduces key computational approaches and applications to biomarker discovery based on different types of 'omic' data. Although we emphasize applications in cardiovascular research, the computational requirements and advances discussed here are also relevant to other domains. We will start by introducing some of the contributions of computational biology to translational research, followed by an overview of methods and technologies used for the identification of biomarkers with predictive or classification value. The main types of 'omic' approaches to biomarker discovery will be presented with specific examples from cardiovascular research. This will include a review of computational methodologies for single-source and integrative data applications. Major computational methods for model evaluation will be described together with recommendations for reporting models and results. We will present recent advances in cardiovascular biomarker discovery based on the combination of gene expression and functional network analyses. The review will conclude with a discussion of key challenges for computational biology, including perspectives from the biosciences and clinical areas.
Muchero, Wellington
2018-01-15
Wellington Muchero from Oak Ridge National Laboratory gives a talk titled "Discovery of Cell Wall Biosynthesis Genes in Populus" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
2014-01-01
Background In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. Results We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. Conclusions The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity.
The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. PMID:24731138
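The two-step idea in this abstract (screen hypothesis sets first, then test within the selected sets at an adjusted level) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the Bonferroni screening p-value, the Benjamini-Hochberg screening step, the within-set Bonferroni correction, and all function names are assumptions chosen for illustration, and no claim is made that this sketch carries the paper's OFDR or mdFDR guarantees.

```python
def bh_select(pvalues, q):
    """Benjamini-Hochberg step-up: return the sorted indices rejected at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k_max])


def two_step_set_testing(pvalue_sets, q=0.05):
    """Generic two-step hierarchical testing (illustrative assumptions only).

    pvalue_sets: list of lists; each inner list holds the p-values of one
    hypothesis set (for example, one gene across several dose comparisons).
    Returns {set index: [indices of hypotheses rejected within that set]}.
    """
    m = len(pvalue_sets)
    # Step 1: one screening p-value per set (Bonferroni-adjusted minimum,
    # an assumed choice), followed by BH across sets.
    screening = [min(1.0, len(ps) * min(ps)) for ps in pvalue_sets]
    selected = bh_select(screening, q)
    r = len(selected)
    # Step 2: within each selected set, test at the adjusted level r*q/m,
    # here with a plain Bonferroni correction inside the set.
    results = {}
    for s in selected:
        ps = pvalue_sets[s]
        level = r * q / m
        results[s] = [j for j, p in enumerate(ps) if p <= level / len(ps)]
    return results


example_sets = [
    [0.001, 0.5, 0.9],    # set 0: one small p-value
    [0.4, 0.6, 0.8],      # set 1: nothing of interest
    [0.002, 0.003, 0.7],  # set 2: two small p-values
]
print(two_step_set_testing(example_sets, q=0.05))
```

In the example, sets 0 and 2 pass the screening step, and the within-set tests are then run at the level 2 × 0.05 / 3 rather than at 0.05, which is how the second step accounts for the selection made in the first.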
USDA-ARS's Scientific Manuscript database
We present here a whole-cell and permeabilized E. coli cell 1' active/inactive microplate screen for β-D-xylosidase, xylanase, endocellulase, and ferulic acid esterase enzyme activities which are critical for the enzymatic deconstruction of biomass for fuels and chemicals. Transformants from genomic...
Human Gene Discovery Laboratory: A Problem-Based Learning Experience
ERIC Educational Resources Information Center
Bonds, Wesley D., Sr.; Paolella, Mary Jane
2006-01-01
A single-semester elective combines Mendelian and molecular genetics in a problem-solving format. Students encounter a genetic disease scenario, construct a family pedigree, and try to confirm their medical diagnoses through laboratory experiences. Encouraged to generate ideas as they test their hypotheses, students realize the importance of data…
Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo
2004-05-01
Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery
Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi
2009-01-01
Background DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network. PMID:19728865
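The nonparametric, rank-based matching of a gene-expression signature against a ranked expression profile can be sketched with a Kolmogorov-Smirnov-style score. The Python below is a simplified illustration of that general technique, assuming the profile is a list of gene identifiers ordered from most up-regulated to most down-regulated. The function names are hypothetical, the score is left unnormalized, and the statistical significance calculation mentioned in the abstract is not reproduced; GEM-TREND's actual implementation may differ.

```python
def ks_score(tag_genes, ranked_genes):
    """Signed Kolmogorov-Smirnov-style score of a gene set in a ranked list.

    Positive when the tag genes sit near the top of the ranking,
    negative when they sit near the bottom.
    """
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    # 1-based positions of the tag genes that occur in the profile, ascending.
    v = sorted(position[g] for g in tag_genes if g in position)
    t = len(v)
    if t == 0:
        return 0.0
    a = max((j + 1) / t - v[j] / n for j in range(t))
    b = max(v[j] / n - j / t for j in range(t))
    return a if a > b else -b


def signature_match_score(up_genes, down_genes, ranked_genes):
    """Combine the up- and down-tag scores; 0 when both point the same way."""
    ks_up = ks_score(up_genes, ranked_genes)
    ks_down = ks_score(down_genes, ranked_genes)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


profile = [f"g{i}" for i in range(1, 11)]  # g1 most up, g10 most down
print(signature_match_score(["g1", "g2"], ["g9", "g10"], profile))  # strongly positive
print(signature_match_score(["g9", "g10"], ["g1", "g2"], profile))  # strongly negative
```

A query whose up-regulated genes rank high and whose down-regulated genes rank low in a stored profile gets a large positive score, and a reversed profile gets a large negative one, which is what lets a signature query retrieve both similar and opposite expression entries.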
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.
Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi
2009-09-03
DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.
Zhu, Xiaohong; Pattathil, Sivakumar; Mazumder, Koushik; Brehm, Amanda; Hahn, Michael G; Dinesh-Kumar, S P; Joshi, Chandrashekhar P
2010-09-01
Virus-induced gene silencing (VIGS) is a powerful genetic tool for rapid assessment of plant gene functions in the post-genomic era. Here, we successfully implemented a Tobacco Rattle Virus (TRV)-based VIGS system to study functions of genes involved in either primary or secondary cell wall formation in Nicotiana benthamiana plants. A 3-week post-VIGS time frame is sufficient to observe phenotypic alterations in the anatomical structure of stems and chemical composition of the primary and secondary cell walls. We used cell wall glycan-directed monoclonal antibodies to demonstrate that alteration of cell wall polymer synthesis during the secondary growth phase of VIGS plants has profound effects on the extractability of components from woody stem cell walls. Therefore, TRV-based VIGS together with cell wall component profiling methods provide a high-throughput gene discovery platform for studying plant cell wall formation from a bioenergy perspective.
flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection
Stanley, Craig E.; Kulathinal, Rob J.
2016-01-01
With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. 
We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. PMID:27226167
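Two of the simpler divergence statistics listed in this abstract, the percentage of gaps and the raw proportion of differing sites, can be computed directly from a pairwise alignment. The Python sketch below is an illustration under stated assumptions: the function name and the toy sequences are hypothetical, gaps are assumed to be written as "-", and the codon-based dN, dS and dN/dS estimates that flyDIVaS reports require a substitution model and are not attempted here.

```python
def alignment_divergence(seq_a, seq_b, gap="-"):
    """Return (percent_gaps, p_distance) for two aligned sequences.

    percent_gaps: percentage of alignment columns with a gap in either sequence.
    p_distance:   fraction of ungapped columns where the two residues differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(seq_a.upper(), seq_b.upper()))
    if not columns:
        raise ValueError("alignment is empty")
    gapped = sum(1 for x, y in columns if x == gap or y == gap)
    compared = len(columns) - gapped
    differing = sum(1 for x, y in columns if gap not in (x, y) and x != y)
    percent_gaps = 100.0 * gapped / len(columns)
    p_distance = differing / compared if compared else float("nan")
    return percent_gaps, p_distance


# Toy alignment: six columns, one gapped column, one mismatch.
gaps, dist = alignment_divergence("ATG-CA", "ATGTCT")
print(f"{gaps:.1f}% gaps, p-distance {dist:.2f}")
```

The p-distance is only a crude divergence measure because it ignores multiple substitutions at the same site and does not separate synonymous from nonsynonymous changes.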
flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection.
Stanley, Craig E; Kulathinal, Rob J
2016-08-09
With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster's breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1-1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. 
flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. Copyright © 2016 Stanley and Kulathinal.
Whole-Exome Sequencing in Familial Parkinson Disease
Farlow, Janice L.; Robak, Laurie A.; Hetrick, Kurt; Bowling, Kevin; Boerwinkle, Eric; Coban-Akdemir, Zeynep H.; Gambin, Tomasz; Gibbs, Richard A.; Gu, Shen; Jain, Preti; Jankovic, Joseph; Jhangiani, Shalini; Kaw, Kaveeta; Lai, Dongbing; Lin, Hai; Ling, Hua; Liu, Yunlong; Lupski, James R.; Muzny, Donna; Porter, Paula; Pugh, Elizabeth; White, Janson; Doheny, Kimberly; Myers, Richard M.; Shulman, Joshua M.; Foroud, Tatiana
2016-01-01
IMPORTANCE Parkinson disease (PD) is a progressive neurodegenerative disease for which susceptibility is linked to genetic and environmental risk factors. OBJECTIVE To identify genetic variants contributing to disease risk in familial PD. DESIGN, SETTING, AND PARTICIPANTS A 2-stage study design that included a discovery cohort of families with PD and a replication cohort of familial probands was used. In the discovery cohort, rare exonic variants that segregated in multiple affected individuals in a family and were predicted to be conserved or damaging were retained. Genes with retained variants were prioritized if expressed in the brain and located within PD-relevant pathways. Genes in which prioritized variants were observed in at least 4 families were selected as candidate genes for replication in the replication cohort. The setting was among individuals with familial PD enrolled from academic movement disorder specialty clinics across the United States. All participants had a family history of PD. MAIN OUTCOMES AND MEASURES Identification of genes containing rare, likely deleterious, genetic variants in individuals with familial PD using a 2-stage exome sequencing study design. RESULTS The 93 individuals from 32 families in the discovery cohort (49.5% [46 of 93] female) had a mean (SD) age at onset of 61.8 (10.0) years. The 49 individuals with familial PD in the replication cohort (32.6% [16 of 49] female) had a mean (SD) age at onset of 50.1 (15.7) years. Discovery cohort recruitment dates were 1999 to 2009, and replication cohort recruitment dates were 2003 to 2014. Data analysis dates were 2011 to 2015. Three genes containing a total of 13 rare and potentially damaging variants were prioritized in the discovery cohort. Two of these genes (TNK2 and TNR) also had rare variants that were predicted to be damaging in the replication cohort. 
All 9 variants identified in the 2 replicated genes in 12 families across the discovery and replication cohorts were confirmed via Sanger sequencing. CONCLUSIONS AND RELEVANCE TNK2 and TNR harbored rare, likely deleterious, variants in individuals having familial PD, with similar findings in an independent cohort. To our knowledge, these genes have not been previously associated with PD, although they have been linked to critical neuronal functions. Further studies are required to confirm a potential role for these genes in the pathogenesis of PD. PMID:26595808
A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model
Beldade, Patrícia; Rudd, Stephen; Gruber, Jonathan D; Long, Anthony D
2006-01-01
Background Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economic and educational value of butterflies, they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expressed Sequence Tag (EST) project for Bicyclus anynana that has identified the largest available collection to date of expressed genes for any butterfly. Results By targeting cDNAs from developing wings at the stages when pattern is specified, we biased gene discovery towards genes potentially involved in pattern formation. Assembly of 9,903 ESTs from a subtracted library allowed us to identify 4,251 genes of which 2,461 were annotated based on BLAST analyses against relevant gene collections. Gene prediction software identified 2,202 peptides, of which 215 longer than 100 amino acids had no homology to any known proteins and, thus, potentially represent novel or highly diverged butterfly genes. We combined gene and Single Nucleotide Polymorphism (SNP) identification by constructing cDNA libraries from pools of outbred individuals, and by sequencing clones from the 3' end to maximize alignment depth. Alignments of multi-member contigs allowed us to identify over 14,000 putative SNPs, with 316 genes having at least one high confidence double-hit SNP. We furthermore identified 320 microsatellites in transcribed genes that can potentially be used as genetic markers. Conclusion Our project was designed to combine gene and sequence polymorphism discovery and has generated the largest gene collection available for any butterfly and many potential markers in expressed genes. These resources will be invaluable for exploring the potential of B. anynana in particular, and butterflies in general, as models in ecological, evolutionary, and developmental genetics. PMID:16737530
Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar
2016-01-01
Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design. PMID:27958331
Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar
2016-12-13
Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S; Sinha, Saurabh
2011-12-01
Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. © The Author(s) 2011. Published by Oxford University Press.
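The standard Hypergeometric test of gene set enrichment that this abstract takes as its starting point can be written in a few lines of Python. The sketch below covers only the standard test. The extension for variable intergenic lengths described in the abstract is not reproduced, and the function name, parameter names and numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(total_genes, annotated_genes, selected_genes, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    total_genes:     number of genes in the background (N)
    annotated_genes: genes in the background with the expected expression pattern (K)
    selected_genes:  genes neighboring the predicted CRMs (n)
    overlap:         selected genes that also carry the annotation (k)
    """
    upper = min(annotated_genes, selected_genes)
    # Number of ways to draw the selected set with at least `overlap` annotated genes.
    tail = sum(
        comb(annotated_genes, i) * comb(total_genes - annotated_genes, selected_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)


# Hypothetical numbers: 200 of 10,000 genes are annotated, and 8 of the
# 50 genes next to predicted CRMs carry the annotation.
print(hypergeom_enrichment_pvalue(10_000, 200, 50, 8))
```

The test treats every gene as equally likely to neighbor a predicted CRM. That assumption is what the abstract's extended test relaxes, since genes flanked by long intergenic regions are more likely to sit next to a prediction by chance.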
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis
2008-01-01
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Hassani-Pak, Keywan; Rawlings, Christopher
2017-06-13
Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
2012-01-01
Background Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results Multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. 
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515
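Fisher's discriminant ratio, which this abstract uses as a measure of data complexity, compares the separation of two class means with the spread within each class. The Python sketch below uses one common form, the maximum over features of (mean_a - mean_b)^2 / (var_a + var_b). The use of population variance, the handling of zero-variance features, the function name and the toy data are assumptions for illustration and may differ from the study's computation.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Maximum per-feature Fisher discriminant ratio between two classes.

    class_a, class_b: lists of samples, each sample a list of feature values
    (for example, expression levels of the same genes in the same order).
    Higher values mean at least one feature separates the classes well.
    """
    n_features = len(class_a[0])
    best = 0.0
    for j in range(n_features):
        a = [sample[j] for sample in class_a]
        b = [sample[j] for sample in class_b]
        spread = pvariance(a) + pvariance(b)
        if spread == 0:
            continue  # ratio undefined for a feature with no within-class variance
        best = max(best, (mean(a) - mean(b)) ** 2 / spread)
    return best


treated = [[1.0, 5.0], [3.0, 5.0]]   # two samples, two gene features
control = [[7.0, 5.0], [9.0, 5.0]]
print(fisher_discriminant_ratio(treated, control))  # 18.0
```

A low ratio across all features indicates heavily overlapping classes, which is consistent with the abstract's suggestion that harder conditions may need larger sample sizes.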
Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids
2011-01-01
Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, 1e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684
Identification of differentially expressed genes and false discovery rate in microarray studies.
Gusnanto, Arief; Calza, Stefano; Pawitan, Yudi
2007-04-01
This review highlights developments in microarray data analysis for the identification of differentially expressed genes, particularly via control of the false discovery rate. The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.
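To make the contrast between conservative multiplicity control and false discovery rate control concrete, here is a minimal sketch. It uses the Benjamini-Hochberg step-up procedure as the FDR method and Bonferroni as the conservative baseline. The abstract names neither procedure, so both choices are assumptions, as are the function names and the toy P values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is declared significant
    while controlling the false discovery rate at level alpha (BH step-up)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose P value is <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Conservative family-wise control: compare each P value with alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical per-gene P values from a differential expression test.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("BH discoveries:", sum(benjamini_hochberg(toy_p)))
print("Bonferroni discoveries:", sum(bonferroni(toy_p)))
```

On these toy values the FDR procedure keeps more genes than Bonferroni does, which illustrates the review's point that the classical correction is too conservative when tens of thousands of genes are tested.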
Hosseinidoust, Zeinab
2017-01-01
Bacteriophages (bacterial viruses) have long been under investigation as vectors for gene therapy. Similar to other viral vectors, the phage coat proteins have evolved over millions of years to protect the viral genome from degradation post injection, offering protection for the valuable therapeutic sequence. However, what sets phage apart from other viral gene delivery vectors is their safety for human use and the relative ease by which foreign molecules can be expressed on the phage outer surface, enabling highly targeted gene delivery. The latter property also makes phage a popular choice for gene therapy target discovery through directed evolution. Although promising, phage-mediated gene therapy faces several outstanding challenges, the most notable being lower gene delivery efficiency compared to animal viruses, vector stability, and undesirable immune stimulation. This article presents a critical review of the promises and challenges of employing phage as gene delivery vehicles, as well as an introduction to the concept of phage-based microbiome therapy as the new frontier and perhaps the most promising application of phage-based gene therapy.
2011-01-01
Background Technological advances are progressively increasing the application of genomics to a wider array of economically and ecologically important species. High-density maps enriched for transcribed genes facilitate the discovery of connections between genes and phenotypes. We report the construction of a high-density linkage map of expressed genes for the heterozygous genome of Eucalyptus using Single Feature Polymorphism (SFP) markers. Results SFP discovery and mapping was achieved using pseudo-testcross screening and selective mapping to simultaneously optimize linkage mapping and microarray costs. SFP genotyping was carried out by hybridizing complementary RNA prepared from 4.5 year-old trees xylem to an SFP array containing 103,000 25-mer oligonucleotide probes representing 20,726 unigenes derived from a modest size expressed sequence tags collection. An SFP-mapping microarray with 43,777 selected candidate SFP probes representing 15,698 genes was subsequently designed and used to genotype SFPs in a larger subset of the segregating population drawn by selective mapping. A total of 1,845 genes were mapped, with 884 of them ordered with high likelihood support on a framework map anchored to 180 microsatellites with average density of 1.2 cM. Using more probes per unigene increased by two-fold the likelihood of detecting segregating SFPs eventually resulting in more genes mapped. In silico validation showed that 87% of the SFPs map to the expected location on the 4.5X draft sequence of the Eucalyptus grandis genome. Conclusions The Eucalyptus 1,845 gene map is the most highly enriched map for transcriptional information for any forest tree species to date. It represents a major improvement on the number of genes previously positioned on Eucalyptus maps and provides an initial glimpse at the gene space for this global tree genome. 
A general protocol is proposed to build high-density transcript linkage maps in less characterized plant species by SFP genotyping with a concurrent objective of reducing microarray costs. High-density gene-rich maps represent a powerful resource to assist gene discovery endeavors when used in combination with QTL and association mapping and should be especially valuable to assist the assembly of reference genome sequences soon to come for several plant and animal species. PMID:21492453
Yeast as a tool to identify anti-aging compounds
Zimmermann, Andreas; Hofer, Sebastian; Pendl, Tobias; Kainz, Katharina; Madeo, Frank; Carmona-Gutierrez, Didac
2018-01-01
In the search for interventions against aging and age-related diseases, biological screening platforms are indispensable tools to identify anti-aging compounds among large substance libraries. The budding yeast, Saccharomyces cerevisiae, has emerged as a powerful chemical and genetic screening platform, as it combines a rapid workflow with experimental amenability and the availability of a wide range of genetic mutant libraries. Given the amount of conserved genes and aging mechanisms between yeast and human, testing candidate anti-aging substances in yeast gene-deletion or overexpression collections, or de novo derived mutants, has proven highly successful in finding potential molecular targets. Yeast-based studies, for example, have led to the discovery of the polyphenol resveratrol and the natural polyamine spermidine as potential anti-aging agents. Here, we present strategies for pharmacological anti-aging screens in yeast, discuss common pitfalls and summarize studies that have used yeast for drug discovery and target identification. PMID:29905792
RNA Interference for Functional Genomics and Improvement of Cotton (Gossypium sp.)
Abdurakhmonov, Ibrokhim Y.; Ayubov, Mirzakamol S.; Ubaydullaeva, Khurshida A.; Buriev, Zabardast T.; Shermatov, Shukhrat E.; Ruziboev, Haydarali S.; Shapulatov, Umid M.; Saha, Sukumar; Ulloa, Mauricio; Yu, John Z.; Percy, Richard G.; Devor, Eric J.; Sharma, Govind C.; Sripathi, Venkateswara R.; Kumpatla, Siva P.; van der Krol, Alexander; Kater, Hake D.; Khamidov, Khakimdjan; Salikhov, Shavkat I.; Jenkins, Johnie N.; Abdukarimov, Abdusattor; Pepper, Alan E.
2016-01-01
RNA interference (RNAi) is a powerful new technology in the discovery of genetic sequence functions, and has become a valuable tool for functional genomics of cotton (Gossypium sp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function and biological roles of many key cotton genes involved in fiber development, fertility and somatic embryogenesis, resistance to important biotic and abiotic stresses, and oil and seed quality improvements as well as the key agronomic traits including yield and maturity. Here, we have comparatively reviewed seminal research efforts in previously used antisense approaches and currently applied breakthrough RNAi studies in cotton, analyzing developed RNAi methodologies, achievements, limitations, and future needs in functional characterizations of cotton genes. We also highlighted needed efforts in the development of RNAi-based cotton cultivars, and their safety and risk assessment, small and large-scale field trials, and commercialization. PMID:26941765
Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...
2015-07-14
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.
A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model
2017-01-01
The discovery of cis-regulatory modules (CRMs) is the key to understanding mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how to model the regulatory structure of CRMs has a considerable impact on the performance of CRM identification. The paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on HMM by exploring the rules of CRM transcriptional grammar that governs the internal motif site arrangement of CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS performs better than them. PMID:28497059
Mapping Quantitative Field Resistance Against Apple Scab in a 'Fiesta' x 'Discovery' Progeny.
Liebhard, R; Koller, B; Patocchi, A; Kellerhals, M; Pfammatter, W; Jermini, M; Gessler, C
2003-04-01
Breeding of resistant apple cultivars (Malus x domestica) as a disease management strategy relies on the knowledge and understanding of the underlying genetics. The availability of molecular markers and genetic linkage maps enables the detection and the analysis of major resistance genes as well as of quantitative trait loci (QTL) contributing to the resistance of a genotype. Such a genetic linkage map was constructed, based on a segregating population of the cross between apple cvs. Fiesta (syn. Red Pippin) and Discovery. The progeny was observed for 3 years at three different sites in Switzerland and field resistance against apple scab (Venturia inaequalis) was assessed. Only a weak correlation was detected between leaf scab and fruit scab. A QTL analysis was performed, based on the genetic linkage map consisting of 804 molecular markers and covering all 17 chromosomes of apple. With the maximum likelihood-based interval mapping method, eight genomic regions were identified, six conferring resistance against leaf scab and two conferring fruit scab resistance. Although cv. Discovery showed a much stronger resistance against scab in the field, most QTL identified were attributed to the more susceptible parent 'Fiesta'. This indicated a high degree of homozygosity at the scab resistance loci in 'Discovery', preventing their detection in the progeny due to the lack of segregation.
Zhang, Minlu; Zhu, Cheng; Jacomy, Alexis; Lu, Long J.; Jegga, Anil G.
2011-01-01
The low prevalence rate of orphan diseases (OD) requires special combined efforts to improve diagnosis, prevention, and discovery of novel therapeutic strategies. To identify and investigate relationships based on shared genes or shared functional features, we have conducted a bioinformatic-based global analysis of all orphan diseases with known disease-causing mutant genes. Starting with a bipartite network of known OD and OD-causing mutant genes and using the human protein interactome, we first construct and topologically analyze three networks: the orphan disease network, the orphan disease-causing mutant gene network, and the orphan disease-causing mutant gene interactome. Our results demonstrate that in contrast to the common disease-causing mutant genes that are predominantly nonessential, a majority of orphan disease-causing mutant genes are essential. In confirmation of this finding, we found that OD-causing mutant genes are topologically important in the protein interactome and are ubiquitously expressed. Additionally, functional enrichment analysis of those genes in which mutations cause ODs shows that a majority result in premature death or are lethal in the orthologous mouse gene knockout models. To address the limitations of traditional gene-based disease networks, we also construct and analyze OD networks on the basis of shared enriched features (biological processes, cellular components, pathways, phenotypes, and literature citations). Analyzing these functionally-linked OD networks, we identified several additional OD-OD relations that are both phenotypically similar and phenotypically diverse. Surprisingly, we observed that the wiring of the gene-based and other feature-based OD networks is largely different; this suggests that the relationship between ODs cannot be fully captured by the gene-based network alone. PMID:21664998
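The step from a bipartite disease-gene network to a disease network and a gene network can be sketched as a pair of one-mode projections: two diseases are linked when they share a causal gene, and two genes are linked when they share a disease. This is a generic illustration of that idea, not the authors' pipeline. The function name and the toy disease and gene labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(disease_to_genes):
    """Project a disease -> set-of-genes mapping onto two one-mode networks.

    Returns (disease_edges, gene_edges), where each is a dict mapping a
    sorted pair of node names to the set of shared neighbours that links them.
    """
    # Invert the mapping so shared genes can be looked up per gene.
    gene_to_diseases = defaultdict(set)
    for disease, genes in disease_to_genes.items():
        for gene in genes:
            gene_to_diseases[gene].add(disease)

    # Disease network: an edge for every pair of diseases sharing a gene.
    disease_edges = defaultdict(set)
    for gene, diseases in gene_to_diseases.items():
        for d1, d2 in combinations(sorted(diseases), 2):
            disease_edges[(d1, d2)].add(gene)

    # Gene network: an edge for every pair of genes sharing a disease.
    gene_edges = defaultdict(set)
    for disease, genes in disease_to_genes.items():
        for g1, g2 in combinations(sorted(genes), 2):
            gene_edges[(g1, g2)].add(disease)

    return dict(disease_edges), dict(gene_edges)


# Hypothetical toy input; real labels would come from a curated database.
toy = {
    "disease_A": {"GENE1", "GENE2"},
    "disease_B": {"GENE2"},
    "disease_C": {"GENE3"},
}
disease_net, gene_net = project_bipartite(toy)
print(disease_net)
print(gene_net)
```

In this toy input disease_C shares no gene with the others and stays isolated in the gene-based projection. Linking diseases through shared enriched features instead of shared genes, as the abstract describes, is one way such isolated diseases can acquire relations.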
Discovery of phosphonic acid natural products by mining the genomes of 10,000 actinomycetes.
Ju, Kou-San; Gao, Jiangtao; Doroghazi, James R; Wang, Kwo-Kwang A; Thibodeaux, Christopher J; Li, Steven; Metzger, Emily; Fudala, John; Su, Joleen; Zhang, Jun Kai; Lee, Jaeheon; Cioni, Joel P; Evans, Bradley S; Hirota, Ryuichi; Labeda, David P; van der Donk, Wilfred A; Metcalf, William W
2015-09-29
Although natural products have been a particularly rich source of human medicines, activity-based screening results in a very high rate of rediscovery of known molecules. Based on the large number of natural product biosynthetic genes in microbial genomes, many have proposed “genome mining” as an alternative approach for discovery efforts; however, this idea has yet to be performed experimentally on a large scale. Here, we demonstrate the feasibility of large-scale, high-throughput genome mining by screening a collection of over 10,000 actinomycetes for the genetic potential to make phosphonic acids, a class of natural products with diverse and useful bioactivities. Genome sequencing identified a diverse collection of phosphonate biosynthetic gene clusters within 278 strains. These clusters were classified into 64 distinct groups, of which 55 are likely to direct the synthesis of unknown compounds. Characterization of strains within five of these groups resulted in the discovery of a new archetypical pathway for phosphonate biosynthesis, the first (to our knowledge) dedicated pathway for H-phosphinates, and 11 previously undescribed phosphonic acid natural products. Among these compounds are argolaphos, a broad-spectrum antibacterial phosphonopeptide composed of aminomethylphosphonate in peptide linkage to a rare amino acid N5-hydroxyarginine; valinophos, an N-acetyl l-Val ester of 2,3-dihydroxypropylphosphonate; and phosphonocystoximate, an unusual thiohydroximate-containing molecule representing a new chemotype of sulfur-containing phosphonate natural products. Analysis of the genome sequences from the remaining strains suggests that the majority of the phosphonate biosynthetic repertoire of Actinobacteria has been captured at the gene level. This dereplicated strain collection now provides a reservoir of numerous, as yet undiscovered, phosphonate natural products. PMID:26324907
Translocations in epithelial cancers
Brenner, J. Chad; Chinnaiyan, Arul M.
2009-01-01
Genomic translocations leading to the expression of chimeric transcripts characterize several hematologic, mesenchymal and epithelial malignancies. While several gene fusions have been linked to essential molecular events in hematologic malignancies, the identification and characterization of recurrent chimeric transcripts in epithelial cancers has been limited. However, the recent discovery of the recurrent gene fusions in prostate cancer has sparked a revitalization of the quest to identify novel rearrangements in epithelial malignancies. Here, the molecular mechanisms of gene fusions that drive several epithelial cancers and the recent technological advances that increase the speed and reliability of recurrent gene fusion discovery are explored. PMID:19406209
Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun
2016-01-15
Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils, and it has the ability to lower Cu contents in the Cu-contaminated soils. However, very little molecular information concerning Cu resistance of P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using the Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained with a mean length of 710 bp. Of these unigenes, 47,461 were annotated with public databases based on sequence similarities. Comparative transcript profiling allowed the discovery of 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in the Cu-treated library as compared to the control counterpart. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, expression patterns of 12 selected DEGs derived from quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes obtained by transcriptomic analyses, suggesting that all the 12 genes were authentically involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii, which could offer valuable information on the molecular mechanisms of Cu resistance, and provide a basis for further genomics research on this and related ornamental species for phytoremediation.
Devlin, Joseph C; Battaglia, Thomas; Blaser, Martin J; Ruggles, Kelly V
2018-06-25
Exploration of large data sets, such as shotgun metagenomic sequence or expression data, by biomedical experts and medical professionals remains a major bottleneck in the scientific discovery process. Although tools for this purpose exist for 16S ribosomal RNA sequencing analysis, there is a growing but still insufficient number of user-friendly interactive visualization workflows for easy data exploration and figure generation. The development of such platforms is necessary to accelerate and streamline microbiome laboratory research. We developed the Workflow Hub for Automated Metagenomic Exploration (WHAM!) as a web-based interactive tool capable of user-directed data visualization and statistical analysis of annotated shotgun metagenomic and metatranscriptomic data sets. WHAM! includes exploratory and hypothesis-based gene and taxa search modules for visualizing differences in microbial taxa and gene family expression across experimental groups, and for creating publication quality figures without the need for command line interface or in-house bioinformatics. WHAM! is an interactive and customizable tool for downstream metagenomic and metatranscriptomic analysis providing a user-friendly interface allowing for easy data exploration by microbiome and ecological experts to facilitate discovery in multi-dimensional and large-scale data sets.
Shaheen, Ranad; Al Tala, Saeed; Almoisheer, Agaadir; Alkuraya, Fowzan S
2014-12-01
Primordial dwarfism (PD) is a heterogeneous clinical entity characterised by severe prenatal and postnatal growth deficiency. Despite the recent wave of disease gene discovery, the causal mutations in many PD patients remain unknown. The aim was to describe a PD family that maps to a novel locus. The methods comprised clinical, imaging and laboratory phenotyping of a new family with PD, followed by autozygosity mapping, linkage analysis and candidate gene sequencing. We describe a multiplex consanguineous Saudi family in which two full siblings and one half-sibling presented with classical features of Seckel syndrome in addition to optic nerve hypoplasia. We were able to map the phenotype to a single novel locus on 4q25-q28.2, in which we identified a five base-pair deletion in PLK4, which encodes a master regulator of centriole duplication. Our discovery further confirms the role of genes involved in centriole biology in the pathogenesis of PD.
Janky, Rekin's; van Helden, Jacques
2008-01-23
The detection of conserved motifs in promoters of orthologous genes (phylogenetic footprints) has become a common strategy to predict cis-acting regulatory elements. Several software tools are routinely used to raise hypotheses about regulation. However, these tools are generally used as black boxes, with default parameters. A systematic evaluation of optimal parameters for a footprint discovery strategy can bring a sizeable improvement to the predictions. We evaluate the performances of a footprint discovery approach based on the detection of over-represented spaced motifs. This method is particularly suitable for (but not restricted to) Bacteria, since such motifs are typically bound by factors containing a Helix-Turn-Helix domain. We evaluated footprint discovery in 368 Escherichia coli K12 genes with annotated sites, under 40 different combinations of parameters (taxonomical level, background model, organism-specific filtering, operon inference). Motifs are assessed both at the levels of correctness and significance. We further report a detailed analysis of 181 bacterial orthologs of the LexA repressor. Distinct motifs are detected at various taxonomical levels, including the 7 previously characterized taxon-specific motifs. In addition, we highlight a significantly stronger conservation of half-motifs in Actinobacteria, relative to Firmicutes, suggesting an intermediate state in specificity switching between the two Gram-positive phyla, and thereby revealing the on-going evolution of LexA auto-regulation. The footprint discovery method proposed here shows excellent results with E. coli and can readily be extended to predict cis-acting regulatory signals and propose testable hypotheses in bacterial genomes for which nothing is known about regulation.
An Investigative Graduate Laboratory Course for Teaching Modern DNA Techniques
ERIC Educational Resources Information Center
de Lencastre, Alexandre; Torello, A. Thomas; Keller, Lani C.
2017-01-01
This graduate-level DNA methods laboratory course is designed to model a discovery-based research project and engages students in both traditional DNA analysis methods and modern recombinant DNA cloning techniques. In the first part of the course, students clone the "Drosophila" ortholog of a human disease gene of their choosing using…
Discovery of phosphonic acid natural products by mining the genomes of 10,000 actinomycetes
USDA-ARS's Scientific Manuscript database
Although natural products have been a particularly rich source of human medicines, the rate at which new molecules are being discovered is declining precipitously. Based on the large number of natural product biosynthetic genes in microbial genomes, many have suggested “genome mining” as an approach...
Allaway, Robert; Angus, Steve P.; Beauchamp, Roberta L.; Blakeley, Jaishri O.; Bott, Marga; Burns, Sarah S.; Carlstedt, Annemarie; Chang, Long-Sheng; Chen, Xin; Clapp, D. Wade; Desouza, Patrick A.; Erdin, Serkan; Fernandez-Valle, Cristina; Guinney, Justin; Gusella, James F.; Haggarty, Stephen J.; Johnson, Gary L.; La Rosa, Salvatore; Morrison, Helen; Petrilli, Alejandra M.; Plotkin, Scott R.; Pratap, Abhishek; Ramesh, Vijaya; Sciaky, Noah; Stemmer-Rachamimov, Anat; Stuhlmiller, Tim J.; Talkowski, Michael E.; Welling, D. Bradley; Yates, Charles W.; Zawistowski, Jon S.; Zhao, Wen-Ning
2018-01-01
Neurofibromatosis 2 (NF2) is a rare tumor suppressor syndrome that manifests with multiple schwannomas and meningiomas. There are no effective drug therapies for these benign tumors and conventional therapies have limited efficacy. Various model systems have been created and several drug targets have been implicated in NF2-driven tumorigenesis based on known effects of the absence of merlin, the product of the NF2 gene. We tested priority compounds based on known biology with traditional dose-concentration studies in meningioma and Schwann cell systems. Concurrently, we studied functional kinome and gene expression in these cells pre- and post-treatment to determine merlin-deficient molecular phenotypes. Cell viability results showed that three agents (GSK2126458, Panobinostat, CUDC-907) had the greatest activity across schwannoma and meningioma cell systems, but merlin status did not significantly influence response. In vivo, drug effect was tumor specific, with meningioma, but not schwannoma, showing response to GSK2126458 and Panobinostat. In culture, changes in both the transcriptome and kinome in response to treatment clustered predominantly based on tumor type. However, there were differences in both gene expression and functional kinome at baseline between meningioma and schwannoma cell systems that may form the basis for future selective therapies. This work has created an openly accessible resource (www.synapse.org/SynodosNF2) of fully characterized isogenic schwannoma and meningioma cell systems as well as a rich data source of kinome and transcriptome data from these assay systems before and after treatment that enables single and combination drug discovery based on molecular phenotype. PMID:29897904
Allaway, Robert; Angus, Steve P; Beauchamp, Roberta L; Blakeley, Jaishri O; Bott, Marga; Burns, Sarah S; Carlstedt, Annemarie; Chang, Long-Sheng; Chen, Xin; Clapp, D Wade; Desouza, Patrick A; Erdin, Serkan; Fernandez-Valle, Cristina; Guinney, Justin; Gusella, James F; Haggarty, Stephen J; Johnson, Gary L; La Rosa, Salvatore; Morrison, Helen; Petrilli, Alejandra M; Plotkin, Scott R; Pratap, Abhishek; Ramesh, Vijaya; Sciaky, Noah; Stemmer-Rachamimov, Anat; Stuhlmiller, Tim J; Talkowski, Michael E; Welling, D Bradley; Yates, Charles W; Zawistowski, Jon S; Zhao, Wen-Ning
2018-01-01
Neurofibromatosis 2 (NF2) is a rare tumor suppressor syndrome that manifests with multiple schwannomas and meningiomas. There are no effective drug therapies for these benign tumors and conventional therapies have limited efficacy. Various model systems have been created and several drug targets have been implicated in NF2-driven tumorigenesis based on known effects of the absence of merlin, the product of the NF2 gene. We tested priority compounds based on known biology with traditional dose-concentration studies in meningioma and Schwann cell systems. Concurrently, we studied functional kinome and gene expression in these cells pre- and post-treatment to determine merlin-deficient molecular phenotypes. Cell viability results showed that three agents (GSK2126458, Panobinostat, CUDC-907) had the greatest activity across schwannoma and meningioma cell systems, but merlin status did not significantly influence response. In vivo, drug effect was tumor specific with meningioma, but not schwannoma, showing response to GSK2126458 and Panobinostat. In culture, changes in both the transcriptome and kinome in response to treatment clustered predominantly based on tumor type. However, there were differences in both gene expression and functional kinome at baseline between meningioma and schwannoma cell systems that may form the basis for future selective therapies. This work has created an openly accessible resource (www.synapse.org/SynodosNF2) of fully characterized isogenic schwannoma and meningioma cell systems as well as a rich data source of kinome and transcriptome data from these assay systems before and after treatment that enables single and combination drug discovery based on molecular phenotype.
Kieburtz, Karl; Olanow, C Warren
2007-04-01
In the past decade, there has been an increasing emphasis on laboratory-based translational research. This has led to significant scientific advances in our understanding of disease mechanisms and in the development of novel approaches to therapy such as gene therapy, RNA interference, and stem cells. However, the translation of these remarkable scientific achievements into new and effective disease-modifying therapies has lagged behind these scientific accomplishments. We use the term "translational experimental therapeutics" to describe the pathway between the discovery of a basic disease mechanism or novel therapeutic approach and its translation into an effective treatment for patients with a specific disease. In this article, we review the components of this pathway, and discuss issues that might impede this process. Only by optimizing this pathway can we realize the full therapeutic potential of current scientific discoveries and translate the astounding advances that have been accomplished in the laboratory into effective treatments for our patients. Copyright (c) 2007 Mount Sinai School of Medicine.
Ansari, Morad; Balasubramanian, Meena; Blyth, Moira; Brady, Angela F.; Clayton, Stephen; Cole, Trevor; Deshpande, Charu; Fitzgerald, Tomas W.; Foulds, Nicola; Francis, Richard; Gabriel, George; Gerety, Sebastian S.; Goodship, Judith; Hobson, Emma; Jones, Wendy D.; Joss, Shelagh; King, Daniel; Klena, Nikolai; Kumar, Ajith; Lees, Melissa; Lelliott, Chris; Lord, Jenny; McMullan, Dominic; O'Regan, Mary; Osio, Deborah; Piombo, Virginia; Prigmore, Elena; Rajan, Diana; Rosser, Elisabeth; Sifrim, Alejandro; Smith, Audrey; Swaminathan, Ganesh J.; Turnpenny, Peter; Whitworth, James; Wright, Caroline F.; Firth, Helen V.; Barrett, Jeffrey C.; Lo, Cecilia W.; FitzPatrick, David R.; Hurles, Matthew E.
2018-01-01
Discovery of most autosomal recessive disease genes has involved analysis of large, often consanguineous, multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of novel dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analysed 4,125 families with diverse, rare, genetically heterogeneous developmental disorders and identified four novel autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (identifying probands with rare biallelic putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population, and (ii) the phenotypic similarity of patients with the same recessive candidate gene. This new paradigm promises to catalyse discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations, and those caused predominantly by compound heterozygous genotypes. PMID:26437029
Akawi, Nadia; McRae, Jeremy; Ansari, Morad; Balasubramanian, Meena; Blyth, Moira; Brady, Angela F; Clayton, Stephen; Cole, Trevor; Deshpande, Charu; Fitzgerald, Tomas W; Foulds, Nicola; Francis, Richard; Gabriel, George; Gerety, Sebastian S; Goodship, Judith; Hobson, Emma; Jones, Wendy D; Joss, Shelagh; King, Daniel; Klena, Nikolai; Kumar, Ajith; Lees, Melissa; Lelliott, Chris; Lord, Jenny; McMullan, Dominic; O'Regan, Mary; Osio, Deborah; Piombo, Virginia; Prigmore, Elena; Rajan, Diana; Rosser, Elisabeth; Sifrim, Alejandro; Smith, Audrey; Swaminathan, Ganesh J; Turnpenny, Peter; Whitworth, James; Wright, Caroline F; Firth, Helen V; Barrett, Jeffrey C; Lo, Cecilia W; FitzPatrick, David R; Hurles, Matthew E
2015-11-01
Discovery of most autosomal recessive disease-associated genes has involved analysis of large, often consanguineous multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of new dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analyzed 4,125 families with diverse, rare and genetically heterogeneous developmental disorders and identified four new autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (selecting probands with rare, biallelic and putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population and (ii) the phenotypic similarity of patients with recessive variants in the same candidate gene. This new paradigm promises to catalyze the discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations and those caused predominantly by compound heterozygous genotypes.
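The Mendelian filtering step named in this abstract (selecting probands with rare, biallelic, putatively damaging variants in the same gene) could be sketched roughly as follows. This is only an illustration: the record layout, the gene names and the 1% allele-frequency cutoff are hypothetical and are not taken from the study, and phase is not checked.

```python
from collections import defaultdict

# Illustrative assumption: a variant counts as "rare" below this allele frequency.
MAX_ALLELE_FREQ = 0.01


def biallelic_candidate_genes(variants, max_af=MAX_ALLELE_FREQ):
    """Return genes carrying rare, putatively damaging variants on both alleles.

    Each variant is a tuple (gene, genotype, allele_freq, damaging), where
    genotype is "hom" or "het". A gene qualifies with one homozygous variant
    or with at least two heterozygous variants (a possible compound
    heterozygote; phasing is not verified in this sketch).
    """
    het_counts = defaultdict(int)
    candidates = set()
    for gene, genotype, allele_freq, damaging in variants:
        if not damaging or allele_freq > max_af:
            continue  # skip benign or common variants
        if genotype == "hom":
            candidates.add(gene)
        elif genotype == "het":
            het_counts[gene] += 1
            if het_counts[gene] >= 2:
                candidates.add(gene)
    return sorted(candidates)


# Hypothetical proband with made-up gene names.
proband = [
    ("GENE_A", "hom", 0.0001, True),
    ("GENE_B", "het", 0.002, True),
    ("GENE_B", "het", 0.0005, True),
    ("GENE_C", "het", 0.001, True),  # single heterozygous hit: not biallelic
    ("GENE_D", "hom", 0.2, True),    # too common to count as rare
]
print(biallelic_candidate_genes(proband))
```

In the study, this kind of filter is only the first stage; the statistical assessments of genotype sampling likelihood and phenotypic similarity are separate steps not shown here.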
Speth, Daan R; Lagkouvardos, Ilias; Wang, Yong; Qian, Pei-Yuan; Dutilh, Bas E; Jetten, Mike S M
2017-07-01
Several recent studies have indicated that members of the phylum Planctomycetes are abundantly present at the brine-seawater interface (BSI) above multiple brine pools in the Red Sea. Planctomycetes include bacteria capable of anaerobic ammonium oxidation (anammox). Here, we investigated the possibility of anammox at BSI sites using metagenomic shotgun sequencing of DNA obtained from the BSI above the Discovery Deep brine pool. Analysis of sequencing reads matching the 16S rRNA and hzsA genes confirmed presence of anammox bacteria of the genus Scalindua. Phylogenetic analysis of the 16S rRNA gene indicated that this Scalindua sp. belongs to a distinct group, separate from the anammox bacteria in the seawater column, that contains mostly sequences retrieved from high-salt environments. Using coverage- and composition-based binning, we extracted and assembled the draft genome of the dominant anammox bacterium. Comparative genomic analysis indicated that this Scalindua species uses compatible solutes for osmoadaptation, in contrast to other marine anammox bacteria that likely use a salt-in strategy. We propose the name Candidatus Scalindua rubra for this novel species, alluding to its discovery in the Red Sea.
Nudel, Ron; Newbury, Dianne F
2013-01-01
The forkhead box P2 gene, designated FOXP2, is the first gene implicated in a speech and language disorder. Since its discovery, many studies have been carried out in an attempt to explain the mechanism by which it influences these characteristically human traits. This review presents the story of the discovery of the FOXP2 gene, including early studies of the phenotypic implications of a disruption in the gene. We then discuss recent investigations into the molecular function of the FOXP2 gene, including functional and gene expression studies. We conclude this review by presenting the fascinating results of recent studies of the FOXP2 ortholog in other species that are capable of vocal communication. WIREs Cogn Sci 2013, 4:547–560. doi: 10.1002/wcs.1247 PMID:24765219
Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues
Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong
2017-01-01
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong
2017-10-02
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.
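The "searching" procedure described above (shortest paths connecting regeneration genes of one tissue with those of another, with the genes on those paths taken as candidates) might look like the following minimal sketch. It assumes an unweighted toy network and breadth-first search; the gene names and the `candidate_genes` helper are hypothetical, and the published method may differ in how paths are weighted and enumerated.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths between two known gene sets."""
    known = set(genes_a) | set(genes_b)
    found = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                found.update(n for n in path[1:-1] if n not in known)
    return sorted(found)


# Toy undirected protein-protein interaction network (adjacency lists).
ppi = {
    "BONE1": ["X1"],
    "X1": ["BONE1", "NERVE1", "X2"],
    "X2": ["X1", "NERVE2"],
    "NERVE1": ["X1"],
    "NERVE2": ["X2"],
}
print(candidate_genes(ppi, ["BONE1"], ["NERVE1", "NERVE2"]))
```

The candidates returned here would still need the testing and screening procedures before being treated as co-regeneration genes.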
PDPR Gene Expression Correlates with Exercise-Training Insulin Sensitivity Changes
Barberio, Matthew D.; Huffman, Kim M.; Giri, Mamta; Hoffman, Eric P.; Kraus, William E.; Hubal, Monica J.
2016-01-01
Purpose: Whole body insulin sensitivity (Si) typically improves following aerobic exercise training; however, individual responses can be highly variable. The purpose of this study was to use global gene expression to identify skeletal muscle genes that correlate with exercise-induced Si changes. Methods: Longitudinal cohorts from the Studies of Targeted Risk Reduction Intervention through Defined Exercise (STRRIDE) were utilized as Discovery (Affymetrix) and Confirmation (Illumina) cohorts for vastus lateralis gene expression profiles. Discovery (n=39; 21 men) and Confirmation (n=42; 19 men) cohorts were matched for age (52 ± 8 vs. 51 ± 10 yr), BMI (30.4 ± 2.8 vs. 29.7 ± 2.8 kg·m⁻²), and VO2max (30.4 ± 2.8 vs. 29.7 ± 2.8 mL/kg/min). Si was determined via intravenous glucose tolerance test pre- and post-training. Pearson product-moment correlation coefficients determined relationships between a) baseline and b) training-induced changes in gene expression and %ΔSi after training. Results: Expression of 2454 (Discovery) and 1778 genes (Confirmation) at baseline was significantly (P<0.05) correlated with %ΔSi; 112 genes overlapped. Pathway analyses identified Ca2+-signaling-related transcripts in this 112-gene list. Expression changes of 1384 (Discovery) and 1288 genes (Confirmation) following training were significantly (P<0.05) correlated with %ΔSi; 33 genes overlapped, representing contractile apparatus of skeletal and smooth muscle genes. Pyruvate dehydrogenase phosphatase regulatory subunit (PDPR) expression at baseline (p=0.01, r=0.41) and post-training (p=0.01, r=0.43) were both correlated with %ΔSi. Conclusion: Exercise-induced adaptations in skeletal muscle Si are related to baseline levels of Ca2+-regulating transcripts, which may prime the muscle for adaptation. Relationships between %ΔSi and PDPR, a regulatory subunit of the pyruvate dehydrogenase complex, indicate that the Si response is strongly related to key steps in metabolic regulation. PMID:27846149
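The Pearson product-moment correlation used in this study to relate gene expression to %ΔSi can be written out directly from its definition. The sketch below is a plain-Python illustration only; the expression values and %ΔSi values are invented and do not come from the STRRIDE cohorts.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up example: baseline expression of one gene vs. %ΔSi in five subjects.
baseline_expression = [1, 2, 3, 4, 5]
percent_delta_si = [2, 1, 4, 3, 5]
print(round(pearson_r(baseline_expression, percent_delta_si), 3))
```

Repeating this calculation per gene and keeping genes with P<0.05 in both cohorts is the overlap logic the abstract describes; the significance test itself is not shown here.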
Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard
2014-01-01
Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge.
It might be a useful resource especially for the bone tumor research community, as specific information about genes or microRNAs is quickly and easily accessible. Hence, this platform can support the ongoing OS research and biomarker discovery. Database URL: http://osteosarcoma-db.uni-muenster.de. © The Author(s) 2014. Published by Oxford University Press.
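The "automated dictionary-based gene recognition procedure" mentioned above is not specified in detail in the abstract. As a rough idea of what such a step can involve, the sketch below matches tokens of an abstract against a small synonym dictionary. The dictionary contents, the tokenisation rule and the `recognise_genes` helper are all assumptions made for illustration; they are not the database's actual pipeline.

```python
import re

# Hypothetical synonym dictionary: surface form (upper case) -> official symbol.
GENE_DICTIONARY = {
    "TP53": "TP53",
    "P53": "TP53",
    "RB1": "RB1",
    "MDM2": "MDM2",
}


def recognise_genes(text, dictionary=GENE_DICTIONARY):
    """Return the sorted official symbols of dictionary terms found in the text."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    hits = {dictionary[token.upper()] for token in tokens if token.upper() in dictionary}
    return sorted(hits)


# Made-up sentence standing in for a PubMed abstract.
abstract = "Loss of p53 and RB1 is frequent in osteosarcoma; MDM2 amplification also occurs."
print(recognise_genes(abstract))
```

Case-insensitive matching of short symbols tends to produce false positives, which is one reason a manual review stage like the one the authors describe is useful.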
Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard
2014-01-01
Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge.
It might be a useful resource especially for the bone tumor research community, as specific information about genes or microRNAs is quickly and easily accessible. Hence, this platform can support the ongoing OS research and biomarker discovery. Database URL: http://osteosarcoma-db.uni-muenster.de PMID:24865352
Bianco, Luca; Riccadonna, Samantha; Lavezzo, Enrico; Falda, Marco; Formentin, Elide; Cavalieri, Duccio; Toppo, Stefano; Fontana, Paolo
2017-02-01
Pathway Inspector is an easy-to-use web application helping researchers to find patterns of expression in complex RNAseq experiments. The tool combines two standard approaches for RNAseq analysis: the identification of differentially expressed genes and a topology-based analysis of enriched pathways. Pathway Inspector is equipped with ad hoc interactive graphical interfaces simplifying the discovery of modulated pathways and the integration of the differentially expressed genes in the corresponding pathway topology. Pathway Inspector is available at the website http://admiral.fmach.it/PI and has been developed in Python, making use of the Django Web Framework. Contact: paolo.fontana@fmach.it
A Hybrid Computational Method for the Discovery of Novel Reproduction-Related Genes
Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong
2015-01-01
Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of method, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations. PMID:25768094
A hybrid computational method for the discovery of novel reproduction-related genes.
Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong
2015-01-01
Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of method, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations.
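The abstract says that genes found on shortest paths "were filtered with a randomization test" but gives no further detail. One common form of such a test is an empirical p-value: compare how often a gene appears on shortest paths for the real gene set against counts obtained from randomly drawn gene sets. The sketch below assumes that form; the counts, the 0.05 cutoff and the helper names are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null statistics at least as large as the observed one.

    The +1 in numerator and denominator keeps the estimate away from zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def filter_genes(observed_counts, null_counts, alpha=0.05):
    """Keep genes whose observed path count is unusually high under the null."""
    return sorted(
        gene
        for gene, observed in observed_counts.items()
        if empirical_p_value(observed, null_counts[gene]) < alpha
    )


# Made-up counts: how often each gene lies on shortest paths for the real
# gene set, and for 99 randomly drawn gene sets of the same size.
observed_counts = {"G1": 50, "G2": 5}
null_counts = {
    "G1": [i % 10 for i in range(99)],
    "G2": [i % 10 for i in range(99)],
}
print(filter_genes(observed_counts, null_counts))
```

A gene such as G2, which turns up on shortest paths about as often for random gene sets as for the real one, is likely a network hub rather than a gene specific to the process under study, and is dropped.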
Biomimicry as a basis for drug discovery.
Kolb, V M
1998-01-01
Selected works are discussed which clearly demonstrate that mimicking various aspects of the process by which natural products evolved is becoming a powerful tool in contemporary drug discovery. Natural products are an established and rich source of drugs. The term "natural product" is often used synonymously with "secondary metabolite." Knowledge of genetics and molecular evolution helps us understand how biosynthesis of many classes of secondary metabolites evolved. One proposed hypothesis is termed "inventive evolution." It invokes duplication of genes, and mutation of the gene copies, among other genetic events. The modified duplicate genes, per se or in conjunction with other genetic events, may give rise to new enzymes, which, in turn, may generate new products, some of which may be selected for. Steps of the inventive evolution can be mimicked in several ways for the purpose of drug discovery. For example, libraries of chemical compounds of any imaginable structure may be produced by combinatorial synthesis. Out of these libraries new active compounds can be selected. In another example, genetic systems can be manipulated to produce modified natural products ("unnatural natural products"), from which new drugs can be selected. In some instances, similar natural products turn up in species that are not direct descendants of each other. This is presumably due to a horizontal gene transfer. The mechanism of this inter-species gene transfer can be mimicked in therapeutic gene delivery. Mimicking specifics or principles of chemical evolution including experimental and test-tube evolution also provides leads for new drug discovery.
Nam, Seungyoon
2017-04-01
Cancer transcriptome analysis is one of the leading areas of Big Data science, biomarker, and pharmaceutical discovery, not to forget personalized medicine. Yet, cancer transcriptomics and postgenomic medicine require innovation in bioinformatics as well as comparison of the performance of available algorithms. In this data analytics context, the value of network generation and algorithms has been widely underscored for addressing the salient questions in cancer pathogenesis. Analysis of cancer transcriptome often results in complicated networks where identification of network modularity remains critical, for example, in delineating the "druggable" molecular targets. Network clustering is useful, but depends on the network topology in and of itself. Notably, the performance of different network-generating tools for network cluster (NC) identification has been little investigated to date. Hence, using gastric cancer (GC) transcriptomic datasets, we compared two algorithms for generating pathway versus gene regulatory network-based NCs, showing that the pathway-based approach better agrees with a reference set of cancer-functional contexts. Finally, by applying pathway-based NC identification to GC transcriptome datasets, we describe cancer NCs that associate with candidate therapeutic targets and biomarkers in GC. These observations collectively inform future research on cancer transcriptomics, drug discovery, and rational development of new analysis tools for optimal harnessing of omics data.
Gene Expression Signatures Based on Variability can Robustly Predict Tumor Progression and Prognosis
Dinalankara, Wikum; Bravo, Héctor Corrada
2015-01-01
Gene expression signatures are commonly used to create cancer prognosis and diagnosis methods, yet only a small number of them are successfully deployed in the clinic since many fail to replicate performance on subsequent validation. A primary reason for this lack of reproducibility is the fact that these signatures attempt to model the highly variable and unstable genomic behavior of cancer. Our group recently introduced gene expression anti-profiles as a robust methodology to derive gene expression signatures based on the observation that while gene expression measurements are highly heterogeneous across tumors of a specific cancer type relative to the normal tissue, their degree of deviation from normal tissue expression in specific genes involved in tissue differentiation is a stable tumor mark that is reproducible across experiments and cancer types. Here we show that constructing gene expression signatures based on variability and the anti-profile approach yields classifiers capable of successfully distinguishing benign growths from cancerous growths based on deviation from normal expression. We then show that this same approach generates stable and reproducible signatures that predict probability of relapse and survival based on tumor gene expression. These results suggest that using the anti-profile framework for the discovery of genomic signatures is an avenue leading to the development of reproducible signatures suitable for adoption in clinical settings. PMID:26078586
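The idea of scoring a sample by its "degree of deviation from normal tissue expression" can be illustrated with a simple count of genes falling outside a range estimated from normal samples. The sketch below is one plausible reading, not necessarily the authors' procedure: the use of median plus or minus k times the median absolute deviation (MAD), the value k=3 and all expression values are assumptions.

```python
from statistics import median


def normal_ranges(normal_samples, k=3.0):
    """Per-gene (low, high) range: median +/- k * MAD across normal samples."""
    ranges = {}
    for gene in normal_samples[0]:
        values = [sample[gene] for sample in normal_samples]
        centre = median(values)
        mad = median([abs(value - centre) for value in values])
        ranges[gene] = (centre - k * mad, centre + k * mad)
    return ranges


def deviation_score(sample, ranges):
    """Number of genes whose expression lies outside the normal range."""
    return sum(
        1 for gene, (low, high) in ranges.items()
        if not low <= sample[gene] <= high
    )


# Made-up expression values for two genes in three normal samples.
normals = [
    {"G1": 10.0, "G2": 5.0},
    {"G1": 11.0, "G2": 5.5},
    {"G1": 9.0, "G2": 4.5},
]
ranges = normal_ranges(normals)
tumor = {"G1": 20.0, "G2": 1.0}
benign = {"G1": 10.5, "G2": 6.0}
print(deviation_score(tumor, ranges), deviation_score(benign, ranges))
```

A classifier built on such a score only needs a cutoff on the count, which is part of why deviation-based signatures may transfer across experiments more readily than signatures built on absolute expression levels.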
Weigt, S Samuel; Wang, Xiaoyan; Palchevskiy, Vyacheslav; Patel, Naman; Derhovanessian, Ariss; Shino, Michael Y; Sayah, David M; Lynch, Joseph P; Saggar, Rajan; Ross, David J; Kubak, Bernie M; Ardehali, Abbas; Palmer, Scott; Husain, Shahid; Belperio, John A
2018-06-01
Aspergillus colonization after lung transplant is associated with an increased risk of chronic lung allograft dysfunction (CLAD). We hypothesized that gene expression during Aspergillus colonization could provide clues to CLAD pathogenesis. We examined transcriptional profiles in 3- or 6-month surveillance bronchoalveolar lavage fluid cell pellets from recipients with Aspergillus fumigatus colonization (n = 12) and without colonization (n = 10). Among the Aspergillus colonized, we also explored profiles in those who developed CLAD (n = 6) or remained CLAD-free (n = 6). Transcription profiles were assayed with the HG-U133 Plus 2.0 microarray (Affymetrix). Differential gene expression was based on an absolute fold difference of 2.0 or greater and unadjusted P value less than 0.05. We used NIH Database for Annotation, Visualization and Integrated Discovery for functional analyses, with false discovery rates less than 5% considered significant. Aspergillus colonization was associated with differential expression of 489 probe sets, representing 404 unique genes. "Defense response" genes and genes in the "cytokine-cytokine receptor" Kyoto Encyclopedia of Genes and Genomes pathway were notably enriched in this list. Among Aspergillus colonized patients, CLAD development was associated with differential expression of 69 probe sets, representing 64 unique genes. This list was enriched for genes involved in "immune response" and "response to wounding", among others. Notably, both chitinase 3-like-1 and chitotriosidase were associated with progression to CLAD. Aspergillus colonization is associated with gene expression profiles related to defense responses including cytokine signaling. Epithelial wounding, as well as the innate immune response to chitin that is present in the fungal cell wall, may be key in the link between Aspergillus colonization and CLAD.
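The differential expression rule stated in this abstract (absolute fold difference of 2.0 or greater and unadjusted P value less than 0.05) is easy to express as a filter. In the sketch below the thresholds come from the abstract, while the probe identifiers, group means and P values are invented, and the means are assumed to be on a linear (not log) scale.

```python
def is_differential(mean_a, mean_b, p_value, min_fold=2.0, max_p=0.05):
    """Apply an absolute fold-difference and unadjusted P value cutoff.

    mean_a and mean_b are positive group means on a linear scale; the fold
    difference is taken in whichever direction is larger.
    """
    ratio = mean_a / mean_b
    fold = ratio if ratio >= 1 else 1 / ratio
    return fold >= min_fold and p_value < max_p


# Hypothetical probe sets: (id, mean in colonized, mean in non-colonized, P).
probes = [
    ("P1", 200.0, 80.0, 0.01),   # 2.5-fold up, significant
    ("P2", 100.0, 90.0, 0.001),  # significant but fold difference too small
    ("P3", 40.0, 100.0, 0.03),   # 2.5-fold down, significant
    ("P4", 300.0, 100.0, 0.2),   # large fold difference, not significant
]
selected = [pid for pid, a, b, p in probes if is_differential(a, b, p)]
print(selected)
```

Because the P values here are unadjusted, the multiple-testing control in the study comes from the later functional analysis step, where false discovery rates below 5% were required.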
Schneider, Susan M
2007-01-01
Nature–nurture views that smack of genetic determinism remain prevalent. Yet, the increasing knowledge base shows ever more clearly that environmental factors and genes form a fully interactional system at all levels. Moore's book covers the major topics of discovery and dispute, including behavior genetics and the twin studies, developmental psychobiology, and developmental systems theory. Knowledge of this larger life-sciences context for behavior principles will become increasingly important as the full complexity of gene–environment relations is revealed. Behavior analysis both contributes to and gains from the larger battle for the recognition of how nature and nurture really work.
Exploring Wound-Healing Genomic Machinery with a Network-Based Approach
Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo
2017-01-01
The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm how a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674
2012-12-05
Bisgaier J, Levinson D, Cutts DB, & Rhodes KV., (2011) Access to autism evaluation appointments with developmental-behavioral and neurodevelopmental ...W403 Columbus, OH 43205 Final Report Comprehensive Clinical Phenotyping & Genetic Mapping for the Discovery of Autism Susceptibility Genes... 1.0 Summary In 2006, the Central Ohio Registry for Autism (CORA) was initiated as a collaboration between Wright-Patterson Air
Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Rosenthal, Nadia A; Kitano, Hiroaki; Boyd, Sarah E
2015-05-01
Existing de novo software platforms have largely overlooked a valuable resource, the expertise of the intended biologist users. Typical data representations, such as long gene lists or highly dense and overlapping transcription factor networks, often hinder biologists from relating these results to their expertise. VISIONET, a streamlined visualisation tool built from experimental needs, enables biologists to transform large and dense overlapping transcription factor networks into sparse human-readable graphs via numerical filtering. The VISIONET interface allows users without a computing background to interactively explore and filter their data, and empowers them to apply their specialist knowledge to far more complex and substantial data sets than is currently possible. Applying VISIONET to the Tbx20-Gata4 transcription factor network led to the discovery and validation of Aldh1a2, an essential developmental gene associated with various important cardiac disorders, as a healthy adult cardiac fibroblast gene co-regulated by cardiogenic transcription factors Gata4 and Tbx20. We demonstrate with experimental validations the utility of VISIONET for expertise-driven gene discovery that opens new experimental directions that would not otherwise have been identified.
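The idea of numerically filtering a dense transcription factor network into a sparse one can be sketched as follows. This is not VISIONET's code or interface; it is a minimal illustration that assumes each edge carries a numeric score and that filtering means keeping edges at or above a cutoff. The edge scores, the placeholder targets `GeneX` and `GeneY`, and both function names are hypothetical; only the names Gata4, Tbx20 and Aldh1a2 come from the abstract.

```python
from collections import defaultdict


def filter_network(edges, min_score):
    """Keep only (tf, target, score) edges whose score is >= min_score."""
    return [(tf, target, score) for tf, target, score in edges if score >= min_score]


def shared_targets(edges, tf_a, tf_b):
    """Return targets that remain connected to both transcription factors."""
    targets = defaultdict(set)
    for tf, target, _ in edges:
        targets[tf].add(target)
    return sorted(targets[tf_a] & targets[tf_b])


# Hypothetical scores, chosen only to make the example readable.
dense = [
    ("Gata4", "Aldh1a2", 0.90),
    ("Tbx20", "Aldh1a2", 0.80),
    ("Gata4", "GeneX", 0.20),
    ("Tbx20", "GeneY", 0.95),
]
sparse = filter_network(dense, min_score=0.5)
co_regulated = shared_targets(sparse, "Gata4", "Tbx20")
```

Raising or lowering `min_score` trades graph readability against the risk of discarding weak but real regulatory links, which is where the user's domain expertise comes in.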
Gelernter, Joel; Sherva, Richard; Koesterer, Ryan; Almasy, Laura; Zhao, Hongyu; Kranzler, Henry R.; Farrer, Lindsay
2013-01-01
We report a GWAS for cocaine dependence (CD) in three sets of African- and European-American subjects (AAs and EAs, respectively), to identify pathways, genes, and alleles important in CD risk. The discovery GWAS dataset (n=5,697 subjects) was genotyped using the Illumina OmniQuad microarray (890,000 analyzed SNPs). Additional genotypes were imputed based on the 1000 Genomes reference panel. Top-ranked findings were evaluated by incorporating information from publicly available GWAS data from 4,063 subjects. Then, the most significant GWAS SNPs were genotyped in 2,549 independent subjects. We observed one genomewide-significant (GWS) result: rs7086629 at the FAM53B (“family with sequence similarity 53, member B”) locus. This was supported in both AAs and EAs; p-value (meta-analysis of all samples) = 4.28×10^-8. The gene maps to the same chromosomal region as the maximum peak we observed in a previous linkage study. NCOR2 (nuclear receptor corepressor 2) SNP rs150954431 was associated with p = 1.19×10^-9 in the EA discovery sample. SNP rs2456778, which maps to CDK1 (“cyclin-dependent kinase 1”), was associated with cocaine-induced paranoia in AAs in the discovery sample only (p = 4.68×10^-8). This is the first study to identify risk variants for CD using GWAS. Our results implicate novel risk loci and provide insights into potential therapeutic and prevention strategies. PMID:23958962
Gene Patents and Personalized Cancer Care: Impact of the Myriad Case on Clinical Oncology
Offit, Kenneth; Bradbury, Angela; Storm, Courtney; Merz, Jon F.; Noonan, Kevin E.; Spence, Rebecca
2013-01-01
Genomic discoveries have transformed the practice of oncology and cancer prevention. Diagnostic and therapeutic advances based on cancer genomics developed during a time when it was possible to patent genes. A case before the Supreme Court, Association for Molecular Pathology v Myriad Genetics, Inc seeks to overturn patents on isolated genes. Although the outcomes are uncertain, it is suggested here that the Supreme Court decision will have few immediate effects on oncology practice or research but may have more significant long-term impact. The Federal Circuit court has already rejected Myriad's broad diagnostic methods claims, and this is not affected by the Supreme Court decision. Isolated DNA patents were already becoming obsolete on scientific grounds, in an era when human DNA sequence is public knowledge and because modern methods of next-generation sequencing need not involve isolated DNA. The Association for Molecular Pathology v Myriad Supreme Court decision will have limited impact on new drug development, as new drug patents usually involve cellular methods. A nuanced Supreme Court decision acknowledging the scientific distinction between synthetic cDNA and genomic DNA will further mitigate any adverse impact. A Supreme Court decision to include or exclude all types of DNA from patent eligibility could impact future incentives for genomic discovery as well as the future delivery of medical care. Whatever the outcome of this important case, it is important that judicial and legislative actions in this area maximize genomic discovery while also ensuring patients' access to personalized cancer care. PMID:23766521
Song, J; Doucette, C; Hanniford, D; Hunady, K; Wang, N; Sherf, B; Harrington, J J; Brunden, K R; Stricker-Krongrad, A
2005-06-01
Target-based high-throughput screening (HTS) plays an integral role in drug discovery. The implementation of HTS assays generally requires high expression levels of the target protein, and this is typically accomplished using recombinant cDNA methodologies. However, the isolated gene sequences of many drug targets have intellectual property claims that restrict the ability to implement drug discovery programs. The present study describes the pharmacological characterization of the human histamine H3 receptor that was expressed using random activation of gene expression (RAGE), a technology that over-expresses proteins by up-regulating endogenous genes rather than introducing cDNA expression vectors into the cell. Saturation binding analysis using [125I]iodoproxyfan and RAGE-H3 membranes revealed a single class of binding sites with a K(D) value of 0.77 nM and a B(max) equal to 756 fmol/mg of protein. Competition binding studies showed that the rank order of potency for H3 agonists was N(alpha)-methylhistamine approximately (R)-alpha-methylhistamine > histamine and that the rank order of potency for H3 antagonists was clobenpropit > iodophenpropit > thioperamide. The same rank order of potency for H3 agonists and antagonists was observed in the functional assays as in the binding assays. The Fluorometric Imaging Plate Reader assays in RAGE-H3 cells gave high Z' values for both agonist and antagonist screening. These results reveal that the human H3 receptor expressed with the RAGE technology is pharmacologically comparable to that expressed through recombinant methods. Moreover, the level of expression of the H3 receptor in the RAGE-H3 cells is suitable for HTS and secondary assays.
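Two quantities in this abstract have simple closed forms, sketched below under stated assumptions. The first function uses the standard one-site saturation binding model, B = Bmax·[L] / (K_D + [L]), with the reported K_D and Bmax plugged in. The second uses the commonly cited Z' definition, Z' = 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|. The control-well readings are made up for illustration, and the abstract does not report the study's actual Z' values.

```python
from statistics import mean, stdev


def specific_binding(ligand_nM, bmax, kd_nM):
    """One-site saturation binding: bound = Bmax * [L] / (K_D + [L])."""
    return bmax * ligand_nM / (kd_nM + ligand_nM)


def z_prime(positive_controls, negative_controls):
    """Z' factor for assay quality, using sample standard deviations."""
    spread = stdev(positive_controls) + stdev(negative_controls)
    window = abs(mean(positive_controls) - mean(negative_controls))
    return 1.0 - 3.0 * spread / window


# At [L] = K_D the model predicts half-maximal binding (Bmax / 2).
half_max = specific_binding(ligand_nM=0.77, bmax=756.0, kd_nM=0.77)

# Hypothetical plate-reader signals for control wells.
z = z_prime([100.0, 102.0, 98.0], [10.0, 11.0, 9.0])
```

A Z' close to 1 indicates a wide separation between controls relative to their variability, which is what "high Z' values" refers to in a screening context.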
Genome network medicine: innovation to overcome huge challenges in cancer therapy.
Roukos, Dimitrios H
2014-01-01
The post-ENCODE era is shaping a new direction in biomedical research: understanding the transcriptional and signaling networks that drive gene expression and core cellular processes such as cell fate, survival, and apoptosis. Over the past half century, the Francis Crick 'central dogma' of a single gene/protein-phenotype (trait/disease) relationship has defined biology, human physiology, disease, diagnostics, and drug discovery. However, the ENCODE project and several other genomic studies that use high-throughput sequencing technologies, computational strategies, and imaging techniques to visualize regulatory networks provide evidence that transcription and gene expression are regulated by highly complex, dynamic molecular and signaling networks. This Focus article describes the limitations of linear, experimentation-based diagnostics and therapeutics in curing advanced cancer and the need to move from reductionist to network-based approaches. Given the evident wide genomic heterogeneity, the power and challenges of next-generation sequencing (NGS) technologies for identifying a patient's personal mutational landscape, and so tailoring the best targeted drugs to the individual patient, are discussed. However, the available drugs cannot target aberrant signaling networks, and functional transcriptional heterogeneity and functional genome organization remain poorly understood. Therefore, future clinical genome network medicine, which aims to overcome multiple problems in the new fields of regulatory DNA mapping, noncoding RNA, enhancer RNAs, and the dynamic complexity of transcriptional circuitry, is also discussed, with the expectation of new innovative technology and a strong appreciation of clinical data and evidence-based medicine. The problems and potential solutions in the discovery of next-generation molecular and signaling circuitry-based biomarkers and drugs are explored. © 2013 Wiley Periodicals, Inc.
The Biotechnology Revolution: Distinguishing Fact from Fantasy and Folly?
ERIC Educational Resources Information Center
Edmondston, Joanne
2000-01-01
Biotechnology and its applications, such as the discovery of DNA used in the identification of genes, are now having significant impact on everyday life. Discusses the impacts of DNA technology and genetic modification practices. Introduces the Human Genome Project whose aim is to determine the order of each of the 3.3 billion bases of human DNA.…
Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning
2017-01-01
Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). Humans can be infected by EBOV, with a high fatality rate. However, the factors linking EBOV and its host remain unclear. According to the "guilt by association" (GBA) principle, proteins that interact with each other are very likely to have similar or identical functions. Based on this assumption, we used the Dijkstra algorithm to search a protein-protein interaction network for human genes related to EBOV infection. We hope this could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright © Bentham Science Publishers.
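For readers unfamiliar with the algorithm named here, the following is a minimal sketch of Dijkstra's shortest-path search on a small weighted network. It shows only the textbook algorithm; how the authors weighted interactions or chose seed proteins is not stated in the abstract. The node labels (`SEED`, `A`, `B`, `C`) and edge weights are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical undirected toy network (each edge listed in both directions).
toy_network = {
    "SEED": {"A": 1.0, "B": 4.0},
    "A": {"SEED": 1.0, "B": 1.0, "C": 5.0},
    "B": {"SEED": 4.0, "A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}
distances = dijkstra(toy_network, "SEED")
```

Under guilt by association, nodes lying on or near short paths between known infection-related proteins would be natural candidates, though the selection criteria used in the study itself are not given here.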
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen, M.; Vermeulen, R.; Rajaraman, P.
The high incidence of lung cancer in Xuanwei County, China has been attributed to exposure to indoor smoky coal emissions that contain polycyclic aromatic hydrocarbons (PAHs). The inflammatory response induced by coal smoke components may promote lung tumor development. We studied the association between single nucleotide polymorphisms (SNPs) in genes involved in innate immunity and lung cancer risk in a population-based case-control study (122 cases and 122 controls) in Xuanwei. A total of 1,360 tag SNPs in 149 gene regions were included in the analysis. FCER2 rs7249320 was the most significant SNP (OR: 0.30; 95% CI: 0.16-0.55; P: 0.0001; false discovery rate value, 0.13) for variant carriers. The gene regions ALOX12B/ALOX15B and KLK2 were associated with increased lung cancer risk globally (false discovery rate value < 0.15). In addition, there were positive interactions between KLK15 rs3745523 and smoky coal use (OR: 9.40; P-interaction = 0.07) and between FCER2 rs7249320 and KLK2 rs2739476 (OR: 10.77; P-interaction = 0.003). Our results suggest that genetic polymorphisms in innate immunity genes may play a role in the genesis of lung cancer caused by PAH-containing coal smoke. Integrin/receptor and complement pathways as well as IgE regulation are particularly noteworthy.
Reiner-Benaim, Anat; Yekutieli, Daniel; Letwin, Noah E; Elmer, Gregory I; Lee, Norman H; Kafkafi, Neri; Benjamini, Yoav
2007-09-01
Gene expression and phenotypic functionality can best be associated when they are measured quantitatively within the same experiment. The analysis of such a complex experiment is presented, searching for associations between measures of exploratory behavior in mice and gene expression in brain regions. The analysis of such experiments raises several methodological problems. First and foremost, the size of the pool of potential discoveries being screened is enormous, yet only a few biologically relevant findings are expected, making the problem of multiple testing especially severe. We present solutions based on screening by testing related hypotheses, then testing the hypotheses of interest. In one variant the subset is selected directly; in the other, a tree of hypotheses is tested hierarchically. Both variants control the False Discovery Rate (FDR). Other problems in such experiments are that the level of data aggregation may differ between the quantitative traits (one per animal) and the gene expression measurements (pooled across animals), that the association may not be linear, and that only a few replications exist at the resolution of interest. We offer solutions to these problems as well. The hierarchical FDR testing strategies presented here can serve beyond the structure of our motivating example study to any complex microarray study. Supplementary data are available at Bioinformatics online.
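As background for the FDR control discussed above, here is a minimal sketch of the basic Benjamini-Hochberg step-up procedure applied to a flat list of p-values. It is not the hierarchical or screening-based variant the abstract describes, only the building block those strategies extend. The p-values and the function name `benjamini_hochberg` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, in ascending p-value
    order) with p_(k) <= k * q / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_passing_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, given in arbitrary (unsorted) order.
example_p = [0.039, 0.001, 0.205, 0.008]
decisions = benjamini_hochberg(example_p, q=0.05)
```

In a hierarchical scheme, a procedure of this kind would be applied level by level down a tree of hypotheses, testing children only where the parent was rejected; the details of that extension are in the paper rather than in this sketch.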
In vitro transcriptomic prediction of hepatotoxicity for early drug discovery
Cheng, Feng; Theodorescu, Dan; Schulman, Ira G.; Lee, Jae K.
2012-01-01
Liver toxicity (hepatotoxicity) is a critical issue in drug discovery and development. Standard preclinical evaluation of drug hepatotoxicity is generally performed using in vivo animal systems. However, only a small number of preselected compounds can be examined in vivo due to high experimental costs. A more efficient yet accurate screening technique which can identify potentially hepatotoxic compounds in the early stages of drug development would thus be valuable. Here, we develop and apply a novel genomic prediction technique for screening hepatotoxic compounds based on in vitro human liver cell tests. Using a training set of in vivo rodent experiments for drug hepatotoxicity evaluation, we discovered common biomarkers of drug-induced liver toxicity among six heterogeneous compounds. This gene set was further triaged to a subset of 32 genes that can be used as a multi-gene expression signature to predict hepatotoxicity. This multi-gene predictor was independently validated and showed consistently high prediction performance on five test sets of in vitro human liver cell and in vivo animal toxicity experiments. The predictor also demonstrated utility in evaluating different degrees of toxicity in response to drug concentrations which may be useful not only for discerning a compound’s general hepatotoxicity but also for determining its toxic concentration. PMID:21884709
Plant metabolic clusters - from genetics to genomics.
Nützmann, Hans-Wilhelm; Huang, Ancheng; Osbourn, Anne
2016-08-01
Plant natural products are of great value for agriculture, medicine and a wide range of other industrial applications. The discovery of new plant natural product pathways is currently being revolutionized by two key developments. First, breakthroughs in sequencing technology and reduced cost of sequencing are accelerating the ability to find enzymes and pathways for the biosynthesis of new natural products by identifying the underlying genes. Second, there are now multiple examples in which the genes encoding certain natural product pathways have been found to be grouped together in biosynthetic gene clusters within plant genomes. These advances are now making it possible to develop strategies for systematically mining multiple plant genomes for the discovery of new enzymes, pathways and chemistries. Increased knowledge of the features of plant metabolic gene clusters - architecture, regulation and assembly - will be instrumental in expediting natural product discovery. This review summarizes progress in this area. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Lee, Soo Chan; Idnurm, Alexander
2017-03-01
Although at the level of resolution of genes and molecules most information about mating in fungi is from a single lineage, the Dikarya, many fundamental discoveries about mating in fungi have been made in the earlier branches of the fungi. These are nonmonophyletic groups that were once classified into the chytrids and zygomycetes. Few species in these lineages offer the potential of genetic tractability, thereby hampering the ability to identify the genes that underlie those fundamental insights. Research performed during the past decade has now established the genes required for mating type determination and pheromone synthesis in some species in the phylum Mucoromycota, especially in the order Mucorales. These findings provide striking parallels with the evolution of mating systems in the Dikarya fungi. Other discoveries in the Mucorales provide the first examples of sex-cell type identity being driven directly by a gene that confers mating type, a trait considered more of relevance to animal sex determination but difficult to investigate in animals. Despite these discoveries, there remains much to be gleaned about mating systems from these fungi.
Call, Gerald B.; Olson, John M.; Chen, Jiong; Villarasa, Nikki; Ngo, Kathy T.; Yabroff, Allison M.; Cokus, Shawn; Pellegrini, Matteo; Bibikova, Elena; Bui, Chris; Cespedes, Albert; Chan, Cheryl; Chan, Stacy; Cheema, Amrita K.; Chhabra, Akanksha; Chitsazzadeh, Vida; Do, Minh-Tu; Fang, Q. Angela; Folick, Andrew; Goodstein, Gelsey L.; Huang, Cheng R.; Hung, Tony; Kim, Eunha; Kim, William; Kim, Yulee; Kohan, Emil; Kuoy, Edward; Kwak, Robert; Lee, Eric; Lee, JiEun; Lin, Henry; Liu, H-C. Angela; Moroz, Tatiana; Prasad, Tharani; Prashad, Sacha L.; Patananan, Alexander N.; Rangel, Alma; Rosselli, Desiree; Sidhu, Sohrab; Sitz, Daniel; Taber, Chelsea E.; Tan, Jingwen; Topp, Kasey; Tran, PhuongThao; Tran, Quynh-Minh; Unkovic, Mary; Wells, Maggie; Wickland, Jessica; Yackle, Kevin; Yavari, Amir; Zaretsky, Jesse M.; Allen, Christopher M.; Alli, Latifat; An, Ju; Anwar, Abbas; Arevalo, Sonia; Ayoub, Danny; Badal, Shawn S.; Baghdanian, Armonde; Baghdanian, Arthur H.; Baumann, Sara A.; Becerra, Vivian N.; Chan, Hei J.; Chang, Aileen E.; Cheng, Xibin A.; Chin, Mabel; Chong, Fleurette; Crisostomo, Carlyn; Datta, Sanjit; Delosreyes, Angela; Diep, Francie; Ekanayake, Preethika; Engeln, Mark; Evers, Elizabeth; Farshidi, Farzin; Fischer, Katrina; Formanes, Arlene J.; Gong, Jun; Gupta, Riju; Haas, Blake E.; Hahm, Vicky; Hsieh, Michael; Hui, James Z.; Iao, Mei L.; Jin, Sophia D.; Kim, Angela Y.; Kim, Lydia S-H.; King, Megan; Knudsen-Robbins, Chloe; Kohanchi, David; Kovshilovskaya, Bogdana; Ku, Amy; Kung, Raymond W.; Landig, Mark E. L.; Latterman, Stephanie S.; Lauw, Stephanie S.; Lee, Daniel S.; Lee, Joann S.; Lei, Kai C.; Leung, Lesley L.; Lerner, Renata; Lin, Jian-ya; Lin, Kathleen; Lim, Bryon C.; Lui, Crystal P. 
Y.; Liu, Tiffany Q.; Luong, Vincent; Makshanoff, Jacob; Mei, An-Chi; Meza, Miguel; Mikhaeil, Yara A.; Moarefi, Majid; Nguyen, Long H.; Pai, Shekhar S.; Pandya, Manish; Patel, Aadit R.; Picard, Paul D.; Safaee, Michael M.; Salame, Carol; Sanchez, Christian; Sanchez, Nina; Seifert, Christina C.; Shah, Abhishek; Shilgevorkyan, Oganes H.; Singh, Inderroop; Soma, Vanessa; Song, Junia J.; Srivastava, Neetika; Sta.Ana, Jennifer L.; Sun, Christie; Tan, Diane; Teruya, Alison S.; Tikia, Robyn; Tran, Trinh; Travis, Emily G.; Trinh, Jennifer D.; Vo, Diane; Walsh, Thomas; Wong, Regan S.; Wu, Katherine; Wu, Ya-Whey; Yang, Nkau X. V.; Yeranosian, Michael; Yu, James S.; Zhou, Jennifer J.; Zhu, Ran X.; Abrams, Anna; Abramson, Amanda; Amado, Latiffe; Anderson, Jenny; Bashour, Keenan; Beyer, Elsa; Bookatz, Allen; Brewer, Sarah; Buu, Natalie; Calvillo, Stephanie; Cao, Joseph; Chan, Amy; Chan, Jenny; Chang, Aileen; Chang, Daniel; Chang, Yuli; Chen, YiBing; Choi, Joo; Chou, Jeyling; Dang, Peter; Datta, Sumit; Davarifar, Ardy; Deravanesian, Artemis; Desai, Poonam; Fabrikant, Jordan; Farnad, Shahbaz; Fu, Katherine; Garcia, Eddie; Garrone, Nick; Gasparyan, Srpouhi; Gayda, Phyllis; Go, Sherrylene; Goffstein, Chad; Gonzalez, Courtney; Guirguis, Mariam; Hassid, Ryan; Hermogeno, Brenda; Hong, Julie; Hong, Aria; Hovestreydt, Lindsay; Hu, Charles; Huff, Devon; Jamshidian, Farid; Jen, James; Kahen, Katrin; Kao, Linda; Kelley, Melissa; Kho, Thomas; Kim, Yein; Kim, Sarah; Kirkpatrick, Brian; Langenbacher, Adam; Laxamana, Santino; Lee, Janet; Lee, Chris; Lee, So-Youn; Lee, ToHang S.; Lee, Toni; Lewis, Gemma; Lezcano, Sheila; Lin, Peter; Luu, Thanh; Luu, Julie; Marrs, Will; Marsh, Erin; Marshall, Jamie; Min, Sarah; Minasian, Tanya; Minye, Helena; Misra, Amit; Morimoto, Miles; Moshfegh, Yasaman; Murray, Jessica; Nguyen, Kha; Nguyen, Cynthia; Nodado, Ernesto; O'Donahue, Amanda; Onugha, Ndidi; Orjiakor, Nneka; Padhiar, Bhavin; Paul, Eric; Pavel-Dinu, Mara; Pavlenko, Alex; Paz, Edwin; Phaklides, Sarah; 
Pham, Lephong; Poulose, Preethi; Powell, Russell; Pusic, Aya; Ramola, Divi; Regalia, Kirsten; Ribbens, Meghann; Rifai, Bassel; Saakyan, Manyak; Saarikoski, Pamela; Segura, Miriam; Shadpour, Farnaz; Shemmassian, Aram; Singh, Ramnik; Singh, Vivek; Skinner, Emily; Solomin, Daniel; Soneji, Kosha; Spivey, Kristin; Stageberg, Erika; Stavchanskiy, Marina; Tekchandani, Leena; Thai, Leo; Thiyanaratnam, Jayantha; Tong, Maurine; Toor, Aneet; Tovar, Steve; Trangsrud, Kelly; Tsang, Wah-Yung; Uemura, Marc; Vollmer, Emily; Weiss, Emily; Wood, Damien; Wu, Joy; Wu, Sophia; Wu, Winston; Xu, Qing; Yamauchi, Yuki; Yarosh, Will; Yee, Laura; Yen, George; Banerjee, Utpal
2007-01-01
Using a large consortium of undergraduate students in an organized program at the University of California, Los Angeles (UCLA), we have undertaken a functional genomic screen in the Drosophila eye. In addition to the educational value of discovery-based learning, this article presents the first comprehensive genomewide analysis of essential genes involved in eye development. The data reveal the surprising result that the X chromosome has almost twice the frequency of essential genes involved in eye development as that found on the autosomes. PMID:17720911
2013-01-01
Background: Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results: A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We also tried alternative approaches, such as the Student’s t-test, single data domain clustering, and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis; none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions: The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919
Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin
2016-01-01
ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447
Automated Discovery of Functional Generality of Human Gene Expression Programs
Gerber, Georg K; Dowell, Robin D; Jaakkola, Tommi S; Gifford, David K
2007-01-01
An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose, and can be applied readily to novel compendia of biological data. PMID:17696603
Zhao, Zhongming; Guo, An-Yuan; van den Oord, Edwin J C G; Aliev, Fazil; Jia, Peilin; Edenberg, Howard J; Riley, Brien P; Dick, Danielle M; Bettinger, Jill C; Davies, Andrew G; Grotewiel, Michael S; Schuckit, Marc A; Agrawal, Arpana; Kramer, John; Nurnberger, John I; Kendler, Kenneth S; Webb, Bradley T; Miles, Michael F
2012-01-01
A variety of species and experimental designs have been used to study genetic influences on alcohol dependence, ethanol response, and related traits. Integration of these heterogeneous data can be used to produce a ranked target gene list for additional investigation. In this study, we performed a unique multi-species evidence-based data integration using three microarray experiments in mice or humans that generated an initial alcohol dependence (AD) related genes list, human linkage and association results, and gene sets implicated in C. elegans and Drosophila. We then used permutation and false discovery rate (FDR) analyses on the genome-wide association studies (GWAS) dataset from the Collaborative Study on the Genetics of Alcoholism (COGA) to evaluate the ranking results and weighting matrices. We found one weighting score matrix could increase FDR based q-values for a list of 47 genes with a score greater than 2. Our follow up functional enrichment tests revealed these genes were primarily involved in brain responses to ethanol and neural adaptations occurring with alcoholism. These results, along with our experimental validation of specific genes in mice, C. elegans and Drosophila, suggest that a cross-species evidence-based approach is useful to identify candidate genes contributing to alcoholism.
Design of small-molecule epigenetic modulators
Pachaiyappan, Boobalan
2013-01-01
The field of epigenetics has expanded rapidly to reveal multiple new targets for drug discovery. The functional elements of the epigenomic machinery can be categorized as writers, erasers and readers, and together these elements control cellular gene expression and homeostasis. It is increasingly clear that aberrations in the epigenome can underlie a variety of diseases, and thus discovery of small molecules that modulate the epigenome in a specific manner is a viable approach to the discovery of new therapeutic agents. In this Digest, the components of epigenetic control of gene expression will be briefly summarized, and efforts to identify small molecules that modulate epigenetic processes will be described. PMID:24300735
Genome-Wide Association Studies Identify CHRNA5/3 and HTR4 in the Development of Airflow Obstruction
Shrine, Nick R. G.; Loehr, Laura R.; Zhao, Jing Hua; Manichaikul, Ani; Lopez, Lorna M.; Smith, Albert Vernon; Heckbert, Susan R.; Smolonska, Joanna; Tang, Wenbo; Loth, Daan W.; Curjuric, Ivan; Hui, Jennie; Latourelle, Jeanne C.; Henry, Amanda P.; Aldrich, Melinda; Bakke, Per; Beaty, Terri H.; Bentley, Amy R.; Borecki, Ingrid B.; Brusselle, Guy G.; Burkart, Kristin M.; Chen, Ting-hsu; Couper, David; Crapo, James D.; Davies, Gail; Dupuis, Josée; Franceschini, Nora; Gulsvik, Amund; Hancock, Dana B.; Harris, Tamara B.; Hofman, Albert; Imboden, Medea; James, Alan L.; Khaw, Kay-Tee; Lahousse, Lies; Launer, Lenore J.; Litonjua, Augusto; Liu, Yongmei; Lohman, Kurt K.; Lomas, David A.; Lumley, Thomas; Marciante, Kristin D.; McArdle, Wendy L.; Meibohm, Bernd; Morrison, Alanna C.; Musk, Arthur W.; Myers, Richard H.; North, Kari E.; Postma, Dirkje S.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Rochat, Thierry; Rotter, Jerome I.; Artigas, María Soler; Starr, John M.; Uitterlinden, André G.; Wareham, Nicholas J.; Wijmenga, Cisca; Zanen, Pieter; Province, Michael A.; Silverman, Edwin K.; Deary, Ian J.; Palmer, Lyle J.; Cassano, Patricia A.; Gudnason, Vilmundur; Barr, R. Graham; Loos, Ruth J. F.; Strachan, David P.; London, Stephanie J.; Boezen, H. Marike; Probst-Hensch, Nicole; Gharib, Sina A.; Hall, Ian P.; O’Connor, George T.; Tobin, Martin D.; Stricker, Bruno H.
2012-01-01
Rationale: Genome-wide association studies (GWAS) have identified loci influencing lung function, but fewer genes influencing chronic obstructive pulmonary disease (COPD) are known. Objectives: Perform meta-analyses of GWAS for airflow obstruction, a key pathophysiologic characteristic of COPD assessed by spirometry, in population-based cohorts examining all participants, ever smokers, never smokers, asthma-free participants, and more severe cases. Methods: Fifteen cohorts were studied for discovery (3,368 affected; 29,507 unaffected), and a population-based family study and a meta-analysis of case-control studies were used for replication and regional follow-up (3,837 cases; 4,479 control subjects). Airflow obstruction was defined as FEV1 and its ratio to FVC (FEV1/FVC) both less than their respective lower limits of normal as determined by published reference equations. Measurements and Main Results: The discovery meta-analyses identified one region on chromosome 15q25.1 meeting genome-wide significance in ever smokers that includes AGPHD1, IREB2, and CHRNA5/CHRNA3 genes. The region was also modestly associated among never smokers. Gene expression studies confirmed the presence of CHRNA5/3 in lung, airway smooth muscle, and bronchial epithelial cells. A single-nucleotide polymorphism in HTR4, a gene previously related to FEV1/FVC, achieved genome-wide statistical significance in combined meta-analysis. Top single-nucleotide polymorphisms in ADAM19, RARB, PPAP2B, and ADAMTS19 were nominally replicated in the COPD meta-analysis. Conclusions: These results suggest an important role for the CHRNA5/3 region as a genetic risk factor for airflow obstruction that may be independent of smoking and implicate the HTR4 gene in the etiology of airflow obstruction. PMID:22837378
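The case definition given in the Methods above can be sketched as a small predicate. This is a minimal illustration, assuming the lower limits of normal (LLN) have already been derived from reference equations; the function name, parameter names, and example values are hypothetical and are not taken from the study.

```python
def has_airflow_obstruction(fev1, fvc, fev1_lln, ratio_lln):
    """Return True when both FEV1 and FEV1/FVC are below their lower limits of normal.

    fev1, fvc  -- measured volumes in litres (hypothetical inputs)
    fev1_lln   -- lower limit of normal for FEV1, assumed precomputed
    ratio_lln  -- lower limit of normal for FEV1/FVC, assumed precomputed
    """
    if fvc <= 0:
        raise ValueError("FVC must be positive")
    ratio = fev1 / fvc
    # Both criteria must hold, mirroring the definition in the abstract.
    return fev1 < fev1_lln and ratio < ratio_lln


# Hypothetical participant: low FEV1 and low FEV1/FVC ratio.
print(has_airflow_obstruction(fev1=1.8, fvc=3.2, fev1_lln=2.4, ratio_lln=0.70))
```

Requiring both conditions is stricter than using the ratio alone, which is why a participant with a reduced FEV1 but a preserved ratio would not count as affected under this sketch.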
XRCC1 Polymorphism Associated With Late Toxicity After Radiation Therapy in Breast Cancer Patients
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seibold, Petra; Behrens, Sabine; Schmezer, Peter
Purpose: To identify single-nucleotide polymorphisms (SNPs) in oxidative stress–related genes associated with risk of late toxicities in breast cancer patients receiving radiation therapy. Methods and Materials: Using a 2-stage design, 305 SNPs in 59 candidate genes were investigated in the discovery phase in 753 breast cancer patients from 2 prospective cohorts from Germany. The 10 most promising SNPs in 4 genes were evaluated in the replication phase in up to 1883 breast cancer patients from 6 cohorts identified through the Radiogenomics Consortium. Outcomes of interest were late skin toxicity and fibrosis of the breast, as well as an overall toxicity score (Standardized Total Average Toxicity). Multivariable logistic and linear regression models were used to assess associations between SNPs and late toxicity. A meta-analysis approach was used to summarize evidence. Results: The association of a genetic variant in the base excision repair gene XRCC1, rs2682585, with normal tissue late radiation toxicity was replicated in all tested studies. In the combined analysis of discovery and replication cohorts, carrying the rare allele was associated with a significantly lower risk of skin toxicities (multivariate odds ratio 0.77, 95% confidence interval 0.61-0.96, P=.02) and a decrease in Standardized Total Average Toxicity scores (−0.08, 95% confidence interval −0.15 to −0.02, P=.016). Conclusions: Using a stage design with replication, we identified a variant allele in the base excision repair gene XRCC1 that could be used in combination with additional variants for developing a test to predict late toxicities after radiation therapy in breast cancer patients.
Identification of susceptibility genes and genetic modifiers of human diseases
NASA Astrophysics Data System (ADS)
Abel, Kenneth; Kammerer, Stefan; Hoyal, Carolyn; Reneland, Rikard; Marnellos, George; Nelson, Matthew R.; Braun, Andreas
2005-03-01
The completion of the human genome sequence enables the discovery of genes involved in common human disorders. The successful identification of these genes is dependent on the availability of informative sample sets, validated marker panels, a high-throughput scoring technology, and a strategy for combining these resources. We have developed a universal platform technology based on mass spectrometry (MassARRAY) for analyzing nucleic acids with high precision and accuracy. To fuel this technology, we generated more than 100,000 validated assays for single nucleotide polymorphisms (SNPs) covering virtually all known and predicted human genes. We also established a large DNA sample bank comprised of more than 50,000 consented healthy and diseased individuals. This combination of reagents and technology allows the execution of large-scale genome-wide association studies. Taking advantage of MassARRAY's capability for quantitative analysis of nucleic acids, allele frequencies are estimated in sample pools containing large numbers of individual DNAs. Comparing pools as a first-pass "filtering" step offers a tremendous advantage in throughput and cost over individual genotyping. We employed this approach in numerous genome-wide, hypothesis-free searches to identify genes associated with common complex diseases, such as breast cancer, osteoporosis, and osteoarthritis, and genes involved in quantitative traits like high-density lipoprotein cholesterol (HDL-c) levels and central fat. Access to additional well-characterized patient samples through collaborations allows us to conduct replication studies that validate true disease genes. These discoveries will expand our understanding of genetic disease predisposition, and our ability for early diagnosis and determination of specific disease subtype or progression stage.
Genome-wide ENU mutagenesis for the discovery of novel male fertility regulators.
Jamsai, Duangporn; O'Bryan, Moira K
2010-06-01
The completion of genome sequencing projects has provided an extensive knowledge of the contents of the genomes of human, mouse, and many other organisms. Despite this, the function of most of the estimated 25,000 human genes remains largely unknown. Attention has now turned to elucidating gene function and identifying biological pathways that contribute to human diseases, including male infertility. Our understanding of the genetic regulation of male fertility has been accelerated through the use of genetically modified mouse models including knockout, knock-in, gene-trapped, and transgenic mice. Such reverse genetic approaches, however, require some fore-knowledge of a gene's function and, as such, bias against the discovery of completely novel genes and biological pathways. To facilitate high throughput gene discovery, genome-wide mouse mutagenesis via the use of a potent chemical mutagen, N-ethyl-N-nitrosourea (ENU), has been developed over the past decade. This forward genetic, or phenotype-driven, approach relies upon observing a phenotype first, then subsequently defining the underlying genetic defect. Mutations are randomly introduced into the mouse genome via ENU exposure. Through a controlled breeding scheme, mutations causing a phenotype of interest (e.g., male infertility) are then identified by linkage analysis and candidate gene sequencing. This approach allows for the possibility of revealing comprehensive phenotype-genotype relationships for a range of genes and pathways, i.e., in addition to null alleles, mice containing partial loss-of-function or gain-of-function mutations can be recovered. Such point mutations are likely to be more reflective of those that occur within the human population. Many research groups have successfully used this approach to generate infertile mouse lines and some novel male fertility genes have been revealed. In this review, we focus on the utility of ENU mutagenesis for the discovery of novel male fertility regulators.
HEx: A heterologous expression platform for the discovery of fungal natural products
Schlecht, Ulrich; Horecka, Joe; Lin, Hsiao-Ching; Naughton, Brian; Miranda, Molly; Li, Yong Fuga; Hennessy, James R.; Vandova, Gergana A.; Steinmetz, Lars M.; Sattely, Elizabeth; Khosla, Chaitan; Hillenmeyer, Maureen E.
2018-01-01
For decades, fungi have been a source of U.S. Food and Drug Administration–approved natural products such as penicillin, cyclosporine, and the statins. Recent breakthroughs in DNA sequencing suggest that millions of fungal species exist on Earth, with each genome encoding pathways capable of generating as many as dozens of natural products. However, the majority of encoded molecules are difficult or impossible to access because the organisms are uncultivable or the genes are transcriptionally silent. To overcome this bottleneck in natural product discovery, we developed the HEx (Heterologous EXpression) synthetic biology platform for rapid, scalable expression of fungal biosynthetic genes and their encoded metabolites in Saccharomyces cerevisiae. We applied this platform to 41 fungal biosynthetic gene clusters from diverse fungal species from around the world, 22 of which produced detectable compounds. These included novel compounds with unexpected biosynthetic origins, particularly from poorly studied species. This result establishes the HEx platform for rapid discovery of natural products from any fungal species, even those that are uncultivable, and opens the door to discovery of the next generation of natural products. PMID:29651464
Yu, Liang; Wang, Bingbo; Ma, Xiaoke; Gao, Lin
2016-12-23
Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, and their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrate that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.
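As an illustration of the distance step described above, the sketch below computes a simple distance between a drug gene set and a disease gene set on a toy interaction network: the mean, over drug genes, of the shortest-path length to the closest disease gene. This is a stand-in chosen for clarity and is not the module distance used by the authors; the network, gene names, and function names are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_set_distance(graph, drug_genes, disease_genes):
    """Mean over drug genes of the hop count to the nearest disease gene.

    Drug genes with no path to any disease gene are skipped; if none
    can reach the disease set, the distance is reported as infinity.
    """
    nearest = []
    for gene in drug_genes:
        dist = shortest_path_lengths(graph, gene)
        reachable = [dist[d] for d in disease_genes if d in dist]
        if reachable:
            nearest.append(min(reachable))
    if not nearest:
        return float("inf")
    return sum(nearest) / len(nearest)


# Toy undirected network as adjacency lists (hypothetical genes A-E).
network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
print(closest_set_distance(network, ["A", "B"], ["D"]))
```

In this simplified picture a smaller distance suggests a closer drug-disease relationship; the significance filtering mentioned in the abstract would require a null distribution, for example from randomly drawn gene sets, which is not shown here.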
Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal
2017-01-01
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. First, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply a t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package to perform a non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on classification performances obtained using several well-known classifiers.
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.
Yang, Jian-Hua; Qu, Liang-Hu
2012-01-01
Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
2010-01-01
Subtraction techniques have been broadly applied for target gene discovery. However, most current protocols apply relative differential subtraction and result in a large number of clone mixtures of unique and differentially expressed genes. This makes it more difficult to identify unique or target-orientated expressed genes. In this study, we developed a novel method for subtraction at the mRNA level by integrating magnetic particle technology into driver preparation and tester–driver hybridization to facilitate uniquely expressed gene discovery between peanut immature pod and leaf through a single round of subtraction. The resulting target clones were further validated through polymerase chain reaction screening using peanut immature pod and leaf cDNA libraries as templates. This study has resulted in identifying several genes expressed uniquely in immature peanut pod. These target genes can be used for future peanut functional genome and genetic engineering research. PMID:21406066
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.
Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin
2015-04-01
Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
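MGAS builds on the Simes procedure for combining p-values. The sketch below shows only the classical Simes test applied to a set of SNP-level p-values within one gene; the extensions described in the abstract (accounting for correlated SNPs and multiple phenotypes) are not reproduced here, and the function name and example p-values are hypothetical.

```python
def simes_p_value(p_values):
    """Classical Simes combination: min over ranks i of n * p_(i) / i.

    p_values -- iterable of individual p-values (e.g. one per SNP in a gene)
    """
    ordered = sorted(p_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one p-value is required")
    # rank starts at 1 for the smallest p-value
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))


# Hypothetical SNP-level p-values for a single gene.
print(simes_p_value([0.01, 0.04, 0.03, 0.5]))
```

Because the largest p-value contributes the term n * p_(n) / n = p_(n), the combined value never exceeds 1, so no clipping is needed in this basic form.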
Bianco, Luca; Riccadonna, Samantha; Lavezzo, Enrico; Falda, Marco; Formentin, Elide; Cavalieri, Duccio; Toppo, Stefano
2017-01-01
Abstract Summary: Pathway Inspector is an easy-to-use web application helping researchers to find patterns of expression in complex RNAseq experiments. The tool combines two standard approaches for RNAseq analysis: the identification of differentially expressed genes and a topology-based analysis of enriched pathways. Pathway Inspector is equipped with ad hoc interactive graphical interfaces simplifying the discovery of modulated pathways and the integration of the differentially expressed genes in the corresponding pathway topology. Availability and Implementation: Pathway Inspector is available at the website http://admiral.fmach.it/PI and has been developed in Python, making use of the Django Web Framework. Contact: paolo.fontana@fmach.it PMID:28158604
Discovery of Tumor Suppressor Gene Function.
ERIC Educational Resources Information Center
Oppenheimer, Steven B.
1995-01-01
This is an update of a 1991 review on tumor suppressor genes written at a time when understanding of how the genes work was limited. A recent major breakthrough in the understanding of the function of tumor suppressor genes is discussed. (LZ)
From ecology to base pairs: nursing and genetic science.
Williams, J K; Tripp-Reimer, T
2001-07-01
With the mapping of the human genome has come the opportunity for nursing research to explore topics of concern to the maintenance, restoration, and attainment of genetic-related health. Initially, nursing research on genetic topics originated primarily from physical anthropology and from a clinical, disease-focused perspective. Nursing research subsequently focused on psychosocial aspects of genetic conditions for individuals and their family members. As findings emerge from current human genome discovery, new programs of genetic nursing research are originating from a biobehavioral interface, ranging from the investigations of the influence of specific molecular changes on gene function to social/ethical issues of human health and disease. These initiatives reflect nursing's response to discoveries of gene mutations related to phenotypic expression in both clinical and community-based populations. Genetic research programs are needed that integrate or adapt theoretical and methodological advances in epidemiology, family systems, anthropology, and ethics with those from nursing. Research programs must address not only populations with a specific disease but also community-based genetic health care issues. As genetic health care practice evolves, so will opportunities for research by nurses who can apply genetic concepts and interventions to improve the health of the public. This article presents an analysis of the evolution of genetic nursing research and challenges for the future.
Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji
2016-01-01
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press. PMID:26989145
Apparently low reproducibility of true differential expression discoveries in microarray studies.
Zhang, Min; Yao, Chen; Guo, Zheng; Zou, Jinfeng; Zhang, Lin; Xiao, Hui; Wang, Dong; Yang, Da; Gong, Xue; Zhu, Jing; Li, Yanhui; Li, Xia
2008-09-15
Differentially expressed gene (DEG) lists detected from different microarray studies for the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection from current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variations existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies for a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized with correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.
Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred
2017-01-01
Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
Computational Identification of Novel Genes: Current and Future Perspectives.
Klasberg, Steffen; Bitard-Feildel, Tristan; Mallet, Ludovic
2016-01-01
While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or can leverage biological evidence from experiments. Identification of novel genes, however, remains a challenging task. With constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.
Post-genome integrative biology: so that's what they call clinical science.
Rees, J
2001-01-01
Medical science is increasingly dominated by slogans, a characteristic reflecting its growing bureaucratic and corporate structure. Chief amongst these slogans is the idea that genomics will transform the public health. I believe this view is mistaken. Using studies of the genetics of skin cancer and the genetics of skin pigmentation, I describe how recent discoveries have contributed to our understanding of these topics and of human evolution. I contrast these discoveries with insights gained from other approaches, particularly those based on clinical studies. The 'IKEA model of medical advance'--you just do the basic science in the laboratory and self-assemble in the clinic--is not only damaging to clinical advance, but reflects a widespread ignorance about the nature of disease and how clinical discovery arises. We need to think more about disease and less about genes; more in the clinic and less in the laboratory.
A Road Map for Precision Medicine in the Epilepsies
2015-01-01
Summary Technological advances have paved the way for accelerated genomic discovery and are bringing precision medicine clearly into view. Epilepsy research in particular is well-suited to serve as a model for the development and deployment of targeted therapeutics in precision medicine because of the rapidly expanding genetic knowledge base in epilepsy, the availability of good in vitro and in vivo model systems to efficiently study the biological consequences of genetic mutations, the ability to turn these models into effective drug screening platforms, and the establishment of collaborative research groups. Moving forward, it is critical that we strengthen these collaborations, particularly through integrated research platforms to provide robust analyses both for accurate personal genome analysis and gene and drug discovery. Similarly, the implementation of clinical trial networks will allow the expansion of patient sample populations with genetically defined epilepsy so that drug discovery can be translated into clinical practice. PMID:26416172
Single-cell transcriptomics for microbial eukaryotes.
Kolisko, Martin; Boscaro, Vittorio; Burki, Fabien; Lynn, Denis H; Keeling, Patrick J
2014-11-17
One of the greatest hindrances to a comprehensive understanding of microbial genomics, cell biology, ecology, and evolution is that most microbial life is not in culture. Solutions to this problem have mainly focused on whole-community surveys like metagenomics, but these analyses inevitably lose information and present particular challenges for eukaryotes, which are relatively rare and possess large, gene-sparse genomes. Single-cell analyses present an alternative solution that allows for specific species to be targeted, while retaining information on cellular identity, morphology, and partitioning of activities within microbial communities. Single-cell transcriptomics, pioneered in medical research, offers particular potential advantages for uncultivated eukaryotes, but the efficiency and biases have not been tested. Here we describe a simple and reproducible method for single-cell transcriptomics using manually isolated cells from five model ciliate species; we examine impacts of amplification bias and contamination, and compare the efficacy of gene discovery to traditional culture-based transcriptomics. Gene discovery using single-cell transcriptomes was found to be comparable to mass-culture methods, suggesting single-cell transcriptomics is an efficient entry point into genomic data from the vast majority of eukaryotic biodiversity. Copyright © 2014 Elsevier Ltd. All rights reserved.
Large-scale discovery of novel genetic causes of developmental disorders.
2015-03-12
Despite three decades of successful, predominantly phenotype-driven discovery of the genetic causes of monogenic disorders, up to half of children with severe developmental disorders of probable genetic origin remain without a genetic diagnosis. Particularly challenging are those disorders rare enough to have eluded recognition as a discrete clinical entity, those with highly variable clinical manifestations, and those that are difficult to distinguish from other, very similar, disorders. Here we demonstrate the power of using an unbiased genotype-driven approach to identify subsets of patients with similar disorders. By studying 1,133 children with severe, undiagnosed developmental disorders, and their parents, using a combination of exome sequencing and array-based detection of chromosomal rearrangements, we discovered 12 novel genes associated with developmental disorders. These newly implicated genes increase by 10% (from 28% to 31%) the proportion of children that could be diagnosed. Clustering of missense mutations in six of these newly implicated genes suggests that normal development is being perturbed by an activating or dominant-negative mechanism. Our findings demonstrate the value of adopting a comprehensive strategy, both genome-wide and nationwide, to elucidate the underlying causes of rare genetic disorders.
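The "10% (from 28% to 31%)" figure above is a relative increase in diagnostic yield rather than an absolute one. The short calculation below makes the distinction explicit; it uses only the two percentages quoted in the abstract, and the variable names are illustrative.

```python
# Diagnostic yield before and after adding the newly implicated genes,
# as quoted in the abstract (percent of children diagnosable).
yield_before = 28.0
yield_after = 31.0

# Absolute gain is measured in percentage points.
absolute_gain = yield_after - yield_before

# Relative gain expresses the same change as a fraction of the starting yield.
relative_gain = 100.0 * absolute_gain / yield_before

print(f"absolute gain: {absolute_gain:.0f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The relative gain comes out slightly above ten percent, consistent with the rounded figure in the abstract.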
Anderson, Christopher D.; Biffi, Alessandro; Nalls, Michael A.; Devan, William J.; Schwab, Kristin; Ayres, Alison M.; Valant, Valerie; Ross, Owen A.; Rost, Natalia S.; Saxena, Richa; Viswanathan, Anand; Worrall, Bradford B.; Brott, Thomas G.; Goldstein, Joshua N.; Brown, Devin; Broderick, Joseph P.; Norrving, Bo; Greenberg, Steven M.; Silliman, Scott L.; Hansen, Björn M.; Tirschwell, David L.; Lindgren, Arne; Slowik, Agnieszka; Schmidt, Reinhold; Selim, Magdy; Roquer, Jaume; Montaner, Joan; Singleton, Andrew B.; Kidwell, Chelsea S.; Woo, Daniel; Furie, Karen L.; Meschia, James F.; Rosand, Jonathan
2013-01-01
Background and Purpose Prior studies demonstrated association between mitochondrial DNA variants and ischemic stroke (IS). We investigated whether variants within a larger set of oxidative phosphorylation (OXPHOS) genes encoded by both autosomal and mitochondrial DNA were associated with risk of IS and, based on our results, extended our investigation to intracerebral hemorrhage (ICH). Methods This association study employed a discovery cohort of 1643 individuals, a validation cohort of 2432 individuals for IS, and an extension cohort of 1476 individuals for ICH. Gene-set enrichment analysis (GSEA) was performed on all structural OXPHOS genes, as well as genes contributing to individual respiratory complexes. Gene-sets passing GSEA were tested by constructing genetic scores using common variants residing within each gene. Associations between each variant and IS that emerged in the discovery cohort were examined in validation and extension cohorts. Results IS was associated with genetic risk scores in OXPHOS as a whole (odds ratio (OR)=1.17, p=0.008) and Complex I (OR=1.06, p=0.050). Among IS subtypes, small vessel (SV) stroke showed association with OXPHOS (OR=1.16, p=0.007), Complex I (OR=1.13, p=0.027) and Complex IV (OR 1.14, p=0.018). To further explore this SV association, we extended our analysis to ICH, revealing association between deep hemispheric ICH and Complex IV (OR=1.08, p=0.008). Conclusions This pathway analysis demonstrates association between common genetic variants within OXPHOS genes and stroke. The associations for SV stroke and deep ICH suggest that genetic variation in OXPHOS influences small vessel pathobiology. Further studies are needed to identify culprit genetic variants and assess their functional consequences. PMID:23362085
Meta-analysis of genome-wide association studies for personality
de Moor, Marleen H.M.; Costa, Paul T.; Terracciano, Antonio; Krueger, Robert F.; de Geus, Eco J.C.; Toshiko, Tanaka; Penninx, Brenda W.J.H.; Esko, Tõnu; Madden, Pamela A F; Derringer, Jaime; Amin, Najaf; Willemsen, Gonneke; Hottenga, Jouke-Jan; Distel, Marijn A.; Uda, Manuela; Sanna, Serena; Spinhoven, Philip; Hartman, Catharina A.; Sullivan, Patrick; Realo, Anu; Allik, Jüri; Heath, Andrew C; Pergadia, Michele L; Agrawal, Arpana; Lin, Peng; Grucza, Richard; Nutile, Teresa; Ciullo, Marina; Rujescu, Dan; Giegling, Ina; Konte, Bettina; Widen, Elisabeth; Cousminer, Diana L; Eriksson, Johan G.; Palotie, Aarno; Luciano, Michelle; Tenesa, Albert; Davies, Gail; Lopez, Lorna M.; Hansell, Narelle K.; Medland, Sarah E.; Ferrucci, Luigi; Schlessinger, David; Montgomery, Grant W.; Wright, Margaret J.; Aulchenko, Yurii S.; Janssens, A.Cecile J.W.; Oostra, Ben A.; Metspalu, Andres; Abecasis, Gonçalo R.; Deary, Ian J.; Räikkönen, Katri; Bierut, Laura J.; Martin, Nicholas G.; van Duijn, Cornelia M.; Boomsma, Dorret I.
2013-01-01
Personality can be thought of as a set of characteristics that influence people’s thoughts, feelings, and behaviour across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in ten discovery samples (17 375 adults) and five in-silico replication samples (3 294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data were available of ~2.4M Single Nucleotide Polymorphisms (SNPs; directly typed and imputed using HAPMAP data). In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P = 2.8 × 10^-8 and 3.1 × 10^-8) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P = 4.9 × 10^-8). We further conducted a gene-based test that confirmed the association of KATNAL2 to Conscientiousness. In-silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness. PMID:21173776
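The "weighted inverse variance method" mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fixed-effect model in which each cohort contributes a per-SNP effect estimate and standard error; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    betas: per-cohort effect estimates (hypothetical inputs)
    ses:   per-cohort standard errors of those estimates
    Returns the pooled effect, its standard error, the z-score and a
    two-sided p-value under a normal approximation.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equally long")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Two made-up cohorts with equal precision.
    print(inverse_variance_meta([0.1, 0.2], [0.1, 0.1]))
```

Cohorts with smaller standard errors dominate the pooled estimate, which is why large discovery samples carry most of the weight in analyses of this kind.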
Strain Prioritization and Genome Mining for Enediyne Natural Products
Yan, Xiaohui; Ge, Huiming; Huang, Tingting; Hindra; Yang, Dong; Teng, Qihui; Crnovčić, Ivana; Li, Xiuling; Rudolf, Jeffrey D.; Lohman, Jeremy R.; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Van Nieuwerburgh, Filip; Rader, Christoph
2016-01-01
ABSTRACT The enediyne family of natural products has had a profound impact on modern chemistry, biology, and medicine, and yet only 11 enediynes have been structurally characterized to date. Here we report a genome survey of 3,400 actinomycetes, identifying 81 strains that harbor genes encoding the enediyne polyketide synthase cassettes that could be grouped into 28 distinct clades based on phylogenetic analysis. Genome sequencing of 31 representative strains confirmed that each clade harbors a distinct enediyne biosynthetic gene cluster. A genome neighborhood network allows prediction of new structural features and biosynthetic insights that could be exploited for enediyne discovery. We confirmed one clade as new C-1027 producers, with a significantly higher C-1027 titer than the original producer, and discovered a new family of enediyne natural products, the tiancimycins (TNMs), that exhibit potent cytotoxicity against a broad spectrum of cancer cell lines. Our results demonstrate the feasibility of rapid discovery of new enediynes from a large strain collection. PMID:27999165
Heterogeneous data fusion for brain tumor classification.
Metsis, Vangelis; Huang, Heng; Andronesi, Ovidiu C; Makedon, Fillia; Tzika, Aria
2012-10-01
Current research in biomedical informatics involves analysis of multiple heterogeneous data sets. This includes patient demographics, clinical and pathology data, treatment history, patient outcomes as well as gene expression, DNA sequences and other information sources such as gene ontology. Analysis of these data sets could lead to better disease diagnosis, prognosis, treatment and drug discovery. In this report, we present a novel machine learning framework for brain tumor classification based on heterogeneous data fusion of metabolic and molecular datasets, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) magnetic resonance spectroscopy and gene transcriptome profiling, obtained from intact brain tumor biopsies. Our experimental results show that our novel framework outperforms any analysis using an individual dataset.
Verbist, Bie; Klambauer, Günter; Vervoort, Liesbet; Talloen, Willem; Shkedy, Ziv; Thas, Olivier; Bender, Andreas; Göhlmann, Hinrich W H; Hochreiter, Sepp
2015-05-01
The pharmaceutical industry is faced with steadily declining R&D efficiency which results in fewer drugs reaching the market despite increased investment. A major cause for this low efficiency is the failure of drug candidates in late-stage development owing to safety issues or previously undiscovered side-effects. We analyzed to what extent gene expression data can help to de-risk drug development in early phases by detecting the biological effects of compounds across disease areas, targets and scaffolds. For eight drug discovery projects within a global pharmaceutical company, gene expression data were informative and able to support go/no-go decisions. Our studies show that gene expression profiling can detect adverse effects of compounds, and is a valuable tool in early-stage drug discovery decision making. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Sweetening the pot: adding glycosylation to the biomarker discovery equation.
Drake, Penelope M; Cho, Wonryeon; Li, Bensheng; Prakobphol, Akraporn; Johansen, Eric; Anderson, N Leigh; Regnier, Fred E; Gibson, Bradford W; Fisher, Susan J
2010-02-01
Cancer has profound effects on gene expression, including a cell's glycosylation machinery. Thus, tumors produce glycoproteins that carry oligosaccharides with structures that are markedly different from the same protein produced by a normal cell. A single protein can have many glycosylation sites that greatly amplify the signals they generate compared with their protein backbones. In this article, we survey clinical tests that target carbohydrate modifications for diagnosing and treating cancer. We present the biological relevance of glycosylation to disease progression by highlighting the role these structures play in adhesion, signaling, and metastasis and then address current methodological approaches to biomarker discovery that capitalize on selectively capturing tumor-associated glycoforms to enrich and identify disease-related candidate analytes. Finally, we discuss emerging technologies--multiple reaction monitoring and lectin-antibody arrays--as potential tools for biomarker validation studies in pursuit of clinically useful tests. The future of carbohydrate-based biomarker studies has arrived. At all stages, from discovery through verification and deployment into clinics, glycosylation should be considered a primary readout or a way of increasing the sensitivity and specificity of protein-based analyses.
Sweetening the pot: adding glycosylation to the biomarker discovery equation
Drake, Penelope M.; Cho, Wonryeon; Li, Bensheng; Prakobphol, Akraporn; Johansen, Eric; Anderson, N. Leigh; Regnier, Fred E.; Gibson, Bradford W.; Fisher, Susan J.
2010-01-01
Background Cancer has profound effects on gene expression, including a cell’s glycosylation machinery. Thus, tumors produce glycoproteins that carry oligosaccharides with structures that are markedly different from the same protein produced by a normal cell. A single protein can have many glycosylation sites that greatly amplify the signals they generate as compared to their protein backbones. Content We survey clinical tests that target carbohydrate modifications for diagnosing and treating cancer. Next, we present the biological relevance of glycosylation to disease progression by highlighting the role these structures play in adhesion, signaling and metastasis, and then address current methodological approaches to biomarker discovery that capitalize on selectively capturing tumor-associated glycoforms to enrich and identify disease-related candidate analytes. Finally, we discuss emerging technologies--multiple reaction monitoring and lectin-antibody arrays--as potential tools for biomarker validation studies in pursuit of clinically useful tests. Summary The future of carbohydrate-based biomarker studies has arrived. At all stages, from discovery through verification and deployment into clinics, glycosylation should be considered a primary readout or a way of increasing the sensitivity and specificity of protein-based analyses. PMID:19959616
Gene selection for tumor classification using neighborhood rough sets and entropy measures.
Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu
2017-03-01
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are usually subjected to discretization preprocessing, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper addresses an entropy measure under the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The use of this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.
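To make the idea of neighborhood granules concrete, the sketch below computes a neighborhood for each sample directly from real-valued expression levels (no discretization) and a simple entropy over those neighborhoods. The entropy formula used here, the average of -log(|neighborhood| / n), is one commonly used definition and is an assumption; the abstract does not state the exact measure, and the function names and the threshold `delta` are hypothetical.

```python
import math


def neighborhood(samples, i, genes, delta):
    """Indices of samples within Euclidean distance `delta` of sample i,
    measured only on the selected gene columns `genes`."""
    xi = samples[i]
    members = []
    for j, xj in enumerate(samples):
        dist = math.sqrt(sum((xi[g] - xj[g]) ** 2 for g in genes))
        if dist <= delta:
            members.append(j)
    return members


def neighborhood_entropy(samples, genes, delta):
    """Average of -log(|neighborhood| / n) over all samples.

    Coarse granules (every sample is a neighbor of every other) give 0;
    fine granules (every sample alone) give log(n).
    """
    n = len(samples)
    total = 0.0
    for i in range(n):
        size = len(neighborhood(samples, i, genes, delta))
        total += math.log(size / n)
    return -total / n


if __name__ == "__main__":
    # Three made-up samples measured on a single gene.
    data = [[0.0], [0.05], [1.0]]
    print(neighborhood_entropy(data, [0], 0.1))
```

A gene subset that yields finer granules separates samples more sharply, which is the intuition behind using such a measure to rank candidate subsets.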
Delineation of metabolic gene clusters in plant genomes by chromatin signatures
Yu, Nan; Nützmann, Hans-Wilhelm; MacDonald, James T.; Moore, Ben; Field, Ben; Berriri, Souha; Trick, Martin; Rosser, Susan J.; Kumar, S. Vinod; Freemont, Paul S.; Osbourn, Anne
2016-01-01
Plants are a tremendous source of diverse chemicals, including many natural product-derived drugs. It has recently become apparent that the genes for the biosynthesis of numerous different types of plant natural products are organized as metabolic gene clusters, thereby unveiling a highly unusual form of plant genome architecture and offering novel avenues for discovery and exploitation of plant specialized metabolism. Here we show that these clustered pathways are characterized by distinct chromatin signatures of histone H3 lysine 27 trimethylation (H3K27me3) and histone H2A variant H2A.Z, associated with cluster repression and activation, respectively, and represent discrete windows of co-regulation in the genome. We further demonstrate that knowledge of these chromatin signatures along with chromatin mutants can be used to mine genomes for cluster discovery. The roles of H3K27me3 and H2A.Z in repression and activation of single genes in plants are well known. However, our discovery of highly localized operon-like co-regulated regions of chromatin modification is unprecedented in plants. Our findings raise intriguing parallels with groups of physically linked multi-gene complexes in animals and with clustered pathways for specialized metabolism in filamentous fungi. PMID:26895889
He, Bing; Zhang, Hu-Qin
2017-01-01
Lung cancer is one of the most common causes of cancer-related death in the world. Most lung cancer cases are non-small cell lung cancer (NSCLC), which accounts for approximately 75% of lung cancer. Over the past years, our knowledge of the molecular biology of NSCLC has grown rapidly, which has promoted the discovery of driver genes in NSCLC and directed FDA-approved targeted therapies. Targeted therapies based on driver genes provide a more precise option for advanced non-small cell lung cancer, improving the survival rate of patients. Here, we review the landscape of driver genes in NSCLC, including their characteristics, detection methods, the application of targeted therapy, and challenges. PMID:28915704
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
Design of small molecule epigenetic modulators.
Pachaiyappan, Boobalan; Woster, Patrick M
2014-01-01
The field of epigenetics has expanded rapidly to reveal multiple new targets for drug discovery. The functional elements of the epigenomic machinery can be categorized as writers, erasers and readers, and together these elements control cellular gene expression and homeostasis. It is increasingly clear that aberrations in the epigenome can underlie a variety of diseases, and thus discovery of small molecules that modulate the epigenome in a specific manner is a viable approach to the discovery of new therapeutic agents. In this Digest, the components of epigenetic control of gene expression will be briefly summarized, and efforts to identify small molecules that modulate epigenetic processes will be described. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
Geeleher, Paul; Cox, Nancy J; Huang, R Stephanie
2016-09-21
We show that variability in general levels of drug sensitivity in pre-clinical cancer models confounds biomarker discovery. However, using a very large panel of cell lines, each treated with many drugs, we could estimate a general level of sensitivity to all drugs in each cell line. By conditioning on this variable, biomarkers were identified that were more likely to be effective in clinical trials than those identified using a conventional uncorrected approach. We find that differences in general levels of drug sensitivity are driven by biologically relevant processes. We developed a gene expression based method that can be used to correct for this confounder in future studies.
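The effect of conditioning on a general level of drug sensitivity can be illustrated with a partial correlation. This is a generic statistical sketch, not the gene-expression-based correction the authors developed: it assumes the general sensitivity of each cell line is already available as a number (for example an average response across many drugs), and all names and data are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def residuals(y, x):
    """Residuals of a least-squares regression of y on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(biomarker, response, general_sensitivity):
    """Correlation of biomarker and drug response after removing the
    linear effect of general sensitivity from both variables."""
    return pearson(residuals(biomarker, general_sensitivity),
                   residuals(response, general_sensitivity))


if __name__ == "__main__":
    # Made-up cell lines: both variables track general sensitivity.
    general = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    biomarker = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
    response = [2.1, 4.1, 5.8, 7.8, 10.1, 12.1]
    print(pearson(biomarker, response))
    print(partial_correlation(biomarker, response, general))
```

In the made-up data the raw correlation is high only because both variables follow the shared covariate; once that covariate is removed from both, the association disappears, which mirrors the confounding described in the abstract.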
Genome engineering for microbial natural product discovery.
Choi, Si-Sun; Katsuyama, Yohei; Bai, Linquan; Deng, Zixin; Ohnishi, Yasuo; Kim, Eung-Soo
2018-03-03
The discovery and development of microbial natural products (MNPs) have played pivotal roles in the fields of human medicine and its related biotechnology sectors over the past several decades. The post-genomic era has witnessed the development of microbial genome mining approaches to isolate previously unsuspected MNP biosynthetic gene clusters (BGCs) hidden in the genome, followed by various BGC awakening techniques to visualize compound production. Additional microbial genome engineering techniques have allowed higher MNP production titers, which could complement a traditional culture-based MNP chasing approach. Here, we describe recent developments in the MNP research paradigm, including microbial genome mining, NP BGC activation, and NP overproducing cell factory design. Copyright © 2018 Elsevier Ltd. All rights reserved.
Discovery of Host Factors and Pathways Utilized in Hantaviral Infection
2016-09-01
AWARD NUMBER: W81XWH-14-1-0204. TITLE: Discovery of Host Factors and Pathways Utilized in Hantaviral Infection. PRINCIPAL INVESTIGATOR: Paul... [...] ...after significance values were calculated and corrected for false discovery rate. The top hit is ATP6V0A1, a gene encoding a subunit of a vacuolar...
Progress toward Gene Therapy for Duchenne Muscular Dystrophy.
Chamberlain, Joel R; Chamberlain, Jeffrey S
2017-05-03
Duchenne muscular dystrophy (DMD) has been a major target for gene therapy development for nearly 30 years. DMD is among the most common genetic diseases, and isolation of the defective gene (DMD, or dystrophin) was a landmark discovery, as it was the first time a human disease gene had been cloned without knowledge of the protein product. Despite tremendous obstacles, including the enormous size of the gene and the large volume of muscle tissue in the human body, efforts to devise a treatment based on gene replacement have advanced steadily through the combined efforts of dozens of labs and patient advocacy groups. Progress in the development of DMD gene therapy has been well documented in Molecular Therapy over the past 20 years and will be reviewed here to highlight prospects for success in the imminent human clinical trials planned by several groups. Copyright © 2017 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.
Down-Regulation of Gene Expression by RNA-Induced Gene Silencing
NASA Astrophysics Data System (ADS)
Travella, Silvia; Keller, Beat
Down-regulation of endogenous genes via post-transcriptional gene silencing (PTGS) is a key to the characterization of gene function in plants. Many RNA-based silencing mechanisms such as post-transcriptional gene silencing, co-suppression, quelling, and RNA interference (RNAi) have been discovered among species of different kingdoms (plants, fungi, and animals). One of the most interesting discoveries was RNAi, a sequence-specific gene-silencing mechanism initiated by the introduction of double-stranded RNA (dsRNA), homologous in sequence to the silenced gene, which triggers degradation of mRNA. Infection of plants with modified viruses can also induce RNA silencing and is referred to as virus-induced gene silencing (VIGS). In contrast to insertional mutagenesis, these emerging new reverse genetic approaches represent a powerful tool for exploring gene function and for manipulating gene expression experimentally in cereal species such as barley and wheat. We examined how RNAi and VIGS have been used to assess gene function in barley and wheat, including molecular mechanisms involved in the process and available methodological elements, such as vectors, inoculation procedures, and analysis of silenced phenotypes.
Suggestive association between variants in IL1RAPL and asthma symptoms in Latin American children.
Marques, Cintia Rodrigues; Costa, Gustavo No; da Silva, Thiago Magalhães; Oliveira, Pablo; Cruz, Alvaro A; Alcantara-Neves, Neuza Maria; Fiaccone, Rosemeire L; Horta, Bernardo L; Hartwig, Fernando Pires; Burchard, Esteban G; Pino-Yanes, Maria; Rodrigues, Laura C; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Gouveia, Mateus H; Sant Anna, Hanaisa P; Tarazona-Santos, Eduardo; Lima Barreto, Maurício; Figueiredo, Camila Alexandrina
2017-04-01
Several genome-wide association studies have been conducted to investigate the influence of genetic polymorphisms in the development of allergic diseases, but few of them have included the X chromosome. The aim of the present study was to perform an X chromosome-wide association study (X-WAS) for asthma symptoms. The study included 1307 children of which 294 were asthma cases. DNA was genotyped using 2.5 HumanOmni Beadchip from Illumina. Statistical analyses were performed in PLINK 1.9, MACH 1.0 and Minimac2. The variant rs12007907 (g.29483892C>A) in IL1RAPL gene was suggestively associated with asthma symptoms in discovery set (odds ratio (OR)=0.49, 95% confidence interval (CI): 0.37-0.67; P=3.33 × 10^-6). This result was replicated in the ProAr cohort in men only (OR=0.45, 95% CI: 0.21-0.95; P=0.038). Furthermore, investigating the functional role of the rs12007907 on the production of a Th2-type cytokine, IL-13, we found a negative association between the minor allele A with IL-13 production in the discovery set (P=0.044). Gene-based analysis revealed that NUDT10 was the most consistently associated with asthma symptoms in discovery sample. In conclusion, the rs12007907 variant in IL1RAPL gene was negatively associated with asthma and IL-13 production in our study and a sex-specific association was observed in one of the validation samples. It suggests an effect on asthma susceptibility and may explain differences in severe asthma frequency between women and men.
Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes
NASA Astrophysics Data System (ADS)
Ashkani, Jahanshah; Naidoo, Kevin J.
2016-05-01
Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.
Exploiting Pre-rRNA Processing in Diamond Blackfan Anemia Gene Discovery and Diagnosis
Farrar, Jason E.; Quarello, Paola; Fisher, Ross; O’Brien, Kelly A.; Aspesi, Anna; Parrella, Sara; Henson, Adrianna L.; Seidel, Nancy E.; Atsidaftos, Eva; Prakash, Supraja; Bari, Shahla; Garelli, Emanuela; Arceci, Robert J.; Dianzani, Irma; Ramenghi, Ugo; Vlachos, Adrianna; Lipton, Jeffrey M.; Bodine, David M.; Ellis, Steven R.
2014-01-01
Diamond Blackfan anemia (DBA), a syndrome primarily characterized by anemia and physical abnormalities, is one among a group of related inherited bone marrow failure syndromes (IBMFS) which share overlapping clinical features. Heterozygous mutations or single-copy deletions have been identified in 12 ribosomal protein genes in approximately 60% of DBA cases, with the genetic etiology unexplained in most remaining patients. Unlike many IBMFS, for which functional screening assays complement clinical and genetic findings, suspected DBA in the absence of typical alterations of the known genes must frequently be diagnosed after exclusion of other IBMFS. We report here a novel deletion in a child that presented such a diagnostic challenge and prompted development of a novel functional assay that can assist in the diagnosis of a significant fraction of patients with DBA. The ribosomal proteins affected in DBA are required for pre-rRNA processing, a process which can be interrogated to monitor steps in the maturation of 40S and 60S ribosomal subunits. In contrast to prior methods used to assess pre-rRNA processing, the assay reported here, based on capillary electrophoresis measurement of the maturation of rRNA in pre-60S ribosomal subunits, would be readily amenable to use in diagnostic laboratories. In addition to utility as a diagnostic tool, we applied this technique to gene discovery in DBA, resulting in the identification of RPL31 as a novel DBA gene. PMID:25042156
Kell, Douglas B
2012-01-01
A considerable number of areas of bioscience, including gene and drug discovery, metabolic engineering for the biotechnological improvement of organisms, and the processes of natural and directed evolution, are best viewed in terms of a ‘landscape’ representing a large search space of possible solutions or experiments populated by a considerably smaller number of actual solutions that then emerge. This is what makes these problems ‘hard’, but as such these are to be seen as combinatorial optimisation problems that are best attacked by heuristic methods known from that field. Such landscapes, which may also represent or include multiple objectives, are effectively modelled in silico, with modern active learning algorithms such as those based on Darwinian evolution providing guidance, using existing knowledge, as to what is the ‘best’ experiment to do next. An awareness, and the application, of these methods can thereby enhance the scientific discovery process considerably. This analysis fits comfortably with an emerging epistemology that sees scientific reasoning, the search for solutions, and scientific discovery as Bayesian processes. PMID:22252984
Kell, Douglas B
2012-03-01
A considerable number of areas of bioscience, including gene and drug discovery, metabolic engineering for the biotechnological improvement of organisms, and the processes of natural and directed evolution, are best viewed in terms of a 'landscape' representing a large search space of possible solutions or experiments populated by a considerably smaller number of actual solutions that then emerge. This is what makes these problems 'hard', but as such these are to be seen as combinatorial optimisation problems that are best attacked by heuristic methods known from that field. Such landscapes, which may also represent or include multiple objectives, are effectively modelled in silico, with modern active learning algorithms such as those based on Darwinian evolution providing guidance, using existing knowledge, as to what is the 'best' experiment to do next. An awareness, and the application, of these methods can thereby enhance the scientific discovery process considerably. This analysis fits comfortably with an emerging epistemology that sees scientific reasoning, the search for solutions, and scientific discovery as Bayesian processes. Copyright © 2012 WILEY Periodicals, Inc.
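The view of discovery as heuristic search over a landscape can be sketched with a very small evolutionary algorithm. The code below is an illustrative (1+1) strategy on bit strings, chosen only for brevity; the abstract does not prescribe this algorithm, and the fitness function, string length and mutation rate are all assumptions made for the example.

```python
import random


def evolve(fitness, length, generations, rng):
    """Minimal (1+1) evolutionary search over bit strings.

    fitness:     function mapping a list of 0/1 values to a score
                 (stands in for the outcome of an experiment)
    length:      number of bits in a candidate solution
    generations: number of mutate-and-select rounds
    rng:         random.Random instance, passed in for reproducibility
    Returns the best candidate, its fitness and the fitness history.
    """
    best = [rng.randint(0, 1) for _ in range(length)]
    best_fit = fitness(best)
    history = [best_fit]
    for _ in range(generations):
        # Flip each bit independently with probability 1/length.
        child = [bit ^ 1 if rng.random() < 1.0 / length else bit
                 for bit in best]
        child_fit = fitness(child)
        # Keep the child only if it is at least as good as the parent.
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
        history.append(best_fit)
    return best, best_fit, history


if __name__ == "__main__":
    # Toy landscape: fitness is simply the number of 1 bits.
    solution, score, trace = evolve(sum, 16, 200, random.Random(0))
    print(score, solution)
```

Each fitness evaluation plays the role of one experiment; the selection step uses what has been learned so far to decide which candidate to build on next, and because only non-worsening children are accepted the best score never decreases.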
Classic fungal natural products in the genomic age: the molecular legacy of Harold Raistrick.
Schor, Raissa; Cox, Russell
2018-03-01
Covering: 1893 to 2017. Harold Raistrick was involved in the discovery of many of the most important classes of fungal metabolites during the 20th century. This review focusses on how these discoveries led to developments in isotopic labelling, biomimetic chemistry and the discovery, analysis and exploitation of biosynthetic gene clusters for major classes of fungal metabolites including: alternariol; geodin and metabolites of the emodin pathway; maleidrides; citrinin and the azaphilones; dehydrocurvularin; mycophenolic acid; and the tropolones. Key recent advances in the molecular understanding of these important pathways, including the discovery of biosynthetic gene clusters, the investigation of the molecular and chemical aspects of key biosynthetic steps, and the reengineering of key components of the pathways are reviewed and compared. Finally, discussion of key relationships between metabolites and pathways and the most important recent advances and opportunities for future research directions are given.
Discovery of a new polyhydroxyalkanoate synthase from limestone soil through metagenomic approach.
Tai, Yen Teng; Foong, Choon Pin; Najimudin, Nazalan; Sudesh, Kumar
2016-04-01
PHA synthase (PhaC) is the key enzyme in the production of biodegradable plastics known as polyhydroxyalkanoate (PHA). Nevertheless, most of these enzymes are isolated from cultivable bacteria using traditional isolation methods. Most of the microorganisms found in nature could not be successfully cultivated due to the lack of knowledge on their growth conditions. In this study, a culture-independent approach was applied. The presence of phaC genes in limestone soil was screened using primers targeting the class I and II PHA synthases. Based on the partial gene sequences, a total of 19 gene clusters were identified and 7 clones were selected for full length amplification through genome walking. The complete phaC gene sequence of one of the clones (SC8) was obtained and it revealed 81% nucleotide identity to the PHA synthase gene of Chromobacterium violaceum ATCC 12472. This gene obtained from an uncultured bacterium was successfully cloned and expressed in a Cupriavidus necator PHB(-)4 PHA-negative mutant, resulting in the accumulation of a significant amount of PHA. The PHA synthase activity of this transformant was 64 ± 12 U/g proteins. This paper presents a pioneering study on the discovery of phaC in a limestone area using a metagenomic approach. Through this study, a new functional phaC was discovered from an uncultured bacterium. Phylogenetic classification of all the phaCs isolated in this study revealed that the limestone hill harbors a great diversity of PhaCs with activities that have not yet been investigated. Copyright © 2015 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Lamba, Jatinder K; Crews, Kristine R; Pounds, Stanley B; Cao, Xueyuan; Gandhi, Varsha; Plunkett, William; Razzouk, Bassem I; Lamba, Vishal; Baker, Sharyn D; Raimondi, Susana C; Campana, Dario; Pui, Ching-Hon; Downing, James R; Rubnitz, Jeffrey E; Ribeiro, Raul C
2011-01-01
Aim: To identify gene-expression signatures predicting cytarabine response by an integrative analysis of multiple clinical and pharmacological end points in acute myeloid leukemia (AML) patients. Materials & methods: We performed an integrated analysis to associate the gene expression of diagnostic bone marrow blasts from AML patients treated in the discovery set (AML97; n = 42) and in the independent validation set (AML02; n = 46) with multiple clinical and pharmacological end points. Based on prior biological knowledge, we defined a gene to show a therapeutically beneficial (detrimental) pattern of association if its expression was positively (negatively) correlated with favorable phenotypes such as intracellular cytarabine 5´-triphosphate levels, morphological response and event-free survival, and negatively (positively) correlated with unfavorable end points such as post-cytarabine DNA synthesis levels, minimal residual disease and cytarabine LC50. Results: We identified 240 probe sets predicting a therapeutically beneficial pattern and 97 predicting a detrimental pattern (p ≤ 0.005) in the discovery set. Of these, 60 were confirmed in the independent validation set. The validated probe sets correspond to genes involved in PIK3/PTEN/AKT/mTOR signaling, G-protein-coupled receptor signaling and leukemogenesis. This suggests that targeting these pathways as potential pharmacogenomic and therapeutic candidates could be useful for improving treatment outcomes in AML. Conclusion: This study illustrates the power of integrated analysis of genomic data together with multiple clinical and pharmacologic end points in the identification of genes and pathways of biological relevance. PMID:21449673
Zhao, Chao; Chu, Yanan; Li, Yanhong; Yang, Chengfeng; Chen, Yuqing; Wang, Xumin; Liu, Bin
2017-01-01
To analyze the microbial diversity and gene content of a thermophilic cellulose-degrading consortium from hot springs in Xiamen, China using 454 pyrosequencing for discovering cellulolytic enzyme resources. A thermophilic cellulose-degrading consortium, XM70, isolated from a hot spring, used sugarcane bagasse as its sole carbon and energy source. DNA sequencing of the XM70 sample resulted in 349,978 reads with an average read length of 380 bases, accounting for 133,896,867 bases of sequence information. The characterization of sequencing reads and assembled contigs revealed that most microbes were derived from four phyla: Geobacillus (Firmicutes), Thermus, Bacillus, and Anoxybacillus. Twenty-eight homologous genes belonging to 15 glycoside hydrolase families were detected, including several cellulase genes. A novel hot spring metagenome-derived thermophilic cellulase was expressed and characterized. The application value of thermostable sugarcane bagasse-degrading enzymes is shown for production of cellulosic biofuel. The practical power of using a short-read-based metagenomic approach for harvesting novel microbial genes is also demonstrated.
Applications of transgenics in studies of bone sialoprotein.
Zhang, Jin; Tu, Qisheng; Chen, Jake
2009-07-01
Bone sialoprotein (BSP) is a major non-collagenous protein in mineralizing connective tissues such as dentin, cementum and calcified cartilage tissues. As a member of the Small Integrin-Binding Ligand, N-linked Glycoprotein (SIBLING) gene family of glycoproteins, BSP is involved in regulating hydroxyapatite crystal formation in bones and teeth, and has long been used as a marker gene for osteogenic differentiation. In the most recent decade, new discoveries in BSP gene expression and regulation, bone remodeling, bone metastasis, and bone tissue engineering have been achieved with the help of transgenic mice. In this review, we discuss these new discoveries, drawn from the literature and from our own laboratory, which were derived from the use of transgenic mouse mutants related to the BSP gene or its promoter activity.
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2011-01-01
Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
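The network construction summarized in the barley co-expression abstract above (pairwise correlation of expression profiles, then grouping into modules) can be illustrated with a minimal sketch. It is not the authors' pipeline: plain Pearson correlation stands in for their weighted correlation coefficient, connected components stand in for their module clustering, and the gene names, expression values and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation (an assumed stand-in for the weighted coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def modules(genes, edges):
    """Group genes into connected components using a small union-find."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g1, g2, _ in edges:
        parent[find(g1)] = find(g2)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted((sorted(m) for m in groups.values()), key=lambda m: m[0])


# Hypothetical expression profiles across four arrays.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 5.0],
}
edges = coexpression_edges(expr)
print(modules(expr, edges))
```

In this toy data only geneA and geneB pass the threshold, so they form one module and the other two genes remain singletons. A real analysis would need a threshold justified by the data and a clustering method suited to large networks.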
GreenPhylDB v2.0: comparative and functional genomics in plants.
Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G
2011-01-01
GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
Chevrier, Sandy; Boidot, Romain
2014-10-06
The widespread use of Next Generation Sequencing has opened up new avenues for cancer research and diagnosis. NGS will bring huge amounts of new data on cancer, and especially cancer genetics. Current knowledge and future discoveries will make it necessary to study a huge number of genes that could be involved in a genetic predisposition to cancer. In this regard, we developed a Nextera design to study 11 complete genes involved in DNA damage repair. This protocol was developed to safely study 11 genes (ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD50, RAD51C, RAD80, and TP53) from promoter to 3'-UTR in 24 patients simultaneously. This protocol, based on transposase technology and gDNA enrichment, gives a great advantage in terms of time for the genetic diagnosis thanks to sample multiplexing. This protocol can be safely used with blood gDNA.
Krämer, Andreas; Shah, Sohela; Rebres, Robert Anthony; Tang, Susan; Richards, Daniel Rene
2017-08-11
Next-generation sequencing is widely used to identify disease-causing variants in patients with rare genetic disorders. Identifying those variants from whole-genome or exome data can be both scientifically challenging and time consuming. A significant amount of time is spent on variant annotation, and interpretation. Fully or partly automated solutions are therefore needed to streamline and scale this process. We describe Phenotype Driven Ranking (PDR), an algorithm integrated into Ingenuity Variant Analysis, that uses observed patient phenotypes to prioritize diseases and genes in order to expedite causal-variant discovery. Our method is based on a network of phenotype-disease-gene relationships derived from the QIAGEN Knowledge Base, which allows for efficient computational association of phenotypes to implicated diseases, and also enables scoring and ranking. We have demonstrated the utility and performance of PDR by applying it to a number of clinical rare-disease cases, where the true causal gene was known beforehand. It is also shown that PDR compares favorably to a representative alternative tool.
Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies.
Lin, Jhih-Rong; Zhang, Quanwei; Cai, Ying; Morrow, Bernice E; Zhang, Zhengdong D
2017-12-01
Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone.
Kang, Hahk-Soo
2017-02-01
Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for metabolite characterization. Use of this approach has increased over the last couple of years, along with the emergence of low-cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.
2009-01-01
Background: Chickpea (Cicer arietinum L.), an important grain legume crop of the world, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports the generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results: A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and an average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. 
Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes, and their expression profiles showed predominance in specific stress-challenged libraries. Conclusion: The generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666
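The chickpea abstract above reports an average polymorphism information content (PIC) of 0.43 without stating how it was computed. As a hedged illustration, the sketch below uses the widely cited Botstein et al. (1980) definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i; that the study used this exact formula is an assumption, and the allele calls are hypothetical.

```python
from itertools import combinations


def allele_frequencies(calls):
    """Convert a list of allele calls into a list of allele frequencies."""
    counts = {}
    for allele in calls:
        counts[allele] = counts.get(allele, 0) + 1
    total = len(calls)
    return [c / total for c in counts.values()]


def pic(freqs):
    """PIC per Botstein et al. (1980); assumed, not confirmed, to match the study."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical marker with two equally frequent alleles.
calls = ["a", "a", "b", "b"]
print(pic(allele_frequencies(calls)))
```

A marker with a single allele gives a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed, which is why PIC is used to rank markers by informativeness.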
Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott
2010-04-01
An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. 
The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemesh/.
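The GeneMesh abstract above mentions heat maps of Z score-normalized expression values. A minimal sketch of per-gene Z-score normalization follows; whether GeneMesh normalizes per gene or per array, and whether it uses the sample or population standard deviation, is not stated in the abstract, so the per-gene, sample-standard-deviation choice here is an assumption and the intensities are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-score one gene's intensities across samples (sample standard deviation)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant profile has no variation to scale
    return [(v - mu) / sigma for v in values]


def normalize_matrix(expr):
    """Apply Z-scoring row by row to a {gene: [intensities]} mapping."""
    return {gene: z_scores(vals) for gene, vals in expr.items()}


# Hypothetical intensities for two genes across three samples.
expr = {"geneX": [1.0, 2.0, 3.0], "geneY": [5.0, 5.0, 5.0]}
print(normalize_matrix(expr))
```

Normalizing each gene separately puts genes with very different absolute intensities on a common scale, so a heat map shows relative up- and down-regulation rather than raw abundance.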
Kaufmann, Markus; Schuffenhauer, Ansgar; Fruh, Isabelle; Klein, Jessica; Thiemeyer, Anke; Rigo, Pierre; Gomez-Mancilla, Baltazar; Heidinger-Millot, Valerie; Bouwmeester, Tewis; Schopfer, Ulrich; Mueller, Matthias; Fodor, Barna D; Cobos-Correa, Amanda
2015-10-01
Fragile X syndrome (FXS) is the most common form of inherited mental retardation, and it is caused in most cases by epigenetic silencing of the Fmr1 gene. Today, no specific therapy exists for FXS, and current treatments are only directed at improving behavioral symptoms. Neuronal progenitors derived from FXS patient induced pluripotent stem cells (iPSCs) represent a unique model to study the disease and develop assays for large-scale drug discovery screens, since they keep the Fmr1 gene silenced within the disease context. We have established a high-content imaging assay to run a large-scale phenotypic screen aimed at identifying compounds that reactivate the silenced Fmr1 gene. A set of 50,000 compounds was tested, including modulators of several epigenetic targets. We describe an integrated drug discovery model comprising iPSC generation, culture scale-up, and quality control and screening with a very sensitive high-content imaging assay assisted by single-cell image analysis and multiparametric data analysis based on machine learning algorithms. The screening identified several compounds that induced a weak expression of fragile X mental retardation protein (FMRP) and thus sets the basis for further large-scale screens to find candidate drugs or targets tackling the underlying mechanism of FXS with potential for therapeutic intervention. © 2015 Society for Laboratory Automation and Screening.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weighill, Deborah; Jones, Piet; Shah, Manesh
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance.
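The Lines Of Evidence (LOE) idea described in the abstract above, counting how many network layers link a gene to target functions, can be sketched in a few lines. This is a deliberately simplified illustration, not the published scoring system: each layer contributes at most one line of evidence per gene, and the layer names, gene names and edges are hypothetical.

```python
def loe_scores(layers, targets):
    """Count, per gene, the number of layers linking it to any target gene.

    layers:  {layer_name: iterable of (gene_a, gene_b) undirected edges}
    targets: set of genes already tied to the function of interest
    """
    scores = {}
    for edges in layers.values():
        linked = set()
        for a, b in edges:
            if a in targets and b not in targets:
                linked.add(b)
            if b in targets and a not in targets:
                linked.add(a)
        # Each layer counts once per gene, however many edges support it.
        for gene in linked:
            scores[gene] = scores.get(gene, 0) + 1
    return scores


# Hypothetical network layers and lignin-related target genes.
layers = {
    "coexpression": [("g1", "lignin1"), ("g2", "g3")],
    "comethylation": [("g1", "lignin2"), ("g3", "lignin1")],
    "snp_correlation": [("lignin1", "g1")],
}
targets = {"lignin1", "lignin2"}
print(loe_scores(layers, targets))
```

Here g1 is supported by all three layers and g3 by one, so g1 would rank first as a candidate; g2 has no direct link to a target and receives no score.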
SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate
Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon
2016-01-01
Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Weighill, Deborah; Jones, Piet; Shah, Manesh; ...
2018-05-11
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance.
2011-01-01
Background: DNA transposons have emerged as indispensable tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results: We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'- and 67 bp 5'-terminal repeat domain is sufficient for effective transposition; and (3) a small amount of piggyBac transposase results in robust transposition, suggesting the piggyBac transposase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions: The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194
Harris, Katherine E; Aldred, Shelley Force; Davison, Laura M; Ogana, Heather Anne N; Boudreau, Andrew; Brüggemann, Marianne; Osborn, Michael; Ma, Biao; Buelow, Benjamin; Clarke, Starlynn C; Dang, Kevin H; Iyer, Suhasini; Jorgensen, Brett; Pham, Duy T; Pratap, Payal P; Rangaswamy, Udaya S; Schellenberger, Ute; van Schooten, Wim C; Ugamraj, Harshad S; Vafa, Omid; Buelow, Roland; Trinklein, Nathan D
2018-01-01
We created a novel transgenic rat that expresses human antibodies comprising a diverse repertoire of heavy chains with a single common rearranged kappa light chain (IgKV3-15-JK1). This fixed light chain animal, called OmniFlic, presents a unique system for human therapeutic antibody discovery and a model to study heavy chain repertoire diversity in the context of a constant light chain. The purpose of this study was to analyze heavy chain variable gene usage, clonotype diversity, and to describe the sequence characteristics of antigen-specific monoclonal antibodies (mAbs) isolated from immunized OmniFlic animals. Using next-generation sequencing antibody repertoire analysis, we measured heavy chain variable gene usage and the diversity of clonotypes present in the lymph node germinal centers of 75 OmniFlic rats immunized with 9 different protein antigens. Furthermore, we expressed 2,560 unique heavy chain sequences sampled from a diverse set of clonotypes as fixed light chain antibody proteins and measured their binding to antigen by ELISA. Finally, we measured patterns and overall levels of somatic hypermutation in the full B-cell repertoire and in the 2,560 mAbs tested for binding. The results demonstrate that OmniFlic animals produce an abundance of antigen-specific antibodies with heavy chain clonotype diversity that is similar to what has been described with unrestricted light chain use in mammals. In addition, we show that sequence-based discovery is a highly effective and efficient way to identify a large number of diverse monoclonal antibodies to a protein target of interest.
Mihaescu, Raluca; Detmar, Symone B; Cornel, Martina C; van der Flier, Wiesje M; Heutink, Peter; Hol, Elly M; Rikkert, Marcel G M Olde; van Duijn, Cornelia M; Janssens, A Cecile J W
2010-01-01
Alzheimer's disease (AD) is the most prevalent form of dementia and the number of cases is expected to increase exponentially worldwide. Three highly penetrant genes (AbetaPP, PSEN1, and PSEN2) explain only a small number of AD cases with a Mendelian transmission pattern. Many genes have been analyzed for association with non-Mendelian AD, but the only consistently replicated finding is APOE. At present, possibilities for prevention, early detection, and treatment of the disease are limited. Predictive and diagnostic genetic testing is available only in Mendelian forms of AD. Currently, APOE genotyping is not considered clinically useful for screening, presymptomatic testing, or clinical diagnosis of non-Mendelian AD. However, clinical management of the disease is expected to benefit from the rapid pace of discoveries in the genomics of AD. Following a recently developed framework for the continuum of translation research that is needed to move genetic discoveries to health applications, this paper reviews recent genetic discoveries as well as translational research on genomic applications in the prevention, early detection, and treatment of AD. The four phases of translation research include: 1) translation of basic genomics research into a potential health care application; 2) evaluation of the application for the development of evidence-based guidelines; 3) evaluation of the implementation and use of the application in health care practice; and 4) evaluation of the achieved population health impact. Most research on genome-based applications in AD is still in the first phase of the translational research framework, which means that further research is still needed before their implementation can be considered.
Hsu, Yi-Hsiang; Zillikens, M Carola; Wilson, Scott G; Farber, Charles R; Demissie, Serkalem; Soranzo, Nicole; Bianchi, Estelle N; Grundberg, Elin; Liang, Liming; Richards, J Brent; Estrada, Karol; Zhou, Yanhua; van Nas, Atila; Moffatt, Miriam F; Zhai, Guangju; Hofman, Albert; van Meurs, Joyce B; Pols, Huibert A P; Price, Roger I; Nilsson, Olle; Pastinen, Tomi; Cupples, L Adrienne; Lusis, Aldons J; Schadt, Eric E; Ferrari, Serge; Uitterlinden, André G; Rivadeneira, Fernando; Spector, Timothy D; Karasik, David; Kiel, Douglas P
2010-06-10
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6 × 10⁻⁸), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6 × 10⁻¹³; SOX6, p = 6.4 × 10⁻¹⁰) associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD) did. Our results support the concept of our prioritization strategy. 
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
Genome survey sequencing of red swamp crayfish Procambarus clarkii.
Shi, Linlin; Yi, Shaokui; Li, Yanhe
2018-06-21
Red swamp crayfish, Procambarus clarkii, is presently an important commercial aquatic species in China. The crayfish is an active focus of research, and its genetic improvement is urgently needed for crayfish aquaculture in China. However, knowledge of its genomic landscape is limited. In this study, a survey of the P. clarkii genome was conducted using Illumina's Solexa sequencing platform. Meanwhile, its genome size was estimated using flow cytometry. Interestingly, the estimated genome size was about 8.50 Gb by flow cytometry and 1.86 Gb by genome survey sequencing. Based on the assembled genome sequences, a total of 136,962 genes and 152,268 exons were predicted, and the predicted genes ranged from 150 to 12,807 bp in length. The survey sequences could help accelerate gene discovery for studies of genetic diversity and evolutionary analysis, even though they could not be successfully applied to estimating the P. clarkii genome size.
In silico identification of novel ligands for G-quadruplex in the c-MYC promoter
NASA Astrophysics Data System (ADS)
Kang, Hyun-Jin; Park, Hyun-Ju
2015-04-01
G-quadruplex DNA formed in NHEIII1 region of oncogene promoter inhibits transcription of the genes. In this study, virtual screening combining pharmacophore-based search and structure-based docking screening was conducted to discover ligands binding to G-quadruplex in promoter region of c-MYC. Several hit ligands showed the selective PCR-arresting effects for oligonucleotide containing c-MYC G-quadruplex forming sequence. Among them, three hits selectively inhibited cell proliferation and decreased c-MYC mRNA level in Ramos cells, where NHEIII1 is included in translocated c-MYC gene for overexpression. Promoter assay using two kinds of constructs with wild-type and mutant sequences showed that interaction of these ligands with the G-quadruplex resulted in turning-off of the reporter gene. In conclusion, combined virtual screening methods were successfully used for discovery of selective c-MYC promoter G-quadruplex binders with anticancer activity.
Ficklin, Stephen P.; Feltus, Frank Alex
2013-01-01
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach, to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value < = 0.001) from genome-wide association studies, both covering over 230 different rice traits were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serve as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. 
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666
Jung, Ki-Hong; Dardick, Christopher; Bartley, Laura E; Cao, Peijian; Phetsom, Jirapa; Canlas, Patrick; Seo, Young-Su; Shultz, Michael; Ouyang, Shu; Yuan, Qiaoping; Frank, Bryan C; Ly, Eugene; Zheng, Li; Jia, Yi; Hsia, An-Ping; An, Kyungsook; Chou, Hui-Hsien; Rocke, David; Lee, Geun Cheol; Schnable, Patrick S; An, Gynheung; Buell, C Robin; Ronald, Pamela C
2008-10-06
Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
High-Throughput, Motility-Based Sorter for Microswimmers such as C. elegans
Yuan, Jinzhou; Zhou, Jessie; Raizen, David M.; Bau, Haim H.
2015-01-01
Animal motility varies with genotype, disease, aging, and environmental conditions. In many studies, it is desirable to carry out high throughput motility-based sorting to isolate rare animals for, among other things, forward genetic screens to identify genetic pathways that regulate phenotypes of interest. Many commonly used screening processes are labor-intensive, lack sensitivity, and require extensive investigator training. Here, we describe a sensitive, high throughput, automated, motility-based method for sorting nematodes. Our method is implemented in a simple microfluidic device capable of sorting thousands of animals per hour per module, and is amenable to parallelism. The device successfully enriches for known C. elegans motility mutants. Furthermore, using this device, we isolate low-abundance mutants capable of suppressing the somnogenic effects of the flp-13 gene, which regulates C. elegans sleep. By performing genetic complementation tests, we demonstrate that our motility-based sorting device efficiently isolates mutants for the same gene identified by tedious visual inspection of behavior on an agar surface. Therefore, our motility-based sorter is capable of performing high throughput gene discovery approaches to investigate fundamental biological processes. PMID:26008643
Salem, Saeed; Ozcaglar, Cagri
2014-01-01
Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression. We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.
Delivery of gene silencing agents for breast cancer therapy
2013-01-01
The discovery of RNA interference has opened the door for the development of a new class of cancer therapeutics. Small inhibitory RNA oligos are being designed to specifically suppress expression of proteins that are traditionally considered nondruggable, and microRNAs are being evaluated to exert broad control of gene expression for inhibition of tumor growth. Since most naked molecules are not optimized for in vivo applications, the gene silencing agents need to be packaged into delivery vehicles in order to reach the target tissues as their destinations. Thus, the selection of the right delivery vehicles serves as a crucial step in the development of cancer therapeutics. The current review summarizes the status of gene silencing agents in breast cancer and recent development of candidate cancer drugs in clinical trials. Nanotechnology-based delivery vectors for the formulation and packaging of gene silencing agents are also described. PMID:23659575
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions.
They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
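The REGNET abstract mentions "a statistical procedure to control the false discovery rate" without naming it. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the abstract), the filtering of candidate gene-gene edges by p-value might look like the following. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five candidate gene-gene edges.
edge_p_values = [0.001, 0.045, 0.02, 0.2, 0.008]
keep_edge = benjamini_hochberg(edge_p_values, alpha=0.05)
print(keep_edge)
```

Only edges flagged True would be kept in the final graph under this assumed procedure.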
Agent Based Evidence Marshaling: Discovery-Based Enhancement Tools for C2 Systems
2003-12-01
www5conf.inria.fr/fich_html/papers/P5/Overview.html, accessed on 1/15/2001. Dawkins, R., The Selfish Gene, Oxford University Press, Oxford, UK, 1989. Eco, U...Charles S. Peirce and Richard Dawkins argued that ideas can be alive and propagated through human life. Dawkins called these living ideas memes... Dawkins, 1989], while Peirce characterized them as “substantial things” [Buchler, 1955, 340]. This concept and the study of memetics that accompanies it
Cellular Bases of Light-regulated Gravity Responses
NASA Technical Reports Server (NTRS)
Roux, Stanley J.
2003-01-01
This report summarizes the most significant research accomplished in our NAG2-1347 project on the cellular bases of light-regulated gravity responses. It elaborates mainly on our discovery of the role of calcium currents in gravity-directed polar development in single germinating spore cells of the fern Ceratopteris, our development of RNA silencing as a viable method of suppressing the expression of specific genes in Ceratopteris, and on the structure, expression and distribution of members of the annexin family in flowering plants, especially Arabidopsis.
From Saccharomyces cerevisiae to human: The important gene co-expression modules.
Liu, Wei; Li, Li; Ye, Hua; Chen, Haiwei; Shen, Weibiao; Zhong, Yuexian; Tian, Tian; He, Huaqin
2017-08-01
Network-based systems biology has become an important method for analyzing high-throughput gene expression data and gene function mining. Yeast has long been a popular model organism for biomedical research. In the current study, a weighted gene co-expression network analysis algorithm was applied to construct a gene co-expression network in Saccharomyces cerevisiae. Seventeen stable gene co-expression modules were detected from 2,814 S. cerevisiae microarray data. Further characterization of these modules with the Database for Annotation, Visualization and Integrated Discovery tool indicated that these modules were associated with certain biological processes, such as heat response, cell cycle, translational regulation, mitochondrion oxidative phosphorylation, amino acid metabolism and autophagy. Hub genes were also screened by intra-modular connectivity. Finally, the module conservation was evaluated in a human disease microarray dataset. Functional modules were identified in budding yeast, some of which are associated with patient survival. The current study provided a paradigm for single cell microorganisms and potentially other organisms.
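To make the idea of screening hub genes by intra-modular connectivity concrete, here is a minimal sketch. It assumes an unsigned soft-threshold adjacency, a_ij = |cor(i, j)|^beta, with beta = 6 as an illustrative default. The abstract does not state the parameters used, and the gene names and expression values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |correlation| raised to the power beta."""
    genes = sorted(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            weight = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = weight
            adj[h][g] = weight
    return adj


def intramodular_connectivity(adj, module):
    """Sum of adjacency weights from each gene to the other module members."""
    return {g: sum(adj[g][h] for h in module if h != g) for g in module}


# Hypothetical expression profiles across five samples.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [1, 3, 2, 5, 4],
}
adj = soft_threshold_adjacency(expr)
connectivity = intramodular_connectivity(adj, ["geneA", "geneB", "geneC"])
hub = max(connectivity, key=connectivity.get)
print(hub, connectivity)
```

Raising the correlation to a power suppresses weak correlations, so genes with many strong partners in the module end up with the highest connectivity.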
Pandey, Udai Bhan
2011-01-01
The common fruit fly, Drosophila melanogaster, is a well studied and highly tractable genetic model organism for understanding molecular mechanisms of human diseases. Many basic biological, physiological, and neurological properties are conserved between mammals and D. melanogaster, and nearly 75% of human disease-causing genes are believed to have a functional homolog in the fly. In the discovery process for therapeutics, traditional approaches employ high-throughput screening for small molecules that is based primarily on in vitro cell culture, enzymatic assays, or receptor binding assays. The majority of positive hits identified through these types of in vitro screens, unfortunately, are found to be ineffective and/or toxic in subsequent validation experiments in whole-animal models. New tools and platforms are needed in the discovery arena to overcome these limitations. The incorporation of D. melanogaster into the therapeutic discovery process holds tremendous promise for an enhanced rate of discovery of higher quality leads. D. melanogaster models of human diseases provide several unique features such as powerful genetics, highly conserved disease pathways, and very low comparative costs. The fly can effectively be used for low- to high-throughput drug screens as well as in target discovery. Here, we review the basic biology of the fly and discuss models of human diseases and opportunities for therapeutic discovery for central nervous system disorders, inflammatory disorders, cardiovascular disease, cancer, and diabetes. We also provide information and resources for those interested in pursuing fly models of human disease, as well as those interested in using D. melanogaster in the drug discovery process. PMID:21415126
Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko
2012-07-15
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore constitutes a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects of acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E<10(-5)) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
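The comparison of neighboring-gene correlations against randomly selected gene pairs can be sketched as follows. This is an illustration under stated assumptions, not the authors' statistical method: it uses a simple add-one empirical p-value against a random-pair background and omits the FDR step. All function names, gene identifiers, and expression values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def neighbor_correlations(gene_order, expr):
    """Correlation for each pair of adjacent genes along a chromosome."""
    return [
        (g, h, pearson(expr[g], expr[h]))
        for g, h in zip(gene_order, gene_order[1:])
    ]


def random_pair_correlations(expr, n_pairs, seed=0):
    """Background distribution built from randomly chosen gene pairs."""
    rng = random.Random(seed)
    genes = sorted(expr)
    background = []
    for _ in range(n_pairs):
        g, h = rng.sample(genes, 2)
        background.append(pearson(expr[g], expr[h]))
    return background


def empirical_p(r, background):
    """Add-one smoothed fraction of background correlations >= r."""
    return (1 + sum(b >= r for b in background)) / (1 + len(background))


# Hypothetical expression profiles, listed in chromosomal order.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 1, 3, 2],
    "g4": [1, 3, 2, 4],
}
pairs = neighbor_correlations(["g1", "g2", "g3", "g4"], expr)
background = random_pair_correlations(expr, n_pairs=10)
for g, h, r in pairs:
    print(g, h, round(r, 3), round(empirical_p(r, background), 3))
```

Runs of adjacent pairs with small empirical p-values would be candidate co-expressed clusters. A genome-scale analysis would also need a multiple-testing correction.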
Sindhu, Annu; Arora, Pooja; Chaudhury, Ashok
2012-07-01
A novel laboratory revolution for disease therapy, the RNA interference (RNAi) technology, has ushered in a new era of molecular research as the next generation "Gene-targeted prophylaxis." In this review, we have focused on the chief technological challenges associated with the efforts to develop RNAi-based therapeutics that may guide the biomedical researchers. Many non-curable maladies, like neurodegenerative diseases and cancers, have effectively been cured using this technology. Rapid advances are still in progress for the development of RNAi-based technologies that will have a major impact on medical research. We have highlighted the recent discoveries associated with the phenomenon of RNAi, expression of silencing molecules in mammals along with the vector systems used for disease therapeutics.
Targeting Nonsense Mutations in Diseases with Translational Read-Through-Inducing Drugs (TRIDs).
Nagel-Wolfrum, Kerstin; Möller, Fabian; Penner, Inessa; Baasov, Timor; Wolfrum, Uwe
2016-04-01
In recent years, remarkable advances in the ability to diagnose genetic disorders have been made. The identification of disease-causing genes allows the development of gene-specific therapies with the ultimate goal to develop personalized medicines for each patient according to their own specific genetic defect. In-depth genotyping of many different genes has revealed that ~12% of inherited genetic disorders are caused by in-frame nonsense mutations. Nonsense (non-coding) mutations are caused by point mutations, which generate premature termination codons (PTCs) that cause premature translational termination of the mRNA, and subsequently inhibit normal full-length protein expression. Recently, a gene-based therapeutic approach for genetic diseases caused by nonsense mutations has emerged, namely the so-called translational read-through (TR) therapy. Read-through therapy is based on the discovery that small molecules, known as TR-inducing drugs (TRIDs), allow the translation machinery to suppress a nonsense codon, elongate the nascent peptide chain, and consequently result in the synthesis of full-length protein. Several TRIDs are currently under investigation and research has been performed on several genetic disorders caused by nonsense mutations over the years. These findings have raised hope for the usage of TR therapy as a gene-based pharmacogenetic therapy for nonsense mutations in various genes responsible for a variety of genetic diseases.
High-Throughput, Motility-Based Sorter for Microswimmers and Gene Discovery Platform
NASA Astrophysics Data System (ADS)
Yuan, Jinzhou; Raizen, David; Bau, Haim
2015-11-01
Animal motility varies with genotype, disease progression, aging, and environmental conditions. In many studies, it is desirable to carry out high throughput motility-based sorting to isolate rare animals for, among other things, forward genetic screens to identify genetic pathways that regulate phenotypes of interest. Many commonly used screening processes are labor-intensive, lack sensitivity, and require extensive investigator training. Here, we describe a sensitive, high throughput, automated, motility-based method for sorting nematodes. Our method was implemented in a simple microfluidic device capable of sorting many thousands of animals per hour per module, and is amenable to parallelism. The device successfully enriched for known C. elegans motility mutants. Furthermore, using this device, we isolated low-abundance mutants capable of suppressing the somnogenic effects of the flp-13 gene, which regulates sleep-like quiescence in C. elegans. Subsequent genomic sequencing led to the identification of a flp-13-suppressor gene. This research was supported, in part, by NIH NIA Grant 5R03AG042690-02.
A novel eQTL-based analysis reveals the biology of breast cancer risk loci
Li, Qiyuan; Seo, Ji-Heui; Stranger, Barbara; McKenna, Aaron; Pe'er, Itsik; LaFramboise, Thomas; Brown, Myles; Tyekucheva, Svitlana; Freedman, Matthew L.
2014-01-01
Summary Germline determinants of gene expression in tumors are less studied due to the complexity of transcript regulation caused by somatically acquired alterations. We performed expression quantitative trait locus (eQTL) based analyses using the multi-level information provided in The Cancer Genome Atlas (TCGA). Of the factors we measured, cis-acting eQTLs accounted for 1.2% of the total variation of tumor gene expression, while somatic copy number alteration and CpG methylation accounted for 7.3% and 3.3%, respectively. eQTL analyses of 15 previously reported breast cancer risk loci resulted in discovery of three variants that are significantly associated with transcript levels (FDR<0.1). In a novel trans-based analysis, an additional three risk loci were identified to act through ESR1, MYC, and KLF4. These findings provide a more comprehensive picture of gene expression determinants in breast cancer as well as insights into the underlying biology of breast cancer risk loci. PMID:23374354
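As an illustration of what "accounted for X% of the total variation" means for a single variant-transcript pair, the sketch below computes the fraction of expression variance explained by genotype dosage in a simple linear regression, which is the squared Pearson correlation. This is a simplification under stated assumptions: the study partitions variation across several factors jointly, which this univariate example does not attempt. The data are hypothetical.

```python
def variance_explained(genotype, expression):
    """R^2 of a simple linear regression of expression on genotype dosage
    (equal to the squared Pearson correlation)."""
    n = len(genotype)
    mean_g = sum(genotype) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotype, expression))
    var_g = sum((g - mean_g) ** 2 for g in genotype)
    var_e = sum((e - mean_e) ** 2 for e in expression)
    if var_g == 0 or var_e == 0:
        return 0.0  # a constant variable explains nothing
    return cov * cov / (var_g * var_e)


# Hypothetical data: minor-allele dosage (0, 1, 2) and transcript level.
dosage = [0, 0, 1, 1, 2, 2]
transcript = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
r_squared = variance_explained(dosage, transcript)
print(f"fraction of expression variance explained: {r_squared:.3f}")
```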
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis
Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...
2014-01-01
Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.
2001-04-30
Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.
Reanalysis of RNA-Sequencing Data Reveals Several Additional Fusion Genes with Multiple Isoforms
Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli
2012-01-01
RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts. PMID:23119097
Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.
Yip, Shun H; Sham, Pak Chung; Wang, Junwen
2018-02-21
Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
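To illustrate what highly variable gene discovery is measuring, the following minimal sketch ranks genes by their squared coefficient of variation (CV²) across cells. It is not the method of any package named above: the real tools typically model the mean-variance trend and technical noise, which this example ignores. The function names and count values are hypothetical.

```python
from statistics import mean, pvariance


def squared_cv(values):
    """Squared coefficient of variation: variance divided by mean squared."""
    m = mean(values)
    if m == 0:
        return 0.0  # a gene with no expression carries no variability signal
    return pvariance(values) / (m * m)


def top_variable_genes(counts, k):
    """Return the k genes with the highest CV^2 across cells."""
    scores = {gene: squared_cv(values) for gene, values in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical normalized counts for three genes across four cells.
counts = {
    "geneA": [10, 10, 10, 10],  # constant across cells
    "geneB": [0, 20, 0, 20],    # on/off pattern between cells
    "geneC": [9, 11, 10, 10],   # small fluctuation
}
hvgs = top_variable_genes(counts, k=2)
print(hvgs)
```

Because all three genes share the same mean here, CV² alone separates them. With real data, the dependence of variance on mean expression is what the dedicated methods are designed to correct for.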
Integrative analysis of micro-RNA, gene expression, and survival of glioblastoma multiforme.
Huang, Yen-Tsung; Hsu, Thomas; Kelsey, Karl T; Lin, Chien-Ling
2015-02-01
Glioblastoma multiforme (GBM), the most common type of malignant brain tumor, is highly fatal. Limited understanding of its rapid progression necessitates additional approaches that integrate what is known about the genomics of this cancer. Using a discovery set (n = 348) and a validation set (n = 174) of GBM patients, we performed genome-wide analyses that integrated mRNA and micro-RNA expression data from GBM as well as associated survival information, assessing coordinated variability in each as this reflects their known mechanistic functions. Cox proportional hazards models were used for the survival analyses, and nonparametric permutation tests were performed for the micro-RNAs to investigate the association between the number of associated genes and its prognostication. We also utilized mediation analyses for micro-RNA-gene pairs to identify their mediation effects. Genome-wide analyses revealed a novel pattern: micro-RNAs related to more gene expressions are more likely to be associated with GBM survival (P = 4.8 × 10^-5). Genome-wide mediation analyses for the 32,660 micro-RNA-gene pairs with strong association (false discovery rate [FDR] < 0.01%) identified 51 validated pairs with significant mediation effect. Of the 51 pairs, miR-223 had 16 mediation genes. These 16 mediation genes of miR-223 were also highly associated with various other micro-RNAs and mediated their prognostic effects as well. We further constructed a gene signature using the 16 genes, which was highly associated with GBM survival in both the discovery and validation sets (P = 9.8 × 10^-6). This comprehensive study discovered mediation effects of micro-RNA to gene expression and GBM survival and provided a new analytic framework for integrative genomics. © 2014 WILEY PERIODICALS, INC.
Huys, Isabelle; Van Overwalle, Geertrui; Matthijs, Gert
2011-01-01
The paper focuses on the fundamental debate that is going on in Europe and the United States about whether genes and genetic diagnostic methods are to be regarded as inventions or subject matter eligible for patent protection, or whether they are discoveries or principles of nature and thus excluded from patentability. The study further explores some possible scenarios of American influences on European patent applications with respect to genetic diagnostic methods. Our analysis points out that patent eligibility for genes and genetic diagnostic methods, as discussed in the United States in the Association of Molecular Pathology versus US Patent and Trademark Office decision, is based on a different reasoning compared with the European Patent Convention. PMID:21654725
Moore, David S
2017-01-01
Why do we grow up to have the traits we do? Most 20th century scientists answered this question by referring only to our genes and our environments. But recent discoveries in the emerging field of behavioral epigenetics have revealed factors at the interface between genes and environments that also play crucial roles in development. These factors affect how genes work; scientists now know that what matters as much as which genes you have (and what environments you encounter) is how your genes are affected by their contexts. The discovery that what our genes do depends in part on our experiences has shed light on how Nature and Nurture interact at the molecular level inside of our bodies. Data emerging from the world's behavioral epigenetics laboratories support the idea that a person's genes alone cannot determine if, for example, he or she will end up shy, suffering from cardiovascular disease, or extremely smart. Among the environmental factors that can influence genetic activity are parenting styles, diets, and social statuses. In addition to influencing how doctors treat diseases, discoveries about behavioral epigenetics are likely to alter how biologists think about evolution, because some epigenetic effects of experience appear to be transmissible from generation to generation. This domain of research will likely change how we think about the origins of human nature. WIREs Syst Biol Med 2017, 9:e1333. doi: 10.1002/wsbm.1333 © 2016 Wiley Periodicals, Inc.
Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
2016-03-01
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analysis and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
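The abstract does not describe the GeneAnalytics scoring algorithms, so the following is only a generic sketch of how gene set enrichment is often scored: a one-sided hypergeometric (over-representation) test. The function name, parameter names and the toy numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe that contains set_size genes annotated to the gene set."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 genes in total, 3 annotated to a pathway,
# and a query list of 3 genes that all fall in that pathway.
p_all_hit = hypergeom_enrichment_p(10, 3, 3, 3)
p_no_requirement = hypergeom_enrichment_p(10, 3, 3, 0)
print(p_all_hit, p_no_requirement)
```

In practice one p-value is computed per gene set and corrected for multiple testing; an evidence-weighted scheme such as the one the abstract mentions would replace or extend this plain count-based test.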
Delineation of metabolic gene clusters in plant genomes by chromatin signatures.
Yu, Nan; Nützmann, Hans-Wilhelm; MacDonald, James T; Moore, Ben; Field, Ben; Berriri, Souha; Trick, Martin; Rosser, Susan J; Kumar, S Vinod; Freemont, Paul S; Osbourn, Anne
2016-03-18
Plants are a tremendous source of diverse chemicals, including many natural product-derived drugs. It has recently become apparent that the genes for the biosynthesis of numerous different types of plant natural products are organized as metabolic gene clusters, thereby unveiling a highly unusual form of plant genome architecture and offering novel avenues for discovery and exploitation of plant specialized metabolism. Here we show that these clustered pathways are characterized by distinct chromatin signatures of histone 3 lysine trimethylation (H3K27me3) and histone 2 variant H2A.Z, associated with cluster repression and activation, respectively, and represent discrete windows of co-regulation in the genome. We further demonstrate that knowledge of these chromatin signatures along with chromatin mutants can be used to mine genomes for cluster discovery. The roles of H3K27me3 and H2A.Z in repression and activation of single genes in plants are well known. However, our discovery of highly localized operon-like co-regulated regions of chromatin modification is unprecedented in plants. Our findings raise intriguing parallels with groups of physically linked multi-gene complexes in animals and with clustered pathways for specialized metabolism in filamentous fungi. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
atBioNet--an integrated network analysis tool for genomics and biomarker discovery.
Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida
2012-07-20
Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
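To make the clustering step more concrete, the sketch below computes the structural similarity on which the SCAN algorithm is built and flags core nodes. It follows the commonly cited definition, in which a node's neighborhood includes the node itself, and is not taken from atBioNet; the toy graph, the node labels and the `eps`/`mu` values are hypothetical, and the full cluster-expansion step is omitted.

```python
from math import sqrt


def closed_neighborhood(graph, node):
    """Neighbors of `node` plus the node itself."""
    return set(graph[node]) | {node}


def structural_similarity(graph, u, v):
    """Shared closed neighbors, normalized by the geometric mean of the sizes."""
    nu, nv = closed_neighborhood(graph, u), closed_neighborhood(graph, v)
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def core_nodes(graph, eps, mu):
    """Nodes with at least `mu` closed neighbors whose similarity is >= eps."""
    cores = set()
    for u in graph:
        eps_neighbors = [
            v for v in closed_neighborhood(graph, u)
            if structural_similarity(graph, u, v) >= eps
        ]
        if len(eps_neighbors) >= mu:
            cores.add(u)
    return cores


# Hypothetical interaction graph: a triangle A-B-C with D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
sim_ab = structural_similarity(graph, "A", "B")
cores = core_nodes(graph, eps=0.8, mu=3)
print(sim_ab, sorted(cores))
```

Here the tightly connected triangle forms the core of a module, while the peripheral node D does not qualify; in the full algorithm, clusters are grown outward from such cores.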
A DNA barcode for land plants.
Hollingsworth, Peter M.; Forrest, Laura L.; Spouge, John L.; Hajibabaei, Mehrdad; Ratnasingham, Sujeevan; van der Bank, Michelle; Chase, Mark W.; Cowan, Robyn S.; Erickson, David L.; Fazekas, Aron J.; Graham, Sean W.; James, Karen E.; Kim, Ki-Joong; Kress, W. John; Schneider, Harald; van AlphenStahl, Jonathan; Barrett, Spencer C.H.; van den Berg, Cassio; Bogarin, Diego; Burgess, Kevin S.; Cameron, Kenneth M.; Carine, Mark; Chacón, Juliana; Clark, Alexandra; Clarkson, James J.; Conrad, Ferozah; Devey, Dion S.; Ford, Caroline S.; Hedderson, Terry A.J.; Hollingsworth, Michelle L.; Husband, Brian C.; Kelly, Laura J.; Kesanakurti, Prasad R.; Kim, Jung Sung; Kim, Young-Dong; Lahaye, Renaud; Lee, Hae-Lim; Long, David G.; Madriñán, Santiago; Maurin, Olivier; Meusnier, Isabelle; Newmaster, Steven G.; Park, Chong-Wook; Percy, Diana M.; Petersen, Gitte; Richardson, James E.; Salazar, Gerardo A.; Savolainen, Vincent; Seberg, Ole; Wilkinson, Michael J.; Yi, Dong-Keun; Little, Damon P.
2009-08-04
DNA barcoding involves sequencing a standard region of DNA as a tool for species identification. However, there has been no agreement on which region(s) should be used for barcoding land plants. To provide a community recommendation on a standard plant barcode, we have compared the performance of 7 leading candidate plastid DNA regions (atpF–atpH spacer, matK gene, rbcL gene, rpoB gene, rpoC1 gene, psbK–psbI spacer, and trnH–psbA spacer). Based on assessments of recoverability, sequence quality, and levels of species discrimination, we recommend the 2-locus combination of rbcL+matK as the plant barcode. This core 2-locus barcode will provide a universal framework for the routine use of DNA sequence data to identify specimens and contribute toward the discovery of overlooked species of land plants. PMID:19666622
A multi-model approach to nucleic acid-based drug development.
Gautherot, Isabelle; Sodoyer, Regís
2004-01-01
With the advent of functional genomics and the shift of interest towards sequence-based therapeutics, the past decades have witnessed intense research efforts on nucleic acid-mediated gene regulation technologies. Today, RNA interference is emerging as a groundbreaking discovery, holding promise for development of genetic modulators of unprecedented potency. Twenty-five years after the discovery of antisense RNA and ribozymes, gene control therapeutics are still facing developmental difficulties, with only one US FDA-approved antisense drug currently available in the clinic. Limited predictability of target site selection models is recognized as one major stumbling block that is shared by all of the so-called complementary technologies, slowing the progress towards a commercial product. Currently employed in vitro systems for target site selection include RNase H-based mapping, antisense oligonucleotide microarrays, and functional screening approaches using libraries of catalysts with randomized target-binding arms to identify optimal ribozyme/DNAzyme cleavage sites. Individually, each strategy has its drawbacks from a drug development perspective. Utilization of message-modulating sequences as therapeutic agents requires that their action on a given target transcript meets criteria of potency and selectivity in the natural physiological environment. In addition to sequence-dependent characteristics, other factors will influence annealing reactions and duplex stability, as well as nucleic acid-mediated catalysis. Parallel consideration of physiological selection systems thus appears essential for screening for nucleic acid compounds proposed for therapeutic applications. Cellular message-targeting studies face issues relating to efficient nucleic acid delivery and appropriate analysis of response. For reliability and simplicity, prokaryotic systems can provide a rapid and cost-effective means of studying message targeting under pseudo-cellular conditions, but such approaches also have limitations. To streamline nucleic acid drug discovery, we propose a multi-model strategy integrating high-throughput-adapted bacterial screening, followed by reporter-based and/or natural cellular models and potentially also in vitro assays for characterization of the most promising candidate sequences, before final in vivo testing.
Exploitation of Fungal Biodiversity for Discovery of Novel Antibiotics.
Karwehl, Sabrina; Stadler, Marc
Fungi were among the first sources for antibiotics. The discovery and development of the penicillin-type and cephalosporin-type β-lactams and their synthetic versions were transformative in the emergence of the modern pharmaceutical industry. They remain some of the most important antibiotics, even 70 years after their discovery. Meanwhile, thousands of fungal metabolites have been discovered, yet these metabolites have only contributed a few additional compounds that have entered clinical development. Substantial expansion in fungal biodiversity assessment, along with the availability of modern "-OMICS" technology and revolutionary developments in fungal biotechnology, has taken place in the last 15 years, subsequent to the exit of most of the big Pharma companies from the field of novel antibiotics discovery. Therefore, the timing seems opportune to revisit these fascinating, chemically rich organisms as a reservoir of small-molecule templates for lead discovery. This review will describe ongoing interdisciplinary scenarios in which specialists in fungal biology collaborate with chemists, pharmacologists and biochemical and process engineers in order to reveal and make new antibiotics. The utility of a pre-selection process based on phylogenetic data and the distribution of secondary metabolite-encoding gene clusters will be highlighted. Examples of novel bioactive metabolites from fungi derived from special ecological groups and new phylogenetic lineages will also be discussed.
Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network.
Li, Zhandong; An, Lifeng; Li, Hao; Wang, ShaoPeng; Zhou, You; Yuan, Fei; Li, Lin
2016-05-05
Nasopharyngeal cancer or nasopharyngeal carcinoma (NPC) is the most common cancer originating in the nasopharynx. The factors that induce nasopharyngeal cancer are still not clear. Additional information about the chemicals or genes related to nasopharyngeal cancer will promote a better understanding of the pathogenesis of this cancer and the factors that induce it. Thus, a computational method NPC-RGCP was proposed in this study to identify the possible relevant chemicals and genes based on the presently known chemicals and genes related to nasopharyngeal cancer. To extensively utilize the functional associations between proteins and chemicals, a heterogeneous network was constructed based on interactions of proteins and chemicals. The NPC-RGCP included two stages: the searching stage and the screening stage. The former stage is for finding new possible genes and chemicals in the heterogeneous network, while the latter stage is for screening and removing false discoveries and selecting the core genes and chemicals. As a result, five putative genes, CXCR3, IRF1, CDK1, GSTP1, and CDH2, and seven putative chemicals, iron, propionic acid, dimethyl sulfoxide, isopropanol, erythrose 4-phosphate, β-D-Fructose 6-phosphate, and flavin adenine dinucleotide, were identified by NPC-RGCP. Extensive analyses provided confirmation that the putative genes and chemicals have significant associations with nasopharyngeal cancer.
Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal
2014-01-01
We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709
Synthetic biology approaches: Towards sustainable exploitation of marine bioactive molecules.
Seghal Kiran, G; Ramasamy, Pasiyappazham; Sekar, Sivasankari; Ramu, Meenatchi; Hassan, Saqib; Ninawe, A S; Selvin, Joseph
2018-06-01
The discovery of genes responsible for the production of bioactive metabolites via metabolic pathways, combined with advances in synthetic biology tools, has allowed the establishment of numerous microbial cell factories, for instance yeast cell factories, for the manufacture of highly useful metabolites from renewable biomass. Genome mining and metagenomics are two platforms that provide baseline data for the reconstruction of genomes and metabolomes, on which the development of synthetic/semi-synthetic genomes for marine natural products discovery is based. Engineered biofilms are being developed on synthetic biology platforms using genetic circuits and cell signalling systems as repressilators controlling biofilm formation. Recombineering is a process of homologous recombination-mediated genetic engineering that includes the specific insertion, deletion or modification of any sequence. Although this discipline is considered new to the scientific domain, it has now developed into a promising endeavor for the sustainable exploitation of marine natural products. Copyright © 2018 Elsevier B.V. All rights reserved.
Decoding the complex genetic causes of heart diseases using systems biology.
Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K
2015-03-01
The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
Applications of Transgenics in Studies of Bone Sialoprotein
Zhang, Jin; Tu, Qisheng; Chen, Jake
2010-01-01
Bone sialoprotein (BSP) is a major non-collagenous protein in mineralizing connective tissues such as dentin, cementum and calcified cartilage tissues. As a member of the SIBLING (Small Integrin-Binding Ligand, N-linked Glycoprotein) gene family of glycoproteins, BSP is involved in regulating hydroxyapatite crystal formation in bones and teeth, and has long been used as a marker gene for osteogenic differentiation. In the most recent decade, new discoveries in BSP gene expression and regulation, bone remodeling, bone metastasis, and bone tissue engineering have been achieved with the help of transgenic mice. In this review, we discuss these new discoveries obtained from the literature and from our own laboratory, which were derived from the use of transgenic mouse mutants related to the BSP gene or its promoter activity. PMID:19326395
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.
Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis
2014-12-01
Proteogenomics involves the use of MS to refine annotation of protein-coding genes and discover genes in a genome. We carried out comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with the aim of improving annotation for methylotrophs, organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in the ME-AM1 genome. One such novel gene is identified with 75 peptides and lacks a homolog in other methylobacteria, but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved by the discovery of a short gene in the methylotrophy gene island and by redefining a gene important for pyrroloquinoline quinone synthesis, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the genome of the model methylotroph ME-AM1. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A fuzzy neural network for intelligent data processing
NASA Astrophysics Data System (ADS)
Xie, Wei; Chu, Feng; Wang, Lipo; Lim, Eng Thiam
2005-03-01
In this paper, we describe an incrementally generated fuzzy neural network (FNN) for intelligent data processing. This FNN combines the features of initial fuzzy model self-generation, fast input selection, partition validation, parameter optimization and rule-base simplification. A small FNN is created from scratch -- there is no need to specify the initial network architecture, initial membership functions, or initial weights. Fuzzy IF-THEN rules are constantly combined and pruned to minimize the size of the network while maintaining accuracy; irrelevant inputs are detected and deleted, and membership functions and network weights are trained with a gradient descent algorithm, i.e., error backpropagation. Experimental studies on synthesized data sets demonstrate that the proposed fuzzy neural network is able to achieve accuracy comparable to or higher than both a feedforward crisp neural network, i.e., NeuroRule, and a decision tree, i.e., C4.5, with more compact rule bases for most of the data sets used in our experiments. The FNN has achieved outstanding results for cancer classification based on microarray data. The excellent classification result for the Small Round Blue Cell Tumors (SRBCTs) data set is shown. Compared with other published methods, we used far fewer genes to achieve perfect classification, which will help researchers focus directly on specific genes and may lead to the discovery of the underlying causes of cancer development and of new drugs.
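The abstract does not specify the shape of the membership functions or how rule outputs are combined, so the sketch below makes two explicit assumptions: Gaussian membership functions, and a normalized weighted average of rule outputs. It only illustrates how a fuzzy IF-THEN rule base turns inputs into an output; the rule structure, names and numbers are hypothetical and the training and pruning steps are not shown.

```python
from math import exp


def gaussian_membership(x, center, width):
    """Degree (0..1] to which x belongs to a fuzzy set; Gaussian shape is an assumption."""
    return exp(-((x - center) ** 2) / (2.0 * width ** 2))


def firing_strength(inputs, rule):
    """Product of the memberships of each input in the rule's antecedent sets."""
    strength = 1.0
    for x, (center, width) in zip(inputs, rule["antecedents"]):
        strength *= gaussian_membership(x, center, width)
    return strength


def fnn_output(inputs, rules):
    """Average of the rule weights, weighted by normalized firing strengths."""
    strengths = [firing_strength(inputs, r) for r in rules]
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    return sum(s * r["weight"] for s, r in zip(strengths, rules)) / total


# Two hypothetical rules over two inputs (e.g. two gene expression levels):
# "IF x1 is LOW and x2 is LOW THEN class 0", "IF x1 is HIGH and x2 is HIGH THEN class 1".
rules = [
    {"antecedents": [(0.0, 1.0), (0.0, 1.0)], "weight": 0.0},
    {"antecedents": [(1.0, 1.0), (1.0, 1.0)], "weight": 1.0},
]
midpoint = fnn_output((0.5, 0.5), rules)
near_high = fnn_output((1.0, 1.0), rules)
print(midpoint, near_high)
```

Because every operation here is differentiable in the centers, widths and weights, parameters of this kind can be tuned by gradient descent, which is consistent with the backpropagation training the abstract mentions.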
Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources
Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel
2013-01-01
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing considerably higher expression in those tissues compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice of method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim of eliminating method/study-specific biases and improving the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the highest-scoring genes with the public databases PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap with PaGenBase, 71% with TiGER and only 28% with HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases at identifying drug targets and biomarkers with known tissue specificity. PMID:23950964
Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo
2016-08-01
The modern process of discovering candidate molecules in the early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves compound clustering based on chemical similarity to obtain representative chemically diverse compounds (not incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources for the purpose of discovering a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, serving as a generic input in any clustering algorithm. This new analysis workflow is a semi-supervised method since, after the determination of clusters, a secondary analysis is performed to find differentially expressed genes associated with the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. In this paper, datasets from two drug development oncology projects are used to illustrate the usefulness of the weighted similarity-based clustering approach to integrate multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this proposed integrative approach.
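The integration step described above, complementary weights applied to a chemical and a biological similarity matrix, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `weighted_similarity`, the weight `w`, and the toy matrices are assumptions introduced here.

```python
def weighted_similarity(sim_chem, sim_bio, w):
    """Combine two same-shaped similarity matrices with weights w and 1 - w.

    Hypothetical helper: w is the weight on the chemical similarity, and the
    biological similarity receives the complementary weight 1 - w.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(sim_chem) != len(sim_bio):
        raise ValueError("matrices must have the same number of rows")
    return [
        [w * c + (1.0 - w) * b for c, b in zip(row_c, row_b)]
        for row_c, row_b in zip(sim_chem, sim_bio)
    ]


# Toy similarities for three hypothetical compounds (illustrative values only).
chem = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]]
bio = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.0], [0.6, 0.0, 1.0]]

# Equal weighting of both sources; the result can feed any clustering
# routine that accepts a precomputed similarity matrix.
combined = weighted_similarity(chem, bio, w=0.5)
```

Setting `w` to 1 or 0 recovers clustering on a single data source, which makes it easy to compare the integrated result against either source alone.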
The impact of genetics on future drug discovery in schizophrenia.
Matsumoto, Mitsuyuki; Walton, Noah M; Yamada, Hiroshi; Kondo, Yuji; Marek, Gerard J; Tajinda, Katsunori
2017-07-01
Failures of investigational new drugs (INDs) for schizophrenia have left huge unmet medical needs for patients. Given the recent lackluster results, it is imperative that new drug discovery approaches (and resultant drug candidates) target pathophysiological alterations that are shared in specific, stratified patient populations that are selected based on pre-identified biological signatures. One path to implementing this paradigm is achievable by leveraging recent advances in genetic information and technologies. Genome-wide exome sequencing and meta-analysis of single nucleotide polymorphism (SNP)-based association studies have already revealed rare deleterious variants and SNPs in patient populations. Areas covered: Herein, the authors review the impact that genetics have on the future of schizophrenia drug discovery. The high polygenicity of schizophrenia strongly indicates that this disease is biologically heterogeneous, so the identification of unique subgroups (by patient stratification) is becoming increasingly necessary for future investigational new drugs. Expert opinion: The authors propose a pathophysiology-based stratification of genetically-defined subgroups that share deficits in particular biological pathways. Existing tools, including lower-cost genomic sequencing and advanced gene-editing technology, render this strategy ever more feasible. Genetically complex psychiatric disorders such as schizophrenia may also benefit from synergistic research with simpler monogenic disorders that share perturbations in similar biological pathways.
Discovery of novel bacterial toxins by genomics and computational biology.
Doxey, Andrew C; Mansfield, Michael J; Montecucco, Cesare
2018-06-01
Hundreds and hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches has set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enterococcus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes. Copyright © 2018. Published by Elsevier Ltd.
Mammalian polycistronic mRNAs and disease
Karginov, Timofey A.; Hejazi Pastor, Daniel Parviz; Semler, Bert L.; Gomez, Christopher M.
2016-01-01
Our understanding of gene expression has come far since the “one-gene one-polypeptide” hypothesis proposed by Beadle and Tatum. This review addresses the gradual recognition that a growing number of polycistronic genes, originally discovered in viruses, are being identified within the mammalian genome, and that these may provide new insights into disease mechanisms and treatment. We have carried out a systematic literature review identifying 13 mammalian genes for which there is evidence for polycistronic expression via translation through an Internal Ribosome Entry Site (IRES). Although the canonical mechanism of translation initiation has been studied extensively, this review highlights a process of non-canonical translation, IRES-mediated translation, that is a growing source of understanding complex inheritance, elucidation of disease mechanisms, and discovery of novel therapeutic targets. Identification of additional polycistronic genes may provide new insights into disease therapy and allow for new discoveries of translational and disease mechanisms. PMID:28012572
Bonfiglio, F; Henström, M; Nag, A; Hadizadeh, F; Zheng, T; Cenit, M C; Tigchelaar, E; Williams, F; Reznichenko, A; Ek, W E; Rivera, N V; Homuth, G; Aghdassi, A A; Kacprowski, T; Männikkö, M; Karhunen, V; Bujanda, L; Rafter, J; Wijmenga, C; Ronkainen, J; Hysi, P; Zhernakova, A; D'Amato, M
2018-04-19
Irritable bowel syndrome (IBS) shows genetic predisposition; however, large-scale, powered gene mapping studies are lacking. We sought to exploit existing genetic (genotype) and epidemiological (questionnaire) data from a series of population-based cohorts for IBS genome-wide association studies (GWAS) and their meta-analysis. Based on questionnaire data compatible with Rome III Criteria, we identified a total of 1335 IBS cases and 9768 asymptomatic individuals from 5 independent European genotyped cohorts. Individual GWAS were carried out with sex-adjusted logistic regression under an additive model, followed by meta-analysis using the inverse variance method. Functional annotation of significant results was obtained via a computational pipeline exploiting ontology and interaction networks, and tissue-specific and gene set enrichment analyses. Suggestive GWAS signals (P ≤ 5.0 × 10⁻⁶) were detected for 7 genomic regions, harboring 64 gene candidates to affect IBS risk via functional or expression changes. Functional annotation of this gene set convincingly (best FDR-corrected P = 3.1 × 10⁻¹⁰) highlighted regulation of ion channel activity as the most plausible pathway affecting IBS risk. Our results confirm the feasibility of population-based studies for gene-discovery efforts in IBS, identify risk genes and loci to be prioritized in independent follow-ups, and pinpoint ion channels as important players and potential therapeutic targets warranting further investigation. © 2018 John Wiley & Sons Ltd.
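The inverse variance method named above pools per-cohort effect estimates by weighting each with the reciprocal of its squared standard error. The sketch below assumes a fixed-effect model, which the abstract does not specify; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis of per-cohort effects.

    betas: per-cohort effect estimates (e.g. log odds ratios for one SNP).
    standard_errors: the matching standard errors.
    Returns the pooled effect, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Two hypothetical cohorts: the more precise one (smaller standard error)
# dominates the pooled estimate.
pooled_beta, pooled_se, z_score = inverse_variance_meta([0.10, 0.20], [0.1, 0.2])
```

With these inputs the weights are 100 and 25, so the pooled effect sits closer to the first cohort's estimate than a simple average would.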
Text mining patents for biomedical knowledge.
Rodriguez-Esteban, Raul; Bundschus, Markus
2016-06-01
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ramharack, Pritika; Soliman, Mahmoud E S
2018-06-01
Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.
2011-01-01
Background: Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results: Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions: The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:21401935
Characterization of a Genomic Signature of Pregnancy in the Breast
Belitskaya-Lévy, Ilana; Zeleniuch-Jacquotte, Anne; Russo, Jose; Russo, Irma H.; Bordás, Pal; Åhman, Janet; Afanasyeva, Yelena; Johansson, Robert; Lenner, Per; Li, Xiaochun; de Cicco, Ricardo López; Peri, Suraj; Ross, Eric; Russo, Patricia A.; Santucci-Pereira, Julia; Sheriff, Fathima S.; Slifker, Michael; Hallmans, Göran; Toniolo, Paolo; Arslan, Alan A.
2012-01-01
The objective of the current study was to comprehensively compare the genomic profiles in the breast of parous and nulliparous postmenopausal women to identify genes that permanently change their expression following pregnancy. The study was designed as a two-phase approach. In the discovery phase, we compared breast genomic profiles of 37 parous with 18 nulliparous postmenopausal women. In the validation phase, confirmation of the genomic patterns observed in the discovery phase was sought in an independent set of 30 parous and 22 nulliparous postmenopausal women. RNA was hybridized to Affymetrix HG_U133 Plus 2.0 oligonucleotide arrays containing probes to 54,675 transcripts; scanned and the images analyzed using Affymetrix GCOS software. Surrogate variable analysis, logistic regression and significance analysis for microarrays were used to identify statistically significant differences in expression of genes. The False Discovery Rate (FDR) approach was used to control for multiple comparisons. We found that 208 genes (305 probe sets) were differentially expressed between parous and nulliparous women in both discovery and validation phases of the study at a FDR of 10% and with at least a 1.25-fold change. These genes are involved in regulation of transcription, centrosome organization, RNA splicing, cell cycle control, adhesion and differentiation. The results provide persuasive evidence that full-term pregnancy induces long-term genomic changes in the breast. The genomic signature of pregnancy could be used as an intermediate marker to assess potential chemopreventive interventions with hormones mimicking the effects of pregnancy for prevention of breast cancer. PMID:21622728
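The abstract reports controlling multiple comparisons at an FDR of 10% without naming the procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, a common choice that is assumed here rather than confirmed by the source; the p-values are invented.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Return a list of booleans marking which hypotheses are rejected.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Hypothetical per-probe-set p-values (not from the study).
example_pvalues = [0.001, 0.04, 0.03, 0.20, 0.012]
significant = benjamini_hochberg(example_pvalues, fdr=0.10)
```

In a real analysis the fold-change filter mentioned in the abstract (at least 1.25-fold) would be applied as a separate step alongside the FDR threshold.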
MAGIC database and interfaces: an integrated package for gene discovery and expression.
Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H
2004-01-01
The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
The PhytoClust tool for metabolic gene clusters discovery in plant genomes
Fuchs, Lisa-Maria
2017-01-01
The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGC discovery which will in turn impact the exploration of plant metabolism. PMID:28486689
The PhytoClust tool for metabolic gene clusters discovery in plant genomes.
Töpfer, Nadine; Fuchs, Lisa-Maria; Aharoni, Asaph
2017-07-07
The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGC discovery which will in turn impact the exploration of plant metabolism. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers
Mann, Michael B
2018-01-01
Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366
Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F.; Statnikov, Alexander
2016-01-01
Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods’ performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost. PMID:26939894
Functional Interaction Network Construction and Analysis for Disease Discovery.
Wu, Guanming; Haw, Robin
2017-01-01
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data by using network modules and increasing statistical analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
Goddard, Katrina A B; Tromp, Gerard; Romero, Roberto; Olson, Jane M; Lu, Qing; Xu, Zhiying; Parimi, Neeta; Nien, Jyh Kae; Gomez, Ricardo; Behnke, Ernesto; Solari, Margarita; Espinoza, Jimmy; Santolaya, Joaquin; Chaiworapongsa, Tinnakorn; Lenk, Guy M; Volkenant, Kimberly; Anant, Madan Kumar; Salisbury, Benjamin A; Carr, Janet; Lee, Min Soeb; Vovis, Gerald F; Kuivaniemi, Helena
2007-01-01
Pre-eclampsia (PE) affects 5-7% of pregnancies in the US, and is a leading cause of maternal death and perinatal morbidity and mortality worldwide. To identify genes with a role in PE, we conducted a large-scale association study evaluating 775 SNPs in 190 candidate genes selected for a potential role in obstetrical complications. SNP discovery was performed by DNA sequencing, and genotyping was carried out in a high-throughput facility using the MassARRAY(TM) System. Women with PE (n = 394) and their offspring (n = 324) were compared with control women (n = 602) and their offspring (n = 631) from the same hospital-based population. Haplotypes were estimated for each gene using the EM algorithm, and empirical p values were obtained for a logistic regression-based score test, adjusted for significant covariates. An interaction model between maternal and offspring genotypes was also evaluated. The most significant findings for association with PE were COL1A1 (p = 0.0011) and IL1A (p = 0.0014) for the maternal genotype, and PLAUR (p = 0.0008) for the offspring genotype. Common candidate genes for PE, including MTHFR and NOS3, were not significantly associated with PE. For the interaction model, SNPs within IGF1 (p = 0.0035) and IL4R (p = 0.0036) gave the most significant results. This study is one of the most comprehensive genetic association studies of PE to date, including an evaluation of offspring genotypes that have rarely been considered in previous studies. Although we did not identify statistically significant evidence of association for any of the candidate loci evaluated here after adjusting for multiple testing using the false discovery rate, additional compelling evidence exists, including multiple SNPs with nominally significant p values in COL1A1 and the IL1A region, and previous reports of association for IL1A, to support continued interest in these genes as candidates for PE. 
Identification of the genetic regulators of PE may have broader implications, since women with PE are at increased risk of death from cardiovascular diseases later in life.
Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar
2007-01-01
MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
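The 'inferred contig' idea described above, merging adjacent genes classified as 'present' and summing the resulting fragments into a microarray-visualized genome (MVG) size, can be sketched as follows. This is an illustration under stated assumptions and not ArrayOme's actual code: the function names, the tuple layout, and the 1-based inclusive coordinates are all hypothetical.

```python
def inferred_contigs(genes):
    """Merge runs of adjacent 'present' genes into inferred contigs.

    genes: list of (name, start, end, present) tuples in chromosomal order,
    with 1-based inclusive coordinates (an assumption of this sketch).
    """
    contigs = []
    current = None
    for name, start, end, present in genes:
        if present:
            if current is None:
                current = {"genes": [name], "start": start, "end": end}
            else:
                current["genes"].append(name)
                current["end"] = max(current["end"], end)
        elif current is not None:
            # An absent gene closes the contig being built.
            contigs.append(current)
            current = None
    if current is not None:
        contigs.append(current)
    return contigs


def mvg_size(contigs):
    """Total length of all inferred contigs."""
    return sum(c["end"] - c["start"] + 1 for c in contigs)


# Hypothetical comparative genomic hybridization calls for four genes.
calls = [
    ("g1", 1, 900, True),
    ("g2", 950, 2000, True),
    ("g3", 2100, 3000, False),
    ("g4", 3100, 4000, True),
]
contigs = inferred_contigs(calls)
total_size = mvg_size(contigs)
```

Comparing `total_size` against a physically measured genome size is the kind of discordance check the abstract describes for spotting strains rich in genes the microarray cannot detect.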
Recent Advancement of the Molecular Diagnosis in Pediatric Brain Tumor.
Bae, Jeong-Mo; Won, Jae-Kyung; Park, Sung-Hye
2018-05-01
Recent discoveries of brain tumor-related genes and fast advances in genomic testing technologies have led to the era of molecular diagnosis of brain tumors. Molecular profiling has become a significant step in the diagnosis, the prediction of prognosis and the treatment of brain tumors. Because traditional molecular testing methods have limitations in time and cost for multiple gene tests, next-generation sequencing technologies are rapidly being introduced into clinical practice. Targeted sequencing panels using these technologies have been developed for brain tumors. In this article, focused on pediatric brain tumors, key discoveries of brain tumor-related genes are reviewed and cancer panels used in the molecular profiling of brain tumors are discussed.
Recent Advancement of the Molecular Diagnosis in Pediatric Brain Tumor
Bae, Jeong-Mo; Won, Jae-Kyung; Park, Sung-Hye
2018-01-01
Recent discoveries of brain tumor-related genes and fast advances in genomic testing technologies have led to the era of molecular diagnosis of brain tumors. Molecular profiling has become a significant step in the diagnosis, the prediction of prognosis and the treatment of brain tumors. Because traditional molecular testing methods have limitations in time and cost for multiple gene tests, next-generation sequencing technologies are rapidly being introduced into clinical practice. Targeted sequencing panels using these technologies have been developed for brain tumors. In this article, focused on pediatric brain tumors, key discoveries of brain tumor-related genes are reviewed and cancer panels used in the molecular profiling of brain tumors are discussed. PMID:29742887
Interactions between genetic background, insulin resistance and β-cell function.
Kahn, S E; Suvag, S; Wright, L A; Utzschneider, K M
2012-10-01
An interaction between genes and the environment is a critical component underlying the pathogenesis of the hyperglycaemia of type 2 diabetes. The development of more sophisticated techniques for studying gene variants and for analysing genetic data has led to the discovery of some 40 genes associated with type 2 diabetes. Most of these genes are related to changes in β-cell function, with a few associated with decreased insulin sensitivity and obesity. Interestingly, using quantitative traits based on continuous measures rather than dichotomous ones, it has become evident that not all genes associated with changes in fasting or post-prandial glucose are also associated with a diagnosis of type 2 diabetes. Identification of these gene variants has provided novel insights into the physiology and pathophysiology of the β-cell, including the identification of molecules involved in β-cell function that were not previously recognized as playing a role in this critical cell. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
Yamamoto, Tomoko; Hiroi, Atsuko; Osawa, Makiko; Shibata, Noriyuki
2014-01-01
The muscular dystrophies have been traditionally classified based mainly on clinical manifestation and mode of inheritance. Owing to the discoveries of causative genes, new terminologies derived from each gene, such as dystrophinopathy, α-dystroglycanopathy, sarcoglycanopathy and fukutinopathy, have also become common. Mutations of each gene may cause several clinical phenotypes. Some muscular dystrophies accompany central nervous system (CNS) lesions, especially in the congenital muscular dystrophies. Cobblestone lissencephaly (type II lissencephaly) is a well-known CNS malformation observed in severe forms of α-dystroglycanopathy. Moreover, CNS involvement has been reported in other muscular dystrophies, such as Duchenne muscular dystrophy. In this review, genes related to the muscular dystrophies associated with CNS lesions are briefly described along with the molecular characteristics of each gene and the pathomechanism of the CNS lesions. Understanding of both the clinicopathological characteristics of these CNS lesions and their molecular mechanisms is important for the diagnosis, care of patients, and development of new therapeutic strategies.
NASA Astrophysics Data System (ADS)
Ehler, Martin; Rajapakse, Vinodh; Zeeberg, Barry; Brooks, Brian; Brown, Jacob; Czaja, Wojciech; Bonner, Robert F.
The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. We used a novel clustering method based on Laplacian Eigenmaps, a nonlinear dimension reduction method, to analyze microarray data from laser capture microdissected (LCM) cells at the site and developmental stages (days 10.5 to 12.5) of optic fissure closure. Our new method provided greater biological specificity than classical clustering algorithms: it identified more biological processes and functions related to eye development, as defined by Gene Ontology, at lower false discovery rates. This new methodology builds on the advantages of LCM to isolate pure phenotypic populations within complex tissues and improves the ability to identify critical gene products expressed at lower copy number. The combination of LCM of embryonic organs, gene expression microarrays, and extraction of spatial and temporal co-variations appears to be a powerful approach to understanding the gene regulatory networks that specify mammalian organogenesis.
SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F
2010-12-01
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
Optimal selection of markers for validation or replication from genome-wide association studies.
Greenwood, Celia M T; Rangrej, Jagadish; Sun, Lei
2007-07-01
With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset. Copyright 2007 Wiley-Liss, Inc.
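The idea of treating candidate-gene and genome-wide markers as separate strata can be illustrated with a minimal sketch. It assumes per-marker p-values are available for each stratum, computes Benjamini-Hochberg q-values within each stratum, and then picks a fixed number of markers with the smallest q-values. This is a generic stratified-FDR ranking, not the optimal configuration procedure of the paper; the function names and toy data are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values for one stratum (same order as the input)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stratified_selection(strata, k):
    """Pick k markers with the smallest within-stratum q-values.

    strata: {stratum_name: {marker_id: p_value}} (hypothetical layout).
    """
    scored = []
    for name, markers in strata.items():
        ids = list(markers)
        qs = bh_qvalues([markers[i] for i in ids])
        scored.extend((q, name, marker) for marker, q in zip(ids, qs))
    scored.sort()
    return [(name, marker) for _, name, marker in scored[:k]]


# Toy data: a small candidate-gene stratum and a larger genome-wide stratum.
toy_strata = {
    "candidate": {"c1": 0.01, "c2": 0.04},
    "gwas": {"g1": 0.001, "g2": 0.2, "g3": 0.4, "g4": 0.6, "g5": 0.8},
}
selected = stratified_selection(toy_strata, 3)
```

Because the multiplicity correction is applied per stratum, markers from the small candidate-gene set are penalized less than they would be in a pooled correction, which is one way proportionally more of them can end up in the validation set.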
Discovery of new enzymes and metabolic pathways by using structure and genome context.
Zhao, Suwen; Kumar, Ritesh; Sakai, Ayano; Vetting, Matthew W; Wood, B McKay; Brown, Shoshana; Bonanno, Jeffery B; Hillerich, Brandan S; Seidel, Ronald D; Babbitt, Patricia C; Almo, Steven C; Sweedler, Jonathan V; Gerlt, John A; Cronan, John E; Jacobson, Matthew P
2013-10-31
Assigning valid functions to proteins identified in genome projects is challenging: overprediction and database annotation errors are the principal concerns. We and others are developing computation-guided strategies for functional discovery with 'metabolite docking' to experimentally derived or homology-based three-dimensional structures. Bacterial metabolic pathways often are encoded by 'genome neighbourhoods' (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by 'predicting' the intermediates in the glycolytic pathway in Escherichia coli. Metabolite docking to multiple binding proteins and enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. Here we report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and also the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways.
Dystonia: an update on phenomenology, classification, pathogenesis and treatment.
Balint, Bettina; Bhatia, Kailash P
2014-08-01
This article will highlight recent advances in dystonia with focus on clinical aspects such as the new classification, syndromic approach, new gene discoveries and genotype-phenotype correlations. Broadening of phenotype of some of the previously described hereditary dystonias and environmental risk factors and trends in treatment will be covered. Based on phenomenology, a new consensus update on the definition, phenomenology and classification of dystonia and a syndromic approach to guide diagnosis have been proposed. Terminology has changed and 'isolated dystonia' is used wherein dystonia is the only motor feature apart from tremor, and the previously called heredodegenerative dystonias and dystonia plus syndromes are now subsumed under 'combined dystonia'. The recently discovered genes ANO3, GNAL and CIZ1 appear not to be a common cause of adult-onset cervical dystonia. Clinical and genetic heterogeneity underlie myoclonus-dystonia, dopa-responsive dystonia and deafness-dystonia syndrome. ALS2 gene mutations are a newly recognized cause for combined dystonia. The phenotypic and genotypic spectra of ATP1A3 mutations have considerably broadened. Two new genome-wide association studies identified new candidate genes. A retrospective analysis suggested complicated vaginal delivery as a modifying risk factor in DYT1. Recent studies confirm lasting therapeutic effects of deep brain stimulation in isolated dystonia, good treatment response in myoclonus-dystonia, and suggest that early treatment correlates with a better outcome. Phenotypic classification continues to be important to recognize particular forms of dystonia and this includes syndromic associations. There are a number of genes underlying isolated or combined dystonia and there will be further new discoveries with the advances in genetic technologies such as exome and whole-genome sequencing. 
The identification of new genes will facilitate better elucidation of pathogenetic mechanisms and possible corrective therapies.
Smýkal, Petr; K Varshney, Rajeev; K Singh, Vikas; Coyne, Clarice J; Domoney, Claire; Kejnovský, Eduard; Warkentin, Thomas
2016-12-01
This work discusses several selected topics of plant genetics and breeding in relation to the 150th anniversary of the seminal work of Gregor Johann Mendel. In 2015, we celebrated the 150th anniversary of the presentation of the seminal work of Gregor Johann Mendel. While Darwin's theory of evolution was based on differential survival and differential reproductive success, Mendel's theory of heredity relies on equality and stability throughout all stages of the life cycle. Darwin's concepts were continuous variation and "soft" heredity; Mendel espoused discontinuous variation and "hard" heredity. Thus, the combination of Mendelian genetics with Darwin's theory of natural selection was the process that resulted in the modern synthesis of evolutionary biology. Although biology, genetics, and genomics have been revolutionized in recent years, modern genetics will forever rely on simple principles founded on pea breeding using seven single gene characters. Purposeful use of mutants to study gene function is one of the essential tools of modern genetics. Today, over 100 plant species genomes have been sequenced. Mapping populations, and their use in the segregation of molecular markers and in marker-trait association to map and isolate genes, were developed on the basis of Mendel's work. Genome-wide or genomic selection is a recent approach for the development of improved breeding lines. The analysis of complex traits has been enhanced by high-throughput phenotyping and developments in statistical and modeling methods for the analysis of phenotypic data. Introgression of novel alleles from landraces and wild relatives widens genetic diversity and improves traits; transgenic methodologies allow for the introduction of novel genes from diverse sources, and gene editing approaches offer possibilities to manipulate genes in a precise manner.
Toxicogenomics concepts and applications to study hepatic effects of food additives and chemicals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stierum, Rob; Heijne, Wilbert; Kienhuis, Anne
2005-09-01
Transcriptomics, proteomics and metabolomics are genomics technologies with great potential in toxicological sciences. Toxicogenomics involves the integration of conventional toxicological examinations with gene, protein or metabolite expression profiles. An overview together with selected examples of the possibilities of genomics in toxicology is given. The expectations raised by toxicogenomics are earlier and more sensitive detection of toxicity. Furthermore, toxicogenomics will provide a better understanding of the mechanism of toxicity and may facilitate the prediction of toxicity of unknown compounds. Mechanism-based markers of toxicity can be discovered and improved interspecies and in vitro-in vivo extrapolations will drive model developments in toxicology. Toxicological assessment of chemical mixtures will benefit from the new molecular biological tools. In our laboratory, toxicogenomics is predominantly applied for elucidation of mechanisms of action and discovery of novel pathway-supported mechanism-based markers of liver toxicity. In addition, we aim to integrate transcriptome, proteome and metabolome data, supported by bioinformatics to develop a systems biology approach for toxicology. Transcriptomics and proteomics studies on bromobenzene-mediated hepatotoxicity in the rat are discussed. Finally, an example is shown in which gene expression profiling together with conventional biochemistry led to the discovery of novel markers for the hepatic effects of the food additives butylated hydroxytoluene, curcumin, propyl gallate and thiabendazole.
Handa, Koichi; Nakagome, Izumi; Yamaotsu, Noriyuki; Gouda, Hiroaki; Hirono, Shuichi
2015-01-01
The pregnane X receptor [PXR (NR1I2)] induces the expression of xenobiotic metabolic genes and transporter genes. In this study, we aimed to establish a computational method for quantifying the enzyme-inducing potencies of different compounds via their ability to activate PXR, for the application in drug discovery and development. To achieve this purpose, we developed a three-dimensional quantitative structure-activity relationship (3D-QSAR) model using comparative molecular field analysis (CoMFA) for predicting enzyme-inducing potencies, based on computer-ligand docking to multiple PXR protein structures sampled from the trajectory of a molecular dynamics simulation. Molecular mechanics-generalized born/surface area scores representing the ligand-protein-binding free energies were calculated for each ligand. As a result, the predicted enzyme-inducing potencies for compounds generated by the CoMFA model were in good agreement with the experimental values. Finally, we concluded that this 3D-QSAR model has the potential to predict the enzyme-inducing potencies of novel compounds with high precision and therefore has valuable applications in the early stages of the drug discovery process. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
Vatanparast, Mohammad; Powell, Adrian; Doyle, Jeff J; Egan, Ashley N
2018-03-01
The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.
Manivannan, Abinaya; Kim, Jin-Hee; Yang, Eun-Young; Ahn, Yul-Kyun; Lee, Eun-Su; Choi, Sena; Kim, Do-Sun
2018-01-01
Pepper is an economically important horticultural plant that is widely used in cuisines worldwide for its pungency and spicy taste, and it has been domesticated since antiquity. To meet the growing demand for pepper with high quality, good organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be used to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reformed plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids a deeper understanding of the molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among molecular markers, single nucleotide polymorphisms (SNPs) offer various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches on the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided here can be used to improve pepper breeding in the future.
An RNA-Seq-based reference transcriptome for Citrus.
Terol, Javier; Tadeo, Francisco; Ventimilla, Daniel; Talon, Manuel
2016-03-01
Previous RNA-Seq studies in citrus have been focused on physiological processes relevant to fruit quality and productivity of the major species, especially sweet orange. Less attention has been paid to vegetative or reproductive tissues, while most Citrus species have never been analysed. In this work, we characterized the transcriptome of vegetative and reproductive tissues from 12 Citrus species from all main phylogenetic groups. Our aims were to acquire a complete view of the citrus transcriptome landscape, to improve previous functional annotations and to obtain genetic markers associated with genes of agronomic interest. 28 samples were used for RNA-Seq analysis, obtained from 12 Citrus species: C. medica, C. aurantifolia, C. limon, C. bergamia, C. clementina, C. deliciosa, C. reshni, C. maxima, C. paradisi, C. aurantium, C. sinensis and Poncirus trifoliata. Four different organs were analysed: root, phloem, leaf and flower. A total of 3421 million Illumina reads were produced and mapped against the reference C. clementina genome sequence. Transcript discovery pipeline revealed 3326 new genes, the number of genes with alternative splicing was increased to 19,739, and a total of 73,797 transcripts were identified. Differential expression studies between the four tissues showed that gene expression is overall related to the physiological function of the specific organs above any other variable. Variants discovery analysis revealed the presence of indels and SNPs in genes associated with fruit quality and productivity. Pivotal pathways in citrus such as those of flavonoids, flavonols, ethylene and auxin were also analysed in detail. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
Li, Qianqian; Liu, Jianguo; Zhang, Litao; Liu, Qian
2014-01-01
Background: Algae in the order Trentepohliales have a broad geographic distribution and are generally characterized by the presence of abundant β-carotene. The many monographs published to date have mainly focused on their morphology, taxonomy, phylogeny, distribution and reproduction; molecular studies of this order are still rare. High-throughput RNA sequencing (RNA-Seq) technology provides a powerful and efficient method for transcript analysis and gene discovery in Trentepohlia jolithus. Methods/Principal Findings: Illumina HiSeq 2000 sequencing generated 55,007,830 Illumina PE raw reads, which were assembled into 41,328 assembled unigenes. Based on NR annotation, 53.28% of the unigenes (22,018) could be assigned to gene ontology classes with 54 subcategories and 161,451 functional terms. A total of 26,217 (63.44%) assembled unigenes were mapped to 128 KEGG pathways. Furthermore, a set of 5,798 SSRs in 5,206 unigenes and 131,478 putative SNPs were identified. Moreover, the fact that all of the C4 photosynthesis genes exist in T. jolithus suggests a complex carbon acquisition and fixation system. Similarities and differences between T. jolithus and other algae in carotenoid biosynthesis are also described in depth. Conclusions/Significance: This is the first broad transcriptome survey for T. jolithus, increasing the amount of molecular data available for the class Ulvophyceae. As well as providing resources for functional genomics studies, the functional genes and putative pathways identified here will contribute to a better understanding of carbon fixation and fatty acid and carotenoid biosynthesis in T. jolithus. PMID:25254555
Warehousing re-annotated cancer genes for biomarker meta-analysis.
Orsini, M; Travaglione, A; Capobianco, E
2013-07-01
Translational research in cancer genomics assigns a fundamental role to bioinformatics in support of candidate gene prioritization with regard to both biomarker discovery and target identification for drug development. Efforts in both directions rely on the existence and constant update of large repositories of gene expression data and omics records obtained from a variety of experiments. Users who interactively interrogate such repositories may have problems retrieving sample fields that carry limited associated information, for instance because of incomplete entries or unusable files. Cancer-specific data sources present similar problems. Given that source integration usually improves data quality, one of the objectives is to keep the computational complexity low enough to allow optimal assimilation and mining of all the information. In particular, integrating intraomics data can improve the exploration of gene co-expression landscapes, while integrating interomics sources can establish genotype-phenotype associations. Both integrations are relevant to cancer biomarker meta-analysis, as the proposed study demonstrates. Our approach is based on re-annotating cancer-specific data available at the EBI's ArrayExpress repository and building a data warehouse aimed at biomarker discovery and validation studies. Cancer genes are organized by tissue, with biomedical and clinical evidence combined to increase the reproducibility and consistency of results. For better comparative evaluation, multiple queries have been designed to efficiently address all types of experiments and platforms, and to allow retrieval of sample-related information, such as cell line, disease state and clinical aspects. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Discovery of new candidate genes related to brain development using protein interaction information.
Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong
2015-01-01
Human brain development is a dramatic process composed of a series of complex and fine-tuned spatiotemporal gene expressions. A good comprehension of this process can assist us in developing the potential of our brain. However, we have only limited knowledge about the genes and gene functions that are involved in this biological process. Therefore, a substantial demand remains to discover new brain development-related genes and identify their biological functions. In this study, we aimed to discover new brain-development related genes by building a computational method. We referred to a series of computational methods used to discover new disease-related genes and developed a similar method. In this method, the shortest path algorithm was executed on a weighted graph that was constructed using protein-protein interactions. New candidate genes fell on at least one of the shortest paths connecting two known genes that are related to brain development. A randomization test was then adopted to filter positive discoveries. Of the final identified genes, several have been reported to be associated with brain development, indicating the effectiveness of the method, whereas several of the others may have potential roles in brain development.
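The shortest-path step described above can be sketched as follows. This is a minimal illustration, assuming the protein-protein interaction network is an undirected weighted graph stored as nested dictionaries in which a lower weight means a stronger interaction. The weighting scheme, the function names and the toy network are assumptions, and the randomization test used to filter candidates is not reproduced.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                prev[neighbour] = node
                heapq.heappush(heap, (new_dist, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known):
    """Collect interior nodes of shortest paths between every pair of known genes."""
    candidates = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(graph, a, b)
        if path:
            candidates.update(path[1:-1])
    return candidates - set(known)


# Toy network: A and B are "known" genes; X lies on the cheap route, Y on the costly one.
toy_ppi = {
    "A": {"X": 1.0, "Y": 5.0},
    "X": {"A": 1.0, "B": 1.0},
    "Y": {"A": 5.0, "B": 5.0},
    "B": {"X": 1.0, "Y": 5.0},
}
found = candidate_genes(toy_ppi, {"A", "B"})
```

In this toy network only X is reported, because the route through Y is never a shortest path between the two known genes.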
Huang, Hung-Chung; Jupiter, Daniel; VanBuren, Vincent
2010-01-01
Background: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. Methodology/Principal Findings: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as ‘brain group’ and ‘non-brain group’; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. Conclusions/Significance: The methodology employed here may be used to facilitate disease-specific biomarker discovery. PMID:20140228
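The four distribution metrics and the Kolmogorov-Smirnov distance mentioned above can be computed as in the following minimal sketch. It assumes each gene expression profile is a plain list of numbers and uses population moments with excess kurtosis. The thresholds the authors used to assign the four categories are not given in the abstract and are not guessed here, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def distribution_metrics(values):
    """Mean, standard deviation, skewness and excess kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy profiles: expression of one gene in a "brain" group and a "non-brain" group.
brain = [10.0, 11.0, 12.0]
non_brain = [1.0, 2.0, 3.0]
separation = ks_distance(brain, non_brain)
metrics = distribution_metrics([1.0, 2.0, 3.0, 4.0, 5.0])
```

A distance of 1.0 means the two groups do not overlap at all for that gene, which is the pattern one would expect from a strongly tissue-specific, switch-like gene.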
Generation of transgenic mouse model using PTTG as an oncogene.
Kakar, Sham S; Kakar, Cohin
2015-01-01
The close physiological similarity between the mouse and human has provided tools for understanding the biological function of particular genes in vivo by introduction or deletion of a gene of interest. Using the mouse as a model has provided a wealth of resources, knowledge, and technology, helping scientists to understand the biological functions, translocation, trafficking, and interaction of a candidate gene with other intracellular molecules, as well as its transcriptional regulation and posttranslational modification, and to discover novel signaling pathways for a particular gene. Most importantly, the generation of a mouse model for a specific human disease has provided a powerful tool for understanding the etiology of the disease and discovering novel therapeutics. This chapter describes in detail the step-by-step generation of a transgenic mouse model, which can help guide new investigators in developing successful models. For practical purposes, we describe the generation of a mouse model using pituitary tumor transforming gene (PTTG) as the candidate gene of interest.
Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M
2013-10-01
The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
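To make the notion of multi-level association rules concrete, the sketch below computes ordinary support and confidence over annotation sets after each gene's terms have been expanded with their ancestors, so that rules can be evaluated at any level of the ontology. These are the textbook measures, not the MOSupport and MOConfidence measures of the paper, whose definitions the abstract does not give. The term names and the parent map are invented for illustration.

```python
def expand_with_ancestors(terms, parents):
    """Return the terms plus all of their ancestors in the ontology graph."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def support(itemset, transactions):
    """Fraction of genes whose expanded annotation set contains every term in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the annotated genes."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical two-level fragments of two sub-ontologies (MF and BP).
toy_parents = {"MF:kinase": ["MF:catalytic"], "BP:phospho": ["BP:metabolic"]}
toy_annotations = [
    {"MF:kinase", "BP:phospho"},
    {"MF:kinase", "BP:phospho"},
    {"MF:catalytic", "BP:metabolic"},
    {"MF:kinase"},
]
expanded = [expand_with_ancestors(a, toy_parents) for a in toy_annotations]

high_level_support = support({"MF:catalytic", "BP:metabolic"}, expanded)
rule_confidence = confidence({"MF:kinase"}, {"BP:phospho"}, expanded)
```

Expanding with ancestors is what lets the general rule (catalytic activity with metabolic process) reach a higher support than the specific rule (kinase with phosphorylation) on the same four genes.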
Wu, Yue; Jiang, Zhensheng; Li, Zhihong; Gu, Jing; You, Qi-Dong; Zhang, Xiaojin
2018-06-01
As a gene associated with anemia, the erythropoietin gene is physiologically expressed under hypoxia and is regulated by hypoxia-inducible factor-α (HIF-α). Thus, stabilizing HIF-α is a potent strategy to stimulate the expression and secretion of erythropoietin. In this study we applied click chemistry to the discovery of HIF prolyl hydroxylase 2 (HIF-PHD2) inhibitors for the first time, and a series of triazole compounds showed favorable inhibitory activity in a fluorescence polarization assay. Of particular note was the orally active HIF-PHD inhibitor 15i (IC50 = 62.23 nM), which was almost ten times more active than the phase III drug FG-4592 (IC50 = 591.4 nM). Furthermore, it raised the hemoglobin of mice with cisplatin-induced anemia (120 g/L) to normal levels (160 g/L), with no apparent toxicity observed in vivo. These results confirm that triazole compound 15i is a promising candidate for the treatment of renal anemia.
Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong
2013-10-18
Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization.The key elements of the proposed methodology: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help face the present and forthcoming challenges of data analysis in functional genomics.
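The first step described above, representing each profile in a Fourier basis, can be illustrated with a minimal sketch. It assumes evenly spaced time points and uses plain discrete Fourier sums; the function names `fourier_coefficients` and `harmonic_energy` are hypothetical, and the energy summary is only an illustrative stand-in, not the authors' FDR-controlled, autocorrelation-aware screening statistic.

```python
import math

def fourier_coefficients(profile, n_harmonics):
    """Return (mean, [(a_k, b_k), ...]) for an evenly sampled expression profile.

    a_k and b_k are the cosine and sine coefficients of harmonic k.
    Assumption: time points are equally spaced and n_harmonics < len(profile) / 2.
    """
    n = len(profile)
    if not 0 < n_harmonics < n / 2:
        raise ValueError("n_harmonics must be at least 1 and below len(profile) / 2")
    mean = sum(profile) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(y * math.cos(2 * math.pi * k * t / n)
                            for t, y in enumerate(profile))
        b_k = 2.0 / n * sum(y * math.sin(2 * math.pi * k * t / n)
                            for t, y in enumerate(profile))
        coeffs.append((a_k, b_k))
    return mean, coeffs

def harmonic_energy(profile, n_harmonics):
    """Sum of squared non-constant coefficients (hypothetical summary score).

    A flat profile scores about zero; a periodic profile scores higher.
    """
    _, coeffs = fourier_coefficients(profile, n_harmonics)
    return sum(a * a + b * b for a, b in coeffs)
```

In this sketch a gene whose profile is flat over time has near-zero harmonic energy, while a periodically expressed gene concentrates its energy in a few harmonics. A real screening step would replace the raw energy with a test statistic whose null distribution accounts for autocorrelation.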
Turning publicly available gene expression data into discoveries using gene set context analysis.
Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai
2016-01-08
Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
An archaeal origin of eukaryotes supports only two primary domains of life.
Williams, Tom A; Foster, Peter G; Cox, Cymon J; Embley, T Martin
2013-12-12
The discovery of the Archaea and the proposal of the three-domains 'universal' tree, based on ribosomal RNA and core genes mainly involved in protein translation, catalysed new ideas for cellular evolution and eukaryotic origins. However, accumulating evidence suggests that the three-domains tree may be incorrect: evolutionary trees made using newer methods place eukaryotic core genes within the Archaea, supporting hypotheses in which an archaeon participated in eukaryotic origins by founding the host lineage for the mitochondrial endosymbiont. These results provide support for only two primary domains of life--Archaea and Bacteria--because eukaryotes arose through partnership between them.
IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Min; Chu, Ken; Ratner, Anna
2014-10-28
In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.
Transcriptional profiling in facioscapulohumeral muscular dystrophy to identify candidate biomarkers
Rahimov, Fedik; King, Oliver D.; Leung, Doris G.; Bibat, Genila M.; Emerson, Charles P.; Kunkel, Louis M.; Wagner, Kathryn R.
2012-01-01
Facioscapulohumeral muscular dystrophy (FSHD) is a progressive neuromuscular disorder caused by contractions of repetitive elements within the macrosatellite D4Z4 on chromosome 4q35. The pathophysiology of FSHD is unknown and, as a result, there is currently no effective treatment available for this disease. To better understand the pathophysiology of FSHD and develop mRNA-based biomarkers of affected muscles, we compared global analysis of gene expression in two distinct muscles obtained from a large number of FSHD subjects and their unaffected first-degree relatives. Gene expression in two muscle types was analyzed using GeneChip Gene 1.0 ST arrays: biceps, which typically shows an early and severe disease involvement; and deltoid, which is relatively uninvolved. For both muscle types, the expression differences were mild: using relaxed cutoffs for differential expression (fold change ≥1.2; nominal P value <0.01), we identified 191 and 110 genes differentially expressed between affected and control samples of biceps and deltoid muscle tissues, respectively, with 29 genes in common. Controlling for a false-discovery rate of <0.25 reduced the number of differentially expressed genes in biceps to 188 and in deltoid to 7. Expression levels of 15 genes altered in this study were used as a “molecular signature” in a validation study of an additional 26 subjects and predicted them as FSHD or control with 90% accuracy based on biceps and 80% accuracy based on deltoids. PMID:22988124
Genetic variation in cell death genes and risk of non-Hodgkin lymphoma.
Schuetz, Johanna M; Daley, Denise; Graham, Jinko; Berry, Brian R; Gallagher, Richard P; Connors, Joseph M; Gascoyne, Randy D; Spinelli, John J; Brooks-Wilson, Angela R
2012-01-01
Non-Hodgkin lymphomas are a heterogeneous group of solid tumours that constitute the 5th highest cause of cancer mortality in the United States and Canada. Poor control of cell death in lymphocytes can lead to autoimmune disease or cancer, making genes involved in programmed cell death of lymphocytes logical candidate genes for lymphoma susceptibility. We tested for genetic association with NHL and NHL subtypes, of SNPs in lymphocyte cell death genes using an established population-based study. 17 candidate genes were chosen based on biological function, with 123 SNPs tested. These included tagSNPs from HapMap and novel SNPs discovered by re-sequencing 47 cases in genes for which SNP representation was judged to be low. The main analysis, which estimated odds ratios by fitting data to an additive logistic regression model, used European ancestry samples that passed quality control measures (569 cases and 547 controls). A two-tiered approach for multiple testing correction was used: correction for number of tests within each gene by permutation-based methodology, followed by correction for the number of genes tested using the false discovery rate. Variant rs928883, near miR-155, showed an association (OR per A-allele: 2.80 [95% CI: 1.63-4.82]; p(F) = 0.027) with marginal zone lymphoma that is significant after correction for multiple testing. This is the first reported association between a germline polymorphism at a miRNA locus and lymphoma.
Duan, Qiaonan; Flynn, Corey; Niepel, Mario; Hafner, Marc; Muhlich, Jeremy L; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Chen, Edward Y; Golub, Todd R; Sorger, Peter K; Subramanian, Aravind; Ma'ayan, Avi
2014-07-01
For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C
2003-01-01
Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626
Li, Yuqin; You, Guirong; Jia, Baoxiu; Si, Hongzong; Yao, Xiaojun
2014-01-01
Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via the heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on HM and GEP, respectively, and the two prediction models led to good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful for predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and for providing theoretical information for studying the new drugs.
Butler, Merlin G.; Rafi, Syed K.; Manzardo, Ann M.
2015-01-01
Recently, autism-related research has focused on the identification of various genes and disturbed pathways causing the genetically heterogeneous group of autism spectrum disorders (ASD). The list of autism-related genes has significantly increased due to better awareness with advances in genetic technology and expanding searchable genomic databases. We compiled a master list of known and clinically relevant autism spectrum disorder genes identified with supporting evidence from peer-reviewed medical literature sources by searching key words related to autism and genetics and from authoritative autism-related public access websites, such as the Simons Foundation Autism Research Institute autism genomic database dedicated to gene discovery and characterization. Our list consists of 792 genes arranged in alphabetical order in tabular form with gene symbols placed on high-resolution human chromosome ideograms, thereby enabling clinical and laboratory geneticists and genetic counsellors to access convenient visual images of the location and distribution of ASD genes. Meaningful correlations of the observed phenotype in patients with suspected/confirmed ASD gene(s) at the chromosome region or breakpoint band site can be made to inform diagnosis and gene-based personalized care and provide genetic counselling for families. PMID:25803107
Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles
2014-04-23
Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.
Piñero, Janet; Queralt-Rosinach, Núria; Bravo, Àlex; Deu-Pons, Jordi; Bauer-Mehren, Anna; Baron, Martin; Sanz, Ferran; Furlong, Laura I
2015-01-01
DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380,000 associations between >16,000 genes and 13,000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/ © The Author(s) 2015. Published by Oxford University Press.
Antisense antibiotics: a brief review of novel target discovery and delivery.
Bai, Hui; Xue, Xiaoyan; Hou, Zheng; Zhou, Ying; Meng, Jingru; Luo, Xiaoxing
2010-06-01
The nightmare of multi-drug resistant bacteria will continue to haunt us if no panacea is ever found. Efforts to find desirable natural products with bactericidal properties and to screen chemically modified derivatives of traditional antibiotics have lagged behind the emergence of new multi-drug resistant bacteria. The concept of using antisense antibiotics, revolutionary yet still on the threshold, has experienced ups and downs in the past decade. In the past five years, however, significant technological advances in the fields of microbial genomics, structural modification of oligonucleotides and efficient delivery systems have led to fundamental progress in the research and in vivo application of this paradigm. The wealth of information provided in the microbial genomics era has allowed the identification and/or validation of a number of essential genes that may serve as possible targets for antisense inhibition; antisense oligodeoxynucleotides (ODNs) based on the 3rd generation of modified structures, e.g., peptide nucleic acids (PNAs) and phosphorodiamidate morpholino oligomers (PMOs), have shown great potency in gene expression inhibition in a sequence-specific and dose-dependent manner at low micromolar concentrations; and cell-penetrating peptide-mediated delivery systems have enabled effective intracellular antisense inhibition of targeted genes both in vitro and in vivo. The new methods show promise in the discovery of novel gene-specific antisense antibiotics that will be useful in the future battle against drug-resistant bacterial infections. This review describes this promising paradigm, the targets that have been identified and the recent technologies by which it is delivered.
Personalized medicine in thrombosis: back to the future
Nagalla, Srikanth
2016-01-01
Most physicians believe they practiced personalized medicine prior to the genomics era that followed the sequencing of the human genome. The focus of personalized medicine has been primarily genomic medicine, wherein it is hoped that the nucleotide dissimilarities among different individuals would provide clinicians with more precise understanding of physiology, more refined diagnoses, better disease risk assessment, earlier detection and monitoring, and tailored treatments to the individual patient. However, to date, the “genomic bench” has not worked itself to the clinical thrombosis bedside. In fact, traditional plasma-based hemostasis-thrombosis laboratory testing, by assessing functional pathways of coagulation, may better help manage venous thrombotic disease than a single DNA variant with a small effect size. There are some new and exciting discoveries in the genetics of platelet reactivity pertaining to atherothrombotic disease. Despite a plethora of genetic/genomic data on platelet reactivity, there are relatively little actionable pharmacogenetic data with antiplatelet agents. Nevertheless, it is crucial for genome-wide DNA/RNA sequencing to continue in research settings for causal gene discovery, pharmacogenetic purposes, and gene-gene and gene-environment interactions. The potential of genomics to advance medicine will require integration of personal data that are obtained in the patient history: environmental exposures, diet, social data, etc. Furthermore, without the ritual of obtaining this information, we will have depersonalized medicine, which lacks the precision needed for the research required to eventually incorporate genomics into routine, optimal, and value-added clinical care. PMID:26847245
Bălăcescu, Loredana; Bălăcescu, O; Crişan, N; Fetica, B; Petruţ, B; Bungărdean, Cătălina; Rus, Meda; Tudoran, Oana; Meurice, G; Irimie, Al; Dragoş, N; Berindan-Neagoe, Ioana
2011-01-01
Prostate cancer is the leading cancer among the Western male population, with clinical behavior ranging from indolent to metastatic disease. Although many molecules and deregulated pathways are known, the molecular mechanisms involved in the development of prostate cancer are not fully understood. The aim of this study was to explore the molecular variation underlying prostate cancer, based on microarray analysis and bioinformatics approaches. Normal and prostate cancer tissues were collected by macrodissection from prostatectomy specimens. All prostate cancer specimens used in our study were Gleason score 7. A gene expression microarray (Agilent Technologies) was used for Whole Human Genome evaluation. The bioinformatics and functional analyses were based on Limma and Ingenuity software. The microarray analysis identified 1119 genes differentially expressed between prostate cancer and normal prostate, which were up- or down-regulated at least 2-fold. P-values were adjusted for multiple testing using the Benjamini-Hochberg method with a false discovery rate of 0.01. These genes were analyzed with Ingenuity Pathway Analysis software, and 23 genetic networks were established. Our microarray results provide new information regarding the molecular networks in prostate cancer stratified as Gleason 7. These data highlight gene expression profiles for a better understanding of prostate cancer progression.
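The Benjamini-Hochberg adjustment mentioned in this abstract can be sketched in a few lines. This is a generic textbook version of the step-up procedure, not the Limma implementation used in the study; the function names `benjamini_hochberg` and `significant_at` are illustrative, and the p-values in the usage note are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), then a running
    minimum is taken from the largest rank downward so that the adjusted
    values are monotone in the raw p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_at(p_values, fdr):
    """Indices of tests whose adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]
```

For example, `significant_at([0.01, 0.04, 0.03, 0.005], 0.03)` keeps the first and last tests, because their adjusted values (0.02 each) fall below 0.03 while the other two adjust to 0.04.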
gene2drug: a computational tool for pathway-based rational drug repositioning.
Napolitano, Francesco; Carrella, Diego; Mandriani, Barbara; Pisonero-Vaquero, Sandra; Sirci, Francesco; Medina, Diego L; Brunetti-Pierri, Nicola; di Bernardo, Diego
2018-05-01
Drug repositioning has been proposed as an effective shortcut to drug discovery. The availability of large collections of transcriptional responses to drugs enables computational approaches to drug repositioning directly based on measured molecular effects. We introduce a novel computational methodology for rational drug repositioning, which exploits the transcriptional responses following treatment with small molecules. Specifically, given a therapeutic target gene, a prioritization of potentially effective drugs is obtained by assessing their impact on the transcription of genes in the pathway(s) including the target. We performed in silico validation and comparison with a state-of-the-art technique based on similar principles. We next performed experimental validation in two different real-case drug repositioning scenarios: (i) upregulation of the glutamate-pyruvate transaminase (GPT), which has been shown to induce reduction of oxalate levels in a mouse model of primary hyperoxaluria, and (ii) activation of the transcription factor TFEB, a master regulator of lysosomal biogenesis and autophagy, whose modulation may be beneficial in neurodegenerative disorders. A web tool for Gene2drug is freely available at http://gene2drug.tigem.it. An R package is under development and can be obtained from https://github.com/franapoli/gep2pep. Contact: dibernardo@tigem.it. Supplementary data are available at Bioinformatics online.
Drug discovery strategies to outer membrane targets in Gram-negative pathogens.
Brown, Dean G
2016-12-15
This review will cover selected recent examples of drug discovery strategies which target the outer membrane (OM) of Gram-negative bacteria either by disruption of outer membrane function or by inhibition of essential gene products necessary for outer membrane assembly. Significant advances in pathway elucidation, structural biology and molecular inhibitor designs have created new opportunities for drug discovery within this target-class space. Copyright © 2016 Elsevier Ltd. All rights reserved.
DiscoverySpace: an interactive data analysis application
Robertson, Neil; Oveisi-Fordorei, Mehrdad; Zuyderduyn, Scott D; Varhol, Richard J; Fjell, Christopher; Marra, Marco; Jones, Steven; Siddiqui, Asim
2007-01-01
DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online. PMID:17210078
Top-K Interesting Subgraph Discovery in Information Networks
2014-03-03
Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining” at...Jiawei Han¶ ∗Microsoft, India . Email: gmanish@microsoft.com †State University of New York at Buffalo. Email: jing@buffalo.edu ‡University of California
Wei, Qingyi Wei
2012-01-01
Asbestos exposure is a known risk factor for lung cancer. Although recent genome-wide association studies (GWASs) have identified some novel loci for lung cancer risk, few addressed genome-wide gene–environment interactions. To determine gene–asbestos interactions in lung cancer risk, we conducted genome-wide gene–environment interaction analyses at levels of single nucleotide polymorphisms (SNPs), genes and pathways, using our published Texas lung cancer GWAS dataset. This dataset included 317 498 SNPs from 1154 lung cancer cases and 1137 cancer-free controls. The initial SNP-level P-values for interactions between genetic variants and self-reported asbestos exposure were estimated by unconditional logistic regression models with adjustment for age, sex, smoking status and pack-years. The P-value for the most significant SNP rs13383928 was 2.17×10⁻⁶, which did not reach the genome-wide statistical significance. Using a versatile gene-based test approach, we found that the top significant gene was C7orf54, located on 7q32.1 (P = 8.90×10⁻⁵). Interestingly, most of the other significant genes were located on 11q13. When we used an improved gene-set-enrichment analysis approach, we found that the Fas signaling pathway and the antigen processing and presentation pathway were most significant (nominal P < 0.001; false discovery rate < 0.05) among 250 pathways containing 17 572 genes. We believe that our analysis is a pilot study that first describes the gene–asbestos interaction in lung cancer risk at levels of SNPs, genes and pathways. Our findings suggest that immune function regulation-related pathways may be mechanistically involved in asbestos-associated lung cancer risk. Abbreviations: CI, confidence interval; E, environment; FDR, false discovery rate; G, gene; GSEA, gene-set-enrichment analysis; GWAS, genome-wide association studies; i-GSEA, improved gene-set-enrichment analysis approach; OR, odds ratio; SNP, single nucleotide polymorphism. PMID:22637743
Metabolic traits of an uncultured archaeal lineage--MSBL1--from brine pools of the Red Sea.
Mwirichia, Romano; Alam, Intikhab; Rashid, Mamoon; Vinu, Manikandan; Ba-Alawi, Wail; Anthony Kamau, Allan; Kamanda Ngugi, David; Göker, Markus; Klenk, Hans-Peter; Bajic, Vladimir; Stingl, Ulrich
2016-01-13
The candidate Division MSBL1 (Mediterranean Sea Brine Lakes 1) comprises a monophyletic group of uncultured archaea found in different hypersaline environments. Previous studies propose methanogenesis as the main metabolism. Here, we describe a metabolic reconstruction of MSBL1 based on 32 single-cell amplified genomes from Brine Pools of the Red Sea (Atlantis II, Discovery, Nereus, Erba and Kebrit). Phylogeny based on rRNA genes as well as conserved single copy genes delineates the group as a putative novel lineage of archaea. Our analysis shows that MSBL1 may ferment glucose via the Embden-Meyerhof-Parnas pathway. However, in the absence of organic carbon, carbon dioxide may be fixed via the ribulose bisphosphate carboxylase, Wood-Ljungdahl pathway or reductive TCA cycle. Therefore, based on the occurrence of genes for glycolysis, absence of the core genes found in genomes of all sequenced methanogens and the phylogenetic position, we hypothesize that the MSBL1 are not methanogens, but probably sugar-fermenting organisms capable of autotrophic growth. Such a mixotrophic lifestyle would confer survival advantage (or possibly provide a unique narrow niche) when glucose and other fermentable sugars are not available.
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-02-14
Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10⁻⁵ (including six with P<5 × 10⁻⁸). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.
Garland, Stephanie J.; Mohan, Swetha; Flibotte, Stephane; Muncaster, Quintin; Cai, Jerry; Rademakers, Suzanne; Moerman, Donald G.; Leroux, Michel R.
2016-01-01
Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-01-01
Background: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. Methods: All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). Results: The PAX8-target gene set was ranked 1/615 in the discovery (PGSEA<0.001; FDR=0.21), 7/615 in the replication (PGSEA=0.004; FDR=0.37), and 1/615 in the combined (PGSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (PGSEA=0.025) and IGROV1 (PGSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Conclusions: Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC. PMID:28103614
Nearing saturation of cancer driver gene discovery.
Hsiehchen, David; Hsieh, Antony
2018-06-15
Extensive sequencing efforts of cancer genomes such as The Cancer Genome Atlas (TCGA) have been undertaken to uncover bona fide cancer driver genes, which has enhanced our understanding of cancer and revealed therapeutic targets. However, the number of driver gene mutations is bounded, indicating that there must be a point when further sequencing efforts will be excessive. We found that there was a significant positive correlation between sample size and identified driver gene mutations across 33 cancers sequenced by the TCGA, which is expected if additional sequencing is still leading to the identification of more driver genes. However, the rate of new cancer driver genes being discovered with larger samples is declining rapidly. Our analysis provides a general guide for determining which cancer types would likely benefit from additional sequencing efforts, particularly those with relatively high rates of cancer driver gene discovery. Our results argue that past strategies of indiscriminately sequencing as many specimens as possible for all cancer types are becoming inefficient. In addition, without significant investments into applying our knowledge of cancer genomes, we risk sequencing more cancer genomes for the sake of sequencing rather than meaningful patient benefit.
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.
Tatusova, Tatiana
2016-01-01
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website. The NCBI text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolution processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.
You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying
2014-08-01
This article demonstrates a noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.
Hartnett, M Elizabeth; Morrison, Margaux A; Smith, Silvia; Yanovitch, Tammy L; Young, Terri L; Colaizy, Tarah; Momany, Allison; Dagle, John; Carlo, Waldemar A; Clark, Erin A S; Page, Grier; Murray, Jeff; DeAngelis, Margaret M; Cotten, C Michael
2014-08-12
To determine genetic variants associated with severe retinopathy of prematurity (ROP) in a candidate gene cohort study of US preterm infants. Preterm infants in the discovery cohort were enrolled through the Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network, and those in the replication cohort were from the University of Iowa. All infants were phenotyped for ROP severity. Because of differences in the durations of enrollment between cohorts, severe ROP was defined as threshold disease in the discovery cohort and as threshold disease or type 1 ROP in the replication cohort. Whole genome amplified DNA from stored blood spot samples from the Neonatal Research Network biorepository was genotyped using an Illumina GoldenGate platform for candidate gene single nucleotide polymorphisms (SNPs) involving angiogenic, developmental, inflammatory, and oxidative pathways. Three analyses were performed to determine significant epidemiologic variables and SNPs associated with levels of ROP severity. Analyses controlled for multiple comparisons, ancestral eigenvalues, family relatedness, and significant epidemiologic variables. Single nucleotide polymorphisms significantly associated with ROP severity from the discovery cohort were analyzed in the replication cohort and in meta-analysis. Eight hundred seventeen infants in the discovery cohort and 543 in the replication cohort were analyzed. Severe ROP occurred in 126 infants in the discovery and in 14 in the replication cohort. In both cohorts, ventilation days and seizure occurrence were associated with severe ROP. After controlling for significant factors and multiple comparisons, two intronic SNPs in the gene BDNF (rs7934165 and rs2049046, P < 3.1 × 10(-5)) were associated with severe ROP in the discovery cohort and were not associated with severe ROP in the replication cohort. 
However, when the cohorts were analyzed together in an exploratory meta-analysis, rs7934165 increased in associated significance with severe ROP (P = 2.9 × 10(-7)). Variants in BDNF encoding brain-derived neurotrophic factor were associated with severe ROP in a large candidate gene study of infants with threshold ROP. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Hutchins, James R. A.
2014-01-01
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry–based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set–wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery. PMID:24723265
False negative rates in Drosophila cell-based RNAi screens: a case study
2011-01-01
Background: High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents have been a matter of extensive study, the issue of false negatives has received less attention. Results: We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently-tested RNAi reagents per gene. Conclusions: RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be partially minimized by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully. PMID:21251254
Sugii, Yuh; Kasai, Tomonari; Ikeda, Masashi; Vaidyanath, Arun; Kumon, Kazuki; Mizutani, Akifumi; Seno, Akimasa; Tokutaka, Heizo; Kudoh, Takayuki; Seno, Masaharu
2016-01-01
To identify cell-specific markers, we designed a DNA microarray platform with oligonucleotide probes for human membrane-anchored proteins. Human glioma cell lines were analyzed using microarray and compared with normal and fetal brain tissues. For the microarray analysis, we employed a spherical self-organizing map, which is a clustering method suitable for the conversion of multidimensional data into two-dimensional data and displays the relationship on a spherical surface. Based on the gene expression profile, the cell surface characteristics were successfully mirrored onto the spherical surface, thereby distinguishing normal brain tissue from the disease model based on the strength of gene expression. The clustered glioma-specific genes were further analyzed by polymerase chain reaction procedure and immunocytochemical staining of glioma cells. Our platform and the following procedure were successfully demonstrated to categorize the genes coding for cell surface proteins that are specific to glioma cells. Our assessment demonstrates that a spherical self-organizing map is a valuable tool for distinguishing cell surface markers and can be employed in marker discovery studies for the treatment of cancer.
High-resolution genetic mapping of allelic variants associated with cell wall chemistry in Populus.
Muchero, Wellington; Guo, Jianjun; DiFazio, Stephen P; Chen, Jin-Gui; Ranjan, Priya; Slavov, Gancho T; Gunter, Lee E; Jawdy, Sara; Bryan, Anthony C; Sykes, Robert; Ziebell, Angela; Klápště, Jaroslav; Porth, Ilga; Skyba, Oleksandr; Unda, Faride; El-Kassaby, Yousry A; Douglas, Carl J; Mansfield, Shawn D; Martin, Joel; Schackwitz, Wendy; Evans, Luke M; Czarnecki, Olaf; Tuskan, Gerald A
2015-01-23
QTL cloning for the discovery of genes underlying polygenic traits has historically been cumbersome in long-lived perennial plants like Populus. Linkage disequilibrium-based association mapping has been proposed as a cloning tool, and recent advances in high-throughput genotyping and whole-genome resequencing enable marker saturation to levels sufficient for association mapping with no a priori candidate gene selection. Here, multiyear and multienvironment evaluation of cell wall phenotypes was conducted in an interspecific P. trichocarpa x P. deltoides pseudo-backcross mapping pedigree and two partially overlapping populations of unrelated P. trichocarpa genotypes using pyrolysis molecular beam mass spectrometry, saccharification, and/or traditional wet chemistry. QTL mapping was conducted using a high-density genetic map with 3,568 SNP markers. As a fine-mapping approach, chromosome-wide association mapping targeting a QTL hot-spot on linkage group XIV was performed in the two P. trichocarpa populations. Both populations were genotyped using the 34 K Populus Infinium SNP array and whole-genome resequencing of one of the populations facilitated marker-saturation of candidate intervals for gene identification. Five QTLs ranging in size from 0.6 to 1.8 Mb were mapped on linkage group XIV for lignin content, syringyl to guaiacyl (S/G) ratio, and 5- and 6-carbon sugars using the mapping pedigree. Six candidate loci exhibiting significant associations with phenotypes were identified within QTL intervals. These associations were reproducible across multiple environments, two independent genotyping platforms, and different plant growth stages. cDNA sequencing for allelic variants of three of the six loci identified polymorphisms leading to a variable-length polyglutamine (PolyQ) stretch in a transcription factor annotated as an ANGUSTIFOLIA C-terminus Binding Protein (CtBP) and premature stop codons in a KANADI transcription factor as well as a protein kinase. 
Results from protoplast transient expression assays suggested that each of the polymorphisms conferred allelic differences in the activation of cellulose, hemicelluloses, and lignin pathway marker genes. This study illustrates the utility of complementary QTL and association mapping as tools for gene discovery with no a priori candidate gene selection. This proof of concept in a perennial organism opens up opportunities for discovery of novel genetic determinants of economically important but complex traits in plants.
Du, Qingzhang; Tian, Jiaxing; Yang, Xiaohui; Pan, Wei; Xu, Baohua; Li, Bailian; Ingvarsson, Pär K.; Zhang, Deqiang
2015-01-01
Economically important traits in many species generally show polygenic, quantitative inheritance. The components of genetic variation (additive, dominant and epistatic effects) of these traits conferred by multiple genes in shared biological pathways remain to be defined. Here, we investigated 11 full-length genes in cellulose biosynthesis, on 10 growth and wood-property traits, within a population of 460 unrelated Populus tomentosa individuals, via multi-gene association. To validate positive associations, we conducted single-marker analysis in a linkage population of 1,200 individuals. We identified 118, 121, and 43 associations (P < 0.01) corresponding to additive, dominant, and epistatic effects, respectively, with low to moderate proportions of phenotypic variance (R2). Epistatic interaction models uncovered a combination of three non-synonymous sites from three unique genes, representing a significant epistasis for diameter at breast height and stem volume. Single-marker analysis validated 61 associations (false discovery rate, Q ≤ 0.10), representing 38 SNPs from nine genes, with an average effect (R2 = 3.8%) nearly 2-fold higher than that identified with multi-gene association, suggesting that multi-gene association can capture smaller individual variants. Moreover, a structural gene–gene network based on tissue-specific transcript abundances provides a better understanding of the multi-gene pathway affecting tree growth and lignocellulose biosynthesis. Our study highlights the importance of pathway-based multiple gene associations to uncover the nature of genetic variance for quantitative traits and may drive novel progress in molecular breeding. PMID:25428896
An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.
Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit
2016-05-26
Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform has not been carried out. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of various lengths. Whether the commonly used software handles such data satisfactorily is unknown. Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction showed similar results in the number of genes and junctions discovered and in the accuracy of expression level estimates. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27% to genome, 86.46% to transcriptome), though 47.83% of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09%, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provide, for the first time, reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed as well as the enzyme-based method. The optimal Ion Proton sequencing options and analysis software have been evaluated.
The Next Step: 25 Discoveries That Could Change Our Lives.
ERIC Educational Resources Information Center
Science85, 1985
1985-01-01
Describes (in separate articles) 25 developments in science, technology, and medicine that have potential impact on the near future. They include discoveries related to space butterflies, drugs, twenty-first century software, experimental mathematics, brain drugs, egg development, ultrasmall microchips, the biology of birth, cancer-causing genes,…
Plant uncoupling mitochondrial proteins.
Vercesi, Aníbal Eugênio; Borecký, Jiri; Maia, Ivan de Godoy; Arruda, Paulo; Cuccovia, Iolanda Midea; Chaimovich, Hernan
2006-01-01
Uncoupling proteins (UCPs) are membrane proteins that mediate purine nucleotide-sensitive free fatty acid-activated H(+) flux through the inner mitochondrial membrane. After the discovery of UCP in higher plants in 1995, it was acknowledged that these proteins are widely distributed in eukaryotic organisms. The widespread presence of UCPs in eukaryotes implies that these proteins may have functions other than thermogenesis. In this review, we describe the current knowledge of plant UCPs, including their discovery, biochemical properties, distribution, gene family, gene expression profiles, regulation of gene expression, and evolutionary aspects. Expression analyses and functional studies on the plant UCPs under normal and stressful conditions suggest that UCPs regulate energy metabolism in the cellular responses to stress through regulation of the electrochemical proton potential (ΔμH+) and production of reactive oxygen species.
Unsupervised automated high throughput phenotyping of RNAi time-lapse movies.
Failmezger, Henrik; Fröhlich, Holger; Tresch, Achim
2013-10-04
Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens. We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene's function. Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.
Mesenchymal Stem Cells for Vascular Target Discovery in Breast Cancer-Associated Angiogenesis
2005-09-01
demonstrating this marker as demonstrated by flow cytometry. These GFP+ MSCs were subsequently analyzed for expression of commonly reported markers of...phenotypically and genotypically analyzed by flow cytometry and gene chip analysis, respectively. We have also shown that MSCs can then be stimulated to...positive MSCs retrieved by collagenase digestion of the Matrigel plug and sorted by flow cytometry. Sorting of these retrieved cells based on co-expression
Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.
Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor
2010-08-01
Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm in 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
A new computational strategy for predicting essential genes.
Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
2013-12-21
Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.
Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin
2017-08-01
This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets, GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls), were downloaded from the Gene Expression Omnibus database. Characteristic genes were identified using the metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance was defined as the average gene significance of all genes used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subject to functional annotation and pathway enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in a total of 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs, sanguinarine and papaverine, were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
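The module significance mentioned in the abstract above (the average gene significance of the genes in a module) can be illustrated with a minimal sketch. It assumes, as is common in WGCNA usage but not stated in the abstract, that gene significance is the absolute Pearson correlation between a gene's expression and a 0/1 disease status. The function names, the toy expression values and the module labels are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def gene_significance(expression, status):
    """Assumed definition: |correlation| of one gene's expression with status."""
    return abs(pearson(expression, status))


def module_significance(expr_by_gene, module_by_gene, status):
    """Average gene significance over the genes assigned to each module."""
    per_module = {}
    for gene, expression in expr_by_gene.items():
        module = module_by_gene[gene]
        per_module.setdefault(module, []).append(
            gene_significance(expression, status)
        )
    return {m: sum(v) / len(v) for m, v in per_module.items()}


# Toy data (hypothetical): four genes, four samples, status 1 = RA, 0 = control.
status = [0, 0, 1, 1]
expr_by_gene = {
    "g1": [1, 2, 3, 4],
    "g2": [4, 3, 2, 1],
    "g3": [1, 1, 2, 2],
    "g4": [1, 2, 1, 2],
}
module_by_gene = {"g1": "green", "g2": "green", "g3": "red", "g4": "red"}
scores = module_significance(expr_by_gene, module_by_gene, status)
```

In this toy case the "green" module scores higher than "red" because both of its genes track disease status, whereas one "red" gene is uncorrelated with it.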
Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun
2015-01-01
Background: The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology has been applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity relevant genes. Principal Findings: The transcriptome of goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were classified into three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. 
Conclusion This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful tools to understand the goose immune system. PMID:25816068
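The N50 size reported for the assembled unigenes is a standard assembly statistic: the length L such that sequences of length at least L together account for at least half of all assembled bases. A minimal sketch of the computation follows; the function name and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths in bp (hypothetical, not from the goose assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

Because N50 weights sequences by their length, it is typically larger than the average length, which is consistent with the abstract reporting an N50 of 1,298 bp against an average of 687 bp.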
2014-01-01
Background: Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integration of these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression. Results: We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways. PMID:25221624
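The edge weight described in the abstract above, a linear combination of a topological similarity and a co-appearance similarity between two coexpression links, can be sketched as follows. The abstract does not define either similarity or the mixing coefficient, so everything here is an assumption made for illustration: co-appearance is taken as the Jaccard index of the sets of datasets in which each link occurs, topological similarity is a placeholder that is 1 when the two links share a gene and 0 otherwise, and `alpha` is a hypothetical weighting parameter.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_gene_similarity(link1, link2):
    """Placeholder topological similarity: 1.0 if the links share a gene."""
    return 1.0 if set(link1) & set(link2) else 0.0


def hybrid_weight(link1, link2, datasets_by_link, alpha=0.5):
    """Linear combination of topological and co-appearance similarity.

    alpha is a hypothetical mixing weight in [0, 1]; it is not a value
    reported in the source.
    """
    topo = shared_gene_similarity(link1, link2)
    coapp = jaccard(datasets_by_link[link1], datasets_by_link[link2])
    return alpha * topo + (1 - alpha) * coapp


# Toy coexpression links (gene pairs) and the datasets in which each appears.
datasets_by_link = {
    ("A", "B"): {1, 2, 3},
    ("B", "C"): {2, 3, 4},
    ("D", "E"): {1},
}
w_near = hybrid_weight(("A", "B"), ("B", "C"), datasets_by_link)
w_far = hybrid_weight(("A", "B"), ("D", "E"), datasets_by_link)
```

Links that share a gene and recur in the same datasets receive the larger weight, so clustering the resulting graph would tend to group them into the same module.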
Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool are more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313
Demissie, Serkalem; Soranzo, Nicole; Bianchi, Estelle N.; Grundberg, Elin; Liang, Liming; Richards, J. Brent; Estrada, Karol; Zhou, Yanhua; van Nas, Atila; Moffatt, Miriam F.; Zhai, Guangju; Hofman, Albert; van Meurs, Joyce B.; Pols, Huibert A. P.; Price, Roger I.; Nilsson, Olle; Pastinen, Tomi; Cupples, L. Adrienne; Lusis, Aldons J.; Schadt, Eric E.; Ferrari, Serge; Uitterlinden, André G.
2010-01-01
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10^−8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6×10^−13; SOX6, p = 6.4×10^−10) associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD) did. Our results support the concept of our prioritization strategy. 
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation. PMID:20548944
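The abstract does not say how the GWAS stages were combined. A common choice is an inverse-variance-weighted fixed-effect meta-analysis, sketched below as an assumption rather than the authors' procedure; the effect sizes and standard errors are invented for illustration.

```python
from math import sqrt


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted pooled effect and its standard error."""
    weights = [1.0 / (se * se) for se in standard_errors]  # precision of each study
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-stage effect estimates for one SNP.
beta, se = fixed_effect_meta([0.10, 0.30], [0.10, 0.10])
print(round(beta, 3), round(se, 4))
```

Pooling shrinks the standard error relative to either stage alone, which is why a combined analysis can reach genome-wide significance when the individual stages do not.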
Zhang, Baixia; Li, Yanwen; Zhang, Yanling; Li, Zhiyong; Bi, Tian; He, Yusu; Song, Kuokui; Wang, Yun
2016-01-01
Identification of bioactive components is an important area of research in traditional Chinese medicine (TCM) formula. The reported identification methods only consider the interaction between the components and the target proteins, which is not sufficient to explain the influence of TCM on the gene expression. Here, we propose the Initial Transcription Process-based Identification (ITPI) method for the discovery of bioactive components that influence transcription factors (TFs). In this method, genome-wide chip detection technology was used to identify differentially expressed genes (DEGs). The TFs of DEGs were derived from GeneCards. The components influencing the TFs were derived from STITCH. The bioactive components in the formula were identified by evaluating the molecular similarity between the components in formula and the components that influence the TF of DEGs. Using the formula of Tian-Zhu-San (TZS) as an example, the reliability and limitation of ITPI were examined and 16 bioactive components that influence TFs were identified. PMID:27034696
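The final ITPI step, scoring formula components by molecular similarity to compounds known to influence the relevant TFs, can be illustrated with a Tanimoto coefficient on binary fingerprints. The abstract does not specify the similarity measure, fingerprint type, or cutoff, so all three are assumptions here, and the fingerprints are toy sets of "on" bit positions.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


def flag_bioactive(formula, references, threshold=0.7):
    """Return formula components whose best similarity to any reference
    compound reaches the (hypothetical) threshold."""
    flagged = {}
    for name, bits in formula.items():
        best = max((tanimoto(bits, ref) for ref in references.values()), default=0.0)
        if best >= threshold:
            flagged[name] = best
    return flagged


# Toy fingerprints (hypothetical components and reference compound).
formula_components = {"component_1": {1, 2, 3, 4}, "component_2": {9, 10}}
tf_influencing_compounds = {"reference_1": {1, 2, 3, 5}}
print(flag_bioactive(formula_components, tf_influencing_compounds, threshold=0.5))
```

In practice the fingerprints would come from a cheminformatics toolkit; the set-based form is used only to keep the sketch self-contained.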
Withers, Sydnor T.; Gottlieb, Shayin S.; Lieu, Bonny; Newman, Jack D.; Keasling, Jay D.
2007-01-01
We have developed a novel method to clone terpene synthase genes. This method relies on the inherent toxicity of the prenyl diphosphate precursors to terpenes, which resulted in a reduced-growth phenotype. When these precursors were consumed by a terpene synthase, normal growth was restored. We have demonstrated that this method is capable of enriching a population of engineered Escherichia coli for those clones that express the sesquiterpene-producing amorphadiene synthase. In addition, we enriched a library of genomic DNA from the isoprene-producing bacterium Bacillus subtilis strain 6051 in E. coli engineered to produce elevated levels of isopentenyl diphosphate and dimethylallyl diphosphate. The selection resulted in the discovery of two genes (yhfR and nudF) whose protein products acted directly on the prenyl diphosphate precursors and produced isopentenol. Expression of nudF in E. coli engineered with the mevalonate-based isopentenyl pyrophosphate biosynthetic pathway resulted in the production of isopentenol. PMID:17693564
Wunderlich, K R; Abbey, C A; Clayton, D R; Song, Y; Schein, J E; Georges, M; Coppieters, W; Adelson, D L; Taylor, J F; Davis, S L; Gill, C A
2006-12-01
The polled locus has been mapped by genetic linkage analysis to the proximal region of bovine chromosome 1. As an intermediate step in our efforts to identify the polled locus and the underlying causative mutation for the polled phenotype, we have constructed a BAC-based physical map of the interval containing the polled locus. Clones containing genes and markers in the critical interval were isolated from the TAMBT (constructed from Angus and Longhorn genomic DNA) and CHORI-240 (constructed from horned Hereford genomic DNA) BAC libraries and ordered based on fingerprinting and the presence or absence of 80 STS markers. A single contig spanning 2.5 Mb was assembled. Comparison of the physical order of STSs to the corresponding region of human chromosome 21 revealed the same order of genes within the polled critical interval. This contig of overlapping BAC clones from horned and polled breeds is a useful resource for SNP discovery and characterization of positional candidate genes.
BioProspecting: novel marker discovery obtained by mining the bibleome.
Elkin, Peter L; Tuttle, Mark S; Trusko, Brett E; Brown, Steven H
2009-02-05
BioProspecting is a novel approach that enabled our team to mine data related to genetic markers from the New England Journal of Medicine (NEJM) utilizing SNOMED CT and the Human Gene Ontology (HUGO). The Biomedical Informatics Research Collaborative was able to link genes and disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) and natural language processing engine, whose output creates an ontology-network using the semantic encodings of the literature that is organized by these two terminologies. We identified relationships between (genes or proteins) and (diseases or drugs) as linked by metabolic functions and identified potentially novel functional relationships between, for example, genes and diseases (e.g. Article #1 ([Gene - IL27] => {Enzyme - Dipeptidyl Carboxypeptidase 1}) and Article #2 ({Enzyme - Dipeptidyl Carboxypeptidase 1} <= [Disorder - Type II DM]) showing a metabolic link between IL27 and Type II DM). In this manuscript we describe our method for developing the database and its content as well as its potential to assist in the discovery of novel markers and drugs.
Epigenetic modulators, modifiers and mediators in cancer aetiology and progression
Feinberg, Andrew P.; Koldobskiy, Michael A.; Göndör, Anita
2016-01-01
This year is the tenth anniversary of the publication in this journal of a model suggesting the existence of ‘tumour progenitor genes’. These genes are epigenetically disrupted at the earliest stages of malignancies, even before mutations, and thus cause altered differentiation throughout tumour evolution. The past decade of discovery in cancer epigenetics has revealed a number of similarities between cancer genes and stem cell reprogramming genes, widespread mutations in epigenetic regulators, and the part played by chromatin structure in cellular plasticity in both development and cancer. In the light of these discoveries, we suggest here a framework for cancer epigenetics involving three types of genes: ‘epigenetic mediators’, corresponding to the tumour progenitor genes suggested earlier; ‘epigenetic modifiers’ of the mediators, which are frequently mutated in cancer; and ‘epigenetic modulators’ upstream of the modifiers, which are responsive to changes in the cellular environment and often linked to the nuclear architecture. We suggest that this classification is helpful in framing new diagnostic and therapeutic approaches to cancer. PMID:26972587
Unique disease heritage of the Dutch-German Mennonite population.
Orton, Noelle C; Innes, A Micheil; Chudley, Albert E; Bech-Hansen, N Torben
2008-04-15
The Dutch-German Mennonites are a religious isolate with foundational roots in the 16th century. A tradition of endogamy, large families, detailed genealogical records, and a unique disease history all contribute to making this a valuable population for genetic studies. Such studies in the Dutch-German Mennonite population have already contributed to the identification of the causative genes in several conditions such as the incomplete form of X-linked congenital stationary night blindness (CSNB2; previously iCSNB) and hypophosphatasia (HOPS), as well as the discovery of founder mutations within established disease genes (MYBPC1, CYP17alpha). The Dutch-German Mennonite population provides a strong resource for gene discovery and could lead to the identification of additional disease genes with relevance to the general population. In addition, further research developments should enhance delivery of clinical genetic services to this unique community. In the current review we discuss 31 genetic conditions, including 17 with identified gene mutations, within the Dutch-German Mennonite population. Copyright 2008 Wiley-Liss, Inc.
Mamrak, Nicholas E; Shimamura, Akiko; Howlett, Niall G
2017-05-01
Fanconi anemia (FA) is a rare autosomal and X-linked genetic disease characterized by congenital abnormalities, progressive bone marrow failure (BMF), and increased cancer risk during early adulthood. The median lifespan for FA patients is approximately 33 years. The proteins encoded by the FA genes function together in the FA-BRCA pathway to repair DNA damage and to maintain genome stability. Within the past two years, five new FA genes have been identified (RAD51/FANCR, BRCA1/FANCS, UBE2T/FANCT, XRCC2/FANCU, and REV7/FANCV), bringing the total number of disease-causing genes to 21. This review summarizes the discovery of these new FA genes and describes how these proteins integrate into the FA-BRCA pathway to maintain genome stability and critically prevent early-onset BMF and cancer. Copyright © 2016 Elsevier Ltd. All rights reserved.
Hu, Jianhua; Wright, Fred A
2007-03-01
The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
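To illustrate why shrinkage guards against inappropriately small variance estimates, the sketch below computes a two-sample t-statistic whose pooled variance is pulled toward a prior value. This is a generic moderated-t illustration, not the mean-variance model proposed in the paper; `prior_var` and `prior_weight` are hypothetical tuning inputs.

```python
from math import sqrt
from statistics import mean, variance


def shrunken_t(group_a, group_b, prior_var, prior_weight):
    """Two-sample t-statistic with the pooled variance shrunk toward prior_var.

    prior_weight acts like extra degrees of freedom given to the prior;
    prior_weight = 0 gives the ordinary pooled-variance t-statistic.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    shrunk = (prior_weight * prior_var + df * pooled) / (prior_weight + df)
    standard_error = sqrt(shrunk * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Two arrays per condition (hypothetical log-expression values): the tiny
# sample variance inflates the ordinary t-statistic.
treated, control = [5.0, 5.1], [4.0, 4.1]
print(shrunken_t(treated, control, prior_var=0.5, prior_weight=0))  # ordinary t
print(shrunken_t(treated, control, prior_var=0.5, prior_weight=2))  # moderated t
```

With only two arrays per group, a gene can show a very small variance by chance; borrowing information across genes dampens the resulting spuriously large statistics.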
Drier, Yotam; Domany, Eytan
2011-03-14
The fact that there is very little if any overlap between the genes of different prognostic signatures for early-discovery breast cancer is well documented. The reasons for this apparent discrepancy have been explained by the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim, for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value were well known long before gene expression profiling. We found that the claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, either lacked statistical rigor or were in fact based on addressing some other question.
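The basic building block of an enrichment analysis like the one described is an over-representation test for a single term. A minimal sketch using the hypergeometric upper tail follows; it deliberately omits the study's correction for the nested, dependent structure of gene ontologies, and the counts are invented.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    largest = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, largest + 1)
    )
    return favourable / total


# Hypothetical counts: 10 genes measured, 4 annotated to a term,
# a signature of 3 genes, 2 of which carry the annotation.
print(hypergeom_upper_tail(population=10, annotated=4, drawn=3, overlap=2))
```

Running this test across many terms produces dependent p-values, since child terms share genes with their parents, which is the multiplicity problem the abstract says must be handled with care.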
Vazquez, Miguel; Nogales-Cadenas, Ruben; Arroyo, Javier; Botías, Pedro; García, Raul; Carazo, Jose M; Tirado, Francisco; Pascual-Montano, Alberto; Carmona-Saez, Pedro
2010-07-01
The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.
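The idea of comparing a query of over- and under-expressed genes against stored signatures can be sketched with a deliberately simple score: the mean signature value of the query's up genes minus that of its down genes. This is an illustrative stand-in, not MARQ's actual scoring method (which the abstract does not describe); gene names and values are hypothetical.

```python
from statistics import mean


def signature_score(signature, up_genes, down_genes):
    """Positive when the signature moves in the same direction as the query,
    negative when it moves in the opposite direction.

    signature maps gene name -> differential expression value (e.g. log ratio).
    """
    up = [signature[g] for g in up_genes if g in signature]
    down = [signature[g] for g in down_genes if g in signature]
    if not up or not down:
        raise ValueError("query genes are missing from the signature")
    return mean(up) - mean(down)


# Hypothetical signature and query.
signature = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
print(signature_score(signature, up_genes=["G1", "G2"], down_genes=["G3", "G4"]))
print(signature_score(signature, up_genes=["G3", "G4"], down_genes=["G1", "G2"]))
```

Ranking all signatures in a database by such a score surfaces both similar conditions (high positive scores) and opposite ones (strongly negative scores).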
Unsupervised text mining for assessing and augmenting GWAS results.
Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence
2016-04-01
Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has contributed to identifying many genes associated with multifactorial diseases. These studies allow the identification of groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also allowed us to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
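The comparison described above, cosine similarity between text-derived gene vectors tested against random gene sets, can be sketched as follows. The term weights, the use of a mean pairwise similarity as the test statistic, and the add-one empirical p-value are assumptions for illustration; the abstract does not give these details.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs of gene vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def empirical_p(observed, null_values):
    """One-sided empirical p-value with the usual add-one correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical term weights for three candidate genes.
candidates = [
    {"asthma": 2.0, "airway": 1.0},
    {"asthma": 1.0, "airway": 1.0},
    {"asthma": 1.0, "cytokine": 1.0},
]
observed = mean_pairwise_similarity(candidates)
# In a real analysis the null values would come from many randomly drawn gene sets.
print(round(observed, 3), empirical_p(observed, [0.1, 0.2, 0.3]))
```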
Seki, Akiko; Rutz, Sascha
2018-03-05
CRISPR (clustered, regularly interspaced, short palindromic repeats)/Cas9 (CRISPR-associated protein 9) has become the tool of choice for generating gene knockouts across a variety of species. Efficient gene editing in primary T cells not only represents a valuable research tool to study gene function but also holds great promise for T cell-based immunotherapies, such as next-generation chimeric antigen receptor (CAR) T cells. Previous attempts to apply CRISPR/Cas9 for gene editing in primary T cells have resulted in highly variable knockout efficiency and required T cell receptor (TCR) stimulation, thus largely precluding the study of genes involved in T cell activation or differentiation. Here, we describe an optimized approach for Cas9/RNP transfection of primary mouse and human T cells without TCR stimulation that results in near-complete loss of target gene expression at the population level, mitigating the need for selection. We believe that this method will greatly extend the feasibility of target gene discovery and validation in primary T cells and simplify the gene editing process for next-generation immunotherapies. © 2018 Genentech.
Rafnar, Thorunn; Vermeulen, Sita H.; Sulem, Patrick; Thorleifsson, Gudmar; Aben, Katja K.; Witjes, J. Alfred; Grotenhuis, Anne J.; Verhaegh, Gerald W.; Hulsbergen-van de Kaa, Christina A.; Besenbacher, Soren; Gudbjartsson, Daniel; Stacey, Simon N.; Gudmundsson, Julius; Johannsdottir, Hrefna; Bjarnason, Hjordis; Zanon, Carlo; Helgadottir, Hafdis; Jonasson, Jon Gunnlaugur; Tryggvadottir, Laufey; Jonsson, Eirikur; Geirsson, Gudmundur; Nikulasson, Sigfus; Petursdottir, Vigdis; Bishop, D. Timothy; Chung-Sak, Sei; Choudhury, Ananya; Elliott, Faye; Barrett, Jennifer H.; Knowles, Margaret A.; de Verdier, Petra J.; Ryk, Charlotta; Lindblom, Annika; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Vineis, Paolo; Polidoro, Silvia; Guarrera, Simonetta; Sacerdote, Carlotta; Panadero, Angeles; Sanz-Velez, José I.; Sanchez, Manuel; Valdivia, Gabriel; Garcia-Prats, Maria D.; Hengstler, Jan G.; Selinski, Silvia; Gerullis, Holger; Ovsiannikov, Daniel; Khezri, Abdolaziz; Aminsharifi, Alireza; Malekzadeh, Mahyar; van den Berg, Leonard H.; Ophoff, Roel A.; Veldink, Jan H.; Zeegers, Maurice P.; Kellen, Eliane; Fostinelli, Jacopo; Andreoli, Daniele; Arici, Cecilia; Porru, Stefano; Buntinx, Frank; Ghaderi, Abbas; Golka, Klaus; Mayordomo, José I.; Matullo, Giuseppe; Kumar, Rajiv; Steineck, Gunnar; Kiltie, Anne E.; Kong, Augustine; Thorsteinsdottir, Unnur; Stefansson, Kari; Kiemeney, Lambertus A.
2011-01-01
Three genome-wide association studies in Europe and the USA have reported eight urinary bladder cancer (UBC) susceptibility loci. Using extended case and control series and 1000 Genomes imputations of 5 340 737 single-nucleotide polymorphisms (SNPs), we searched for additional loci in the European GWAS. The discovery sample set consisted of 1631 cases and 3822 controls from the Netherlands and 603 cases and 37 781 controls from Iceland. For follow-up, we used 3790 cases and 7507 controls from 13 sample sets of European and Iranian ancestry. Based on the discovery analysis, we followed up signals in the urea transporter (UT) gene SLC14A1. The strongest signal at this locus was represented by a SNP in intron 3, rs17674580, that reached genome-wide significance in the overall analysis of the discovery and follow-up groups: odds ratio = 1.17, P = 7.6 × 10^−11. SLC14A1 codes for UTs that define the Kidd blood group and are crucial for the maintenance of a constant urea concentration gradient in the renal medulla and, through this, the kidney's ability to concentrate urine. It is speculated that rs17674580, or other sequence variants in LD with it, indirectly modifies UBC risk by affecting urine production. If confirmed, this would support the ‘urogenous contact hypothesis’ that urine production and voiding frequency modify the risk of UBC. PMID:21750109
Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W
2012-12-20
Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
Discovery of Herpes B Virus-Encoded MicroRNAs
Besecker, Michael I.; Harden, Mallory E.; Li, Guanglin; Wang, Xiu-Jie; Griffiths, Anthony
2009-01-01
Herpes B virus (BV) naturally infects macaque monkeys and is a close relative of herpes simplex virus. BV can zoonotically infect humans to cause a rapidly ascending encephalitis with ∼80% mortality. Therefore, BV is a serious danger to those who come into contact with these monkeys or their tissues and cells. MicroRNAs are regulators of gene expression, and there have been reports of virus-encoded microRNAs. We hypothesize that BV-encoded microRNAs are important for the regulation of viral and cellular genes. Herein, we report the discovery of three herpes B virus-encoded microRNAs. PMID:19144716
Kuo, Kevin H M
2017-01-01
The issue of multiple testing, also termed multiplicity, is ubiquitous in studies where multiple hypotheses are tested simultaneously. Genome-wide association study (GWAS), a type of genetic association study that has gained popularity in the past decade, is most susceptible to the issue of multiple testing. Different methodologies have been employed to address the issue of multiple testing in GWAS. The purpose of the review is to examine the methodologies employed in dealing with multiple testing in the context of gene discovery using GWAS in sickle cell disease complications.
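Two widely used multiplicity corrections are the Bonferroni threshold, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The sketch below shows both; the review does not state which methods it covers, so treat these as general examples, and the p-values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank                            # largest rank passing the step-up test
    return sorted(order[:cutoff_rank])


# About one million independent tests motivates the conventional
# genome-wide significance threshold of 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg typically rejects more hypotheses at the cost of tolerating a controlled fraction of false discoveries.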
Chen, Ming-Huei; Yanek, Lisa R; Backman, Joshua D; Eicher, John D; Huffman, Jennifer E; Ben-Shlomo, Yoav; Beswick, Andrew D; Yerges-Armstrong, Laura M; Shuldiner, Alan R; O'Connell, Jeffrey R; Mathias, Rasika A; Becker, Diane M; Becker, Lewis C; Lewis, Joshua P; Johnson, Andrew D; Faraday, Nauder
2017-11-29
Previous genome-wide association studies (GWAS) have identified several variants associated with platelet function phenotypes; however, the proportion of variance explained by the identified variants is mostly small. Rare coding variants, particularly those with high potential for impact on protein structure/function, may have substantial impact on phenotype but are difficult to detect by GWAS. The main purpose of this study was to identify low frequency or rare variants associated with platelet function using genotype data from the Illumina HumanExome Bead Chip. Three family-based cohorts of European ancestry, including ~4,000 total subjects, comprised the discovery cohort, and two independent cohorts, one of European and one of African American ancestry, were used for replication. Optical aggregometry in platelet-rich plasma was performed in all the discovery cohorts in response to adenosine diphosphate (ADP), epinephrine, and collagen. Meta-analyses were performed using both gene-based and single nucleotide variant association methods. The gene-based meta-analysis identified a significant association (P = 7.13 × 10^-7) between rare genetic variants in ANKRD26 and ADP-induced platelet aggregation. One of the ANKRD26 SNVs, rs191015656, which encodes a threonine to isoleucine substitution predicted to alter protein structure/function, was replicated in Europeans. Aggregation increases of ~20-50% were observed in heterozygotes in all cohorts. Novel genetic signals in ABCG1 and HCP5 were also associated with platelet aggregation to ADP in meta-analyses, although only results for HCP5 could be replicated. The SNV in HCP5 intersects epigenetic signatures in CD41+ megakaryocytes, suggesting a new functional role in platelet biology for HCP5. This is the first study to use gene-based association methods from SNV array genotypes to identify rare variants related to platelet function. 
The molecular mechanisms and pathophysiological relevance of the identified genetic associations require further study.
Cecconi, Massimiliano; Parodi, Maria I.; Formisano, Francesco; Spirito, Paolo; Autore, Camillo; Musumeci, Maria B.; Favale, Stefano; Forleo, Cinzia; Rapezzi, Claudio; Biagini, Elena; Davì, Sabrina; Canepa, Elisabetta; Pennese, Loredana; Castagnetta, Mauro; Degiorgio, Dario; Coviello, Domenico A.
2016-01-01
Hypertrophic cardiomyopathy (HCM) is mainly associated with myosin, heavy chain 7 (MYH7) and myosin binding protein C, cardiac (MYBPC3) mutations. In order to better explain the clinical and genetic heterogeneity in HCM patients, in this study, we implemented a targeted next-generation sequencing (NGS) assay. We created an Ion AmpliSeq™ Custom Panel for the enrichment of 19 genes, 9 of which did not encode thick/intermediate and thin myofilament (TTm) proteins, including 3 responsible for HCM phenocopies. Ninety-two DNA samples were analyzed by the Ion Personal Genome Machine: 73 DNA samples (training set), previously genotyped in some of the genes by Sanger sequencing, were used to optimize the NGS strategy, whereas 19 DNA samples (discovery set) allowed the evaluation of NGS performance. In the training set, we identified 72 out of 73 expected mutations and 15 additional mutations: the molecular diagnosis was achieved in one patient with a previously wild-type status and the pre-excitation syndrome was explained in another. In the discovery set, we identified 20 mutations, 5 of which were in genes encoding non-TTm proteins, increasing the diagnostic yield by approximately 20%: a single mutation in genes encoding non-TTm proteins was identified in 2 out of 3 borderline HCM patients, whereas co-occurring mutations in genes encoding TTm and galactosidase alpha (GLA) altered proteins were characterized in a male with HCM and multiorgan dysfunction. Our combined targeted NGS-Sanger sequencing-based strategy allowed the molecular diagnosis of HCM with greater efficiency than using the conventional (Sanger) sequencing alone. 
Mutant alleles encoding non-TTm proteins may aid in the complete understanding of the genetic and phenotypic heterogeneity of HCM: co-occurring mutations of genes encoding TTm and non-TTm proteins could explain the wide variability of the HCM phenotype, whereas mutations in genes encoding only the non-TTm proteins are identifiable in patients with a milder HCM status. PMID:27600940
Predicting degree of benefit from adjuvant trastuzumab in NSABP trial B-31.
Pogue-Geile, Katherine L; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G; Fumagalli, Debora; Goldstein, Lynn C; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L; Remillard, Matthew Y; Blackmon, Nicole L; Kim, Seong-Rim; Horne, Zachary D; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H; Swain, Sandra M; Mamounas, Eleftherios P; Wickerham, D Lawrence; Geyer, Charles E; Costantino, Joseph P; Wolmark, Norman; Paik, Soonmyung
2013-12-04
National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P(interaction) between the model and trastuzumab < .001). We developed a gene expression-based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47).
Predicting Degree of Benefit From Adjuvant Trastuzumab in NSABP Trial B-31
Pogue-Geile, Katherine L.; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G.; Fumagalli, Debora; Goldstein, Lynn C.; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L.; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L.; Remillard, Matthew Y.; Blackmon, Nicole L.; Kim, Seong-Rim; Horne, Zachary D.; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H.; Swain, Sandra M.; Mamounas, Eleftherios P.; Wickerham, D. Lawrence; Geyer, Charles E.; Costantino, Joseph P.; Wolmark, Norman
2013-01-01
Background National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Methods Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Results Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P interaction between the model and trastuzumab < .001). Conclusions We developed a gene expression–based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47). PMID:24262440
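The hazard ratios, confidence intervals, and P values quoted for the three subsets can be cross-checked with standard arithmetic: on the log scale a 95% interval spans about 2 × 1.96 standard errors, and a Wald z-statistic follows from log(HR) divided by that standard error. The sketch below assumes a normal approximation and is a consistency check, not the trial's analysis.

```python
from math import erfc, log, sqrt


def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, Wald z and two-sided p-value
    from a hazard ratio and its 95% confidence interval."""
    standard_error = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hazard_ratio) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return standard_error, z, p_value


# Values quoted in the abstract for the three subsets.
for hr, lo, hi in [(1.58, 0.67, 3.69), (0.60, 0.41, 0.89), (0.28, 0.20, 0.41)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    print(hr, round(se, 3), round(z, 2), round(p, 4))
```

Small discrepancies from the published P values are expected because the reported hazard ratios and interval bounds are rounded to two decimals.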
Comparative mRNA analysis of behavioral and genetic mouse models of aggression.
Malki, Karim; Tosto, Maria G; Pain, Oliver; Sluyter, Frans; Mineur, Yann S; Crusio, Wim E; de Boer, Sietse; Sandnabba, Kenneth N; Kesserwani, Jad; Robinson, Edward; Schalkwyk, Leonard C; Asherson, Philip
2016-04-01
Mouse models of aggression have traditionally compared strains, most notably BALB/cJ and C57BL/6. However, these strains were not designed to study aggression despite differences in aggression-related traits and distinct reactivity to stress. This study compared the expression of genes differentially regulated in a stress (behavioral) mouse model of aggression with those from a recent genetic mouse model of aggression. The study used a discovery-replication design based on two independent mRNA studies from mouse brain tissue. The discovery study identified strain (BALB/cJ and C57BL/6J) × stress (chronic mild stress or control) interactions. Probe sets differentially regulated in the discovery set were intersected with those uncovered in the replication study, which evaluated differences between high and low aggressive animals from three strains specifically bred to study aggression. Network analysis was conducted on overlapping genes uncovered across both studies. A significant overlap was found, with the genetic mouse study sharing 1,916 probe sets with the stress model. Fifty-one probe sets, mapping to 50 known genes, were found to be strongly dysregulated across both studies. Network analysis revealed two plausible pathways, one centered on the UBC gene hub, which encodes ubiquitin, a protein well known for its role in protein degradation, and another on P38 MAPK. Findings from this study support the stress model of aggression, which showed remarkable molecular overlap with a genetic model. The study uncovered a set of candidate genes including the Erg2 gene, which has previously been implicated in different psychopathologies. The gene networks uncovered point to a redox pathway as potentially being implicated in aggression-related behaviors. © 2016 Wiley Periodicals, Inc.
Epithelial-Mesenchymal Transition (EMT) Gene Variants and Epithelial Ovarian Cancer (EOC) Risk.
Amankwah, Ernest K; Lin, Hui-Yi; Tyrer, Jonathan P; Lawrenson, Kate; Dennis, Joe; Chornokur, Ganna; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Bruinsma, Fiona; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Bisogna, Maria; Bjorge, Line; Bogdanova, Natalia; Brinton, Louise A; Brooks-Wilson, Angela; Bunker, Clareann H; Butzow, Ralf; Campbell, Ian G; Carty, Karen; Chen, Zhihua; Chen, Y Ann; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; du Bois, Andreas; Despierre, Evelyn; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; Dürst, Matthias; Easton, Douglas F; Eccles, Diana M; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goodman, Marc T; Gronwald, Jacek; Harrington, Patricia; Harter, Philipp; Hasmad, Hanis N; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Claus K; Hogdall, Estrid; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Jim, Heather; Kellar, Melissa; Kiemeney, Lambertus A; Krakstad, Camilla; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lim, Boon Kiong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon F A G; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Ian; Menon, Usha; Milne, Roger L; Modugno, Francesmary; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Eilber, Ursula; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Pike, Malcolm C; Poole, Elizabeth M; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; Runnebaum, Ingo B; 
Rzepecka, Iwona K; Salvesen, Helga B; Schernhammer, Eva; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Spiewankiewicz, Beata; Sucheston-Campbell, Lara; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Thomsen, Lotte; Tangen, Ingvild L; Tworoger, Shelley S; van Altena, Anne M; Vierkant, Robert A; Vergote, Ignace; Walsh, Christine S; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Wu, Anna H; Wu, Xifeng; Woo, Yin-Ling; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Kelemen, Linda E; Berchuck, Andrew; Schildkraut, Joellen M; Ramus, Susan J; Goode, Ellen L; Monteiro, Alvaro N A; Gayther, Simon A; Narod, Steven A; Pharoah, Paul D P; Sellers, Thomas A; Phelan, Catherine M
2015-12-01
Epithelial-mesenchymal transition (EMT) is a process whereby epithelial cells assume mesenchymal characteristics to facilitate cancer metastasis. However, EMT also contributes to the initiation and development of primary tumors. Prior studies that explored the hypothesis that EMT gene variants contribute to epithelial ovarian carcinoma (EOC) risk have been based on small sample sizes and none have sought replication in an independent population. We screened 15,816 single-nucleotide polymorphisms (SNPs) in 296 genes in a discovery phase using data from a genome-wide association study of EOC among women of European ancestry (1,947 cases and 2,009 controls) and identified 793 variants in 278 EMT-related genes that were nominally (P < 0.05) associated with invasive EOC. These SNPs were then genotyped in a larger study of 14,525 invasive-cancer patients and 23,447 controls. A P-value <0.05 and a false discovery rate (FDR) <0.2 were considered statistically significant. In the larger dataset, GPC6/GPC5 rs17702471 was associated with the endometrioid subtype among Caucasians (odds ratio (OR) = 1.16, 95% CI = 1.07-1.25, P = 0.0003, FDR = 0.19), whereas F8 rs7053448 (OR = 1.69, 95% CI = 1.27-2.24, P = 0.0003, FDR = 0.12), F8 rs7058826 (OR = 1.69, 95% CI = 1.27-2.24, P = 0.0003, FDR = 0.12), and CAPN13 rs1983383 (OR = 0.79, 95% CI = 0.69-0.90, P = 0.0005, FDR = 0.12) were associated with combined invasive EOC among Asians. In silico functional analyses revealed that GPC6/GPC5 rs17702471 coincided with DNA regulatory elements. These results suggest that EMT gene variants do not appear to play a significant role in the susceptibility to EOC. © 2015 WILEY PERIODICALS, INC.
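The significance rule in the abstract above (P < 0.05 together with FDR < 0.2) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, and the function names and p-values are purely illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a common FDR procedure; the
    abstract does not name the method the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, p_cut=0.05, fdr_cut=0.2):
    """Flag tests that pass both the nominal P and the FDR thresholds."""
    q_values = benjamini_hochberg(p_values)
    return [p < p_cut and q < fdr_cut for p, q in zip(p_values, q_values)]


# Illustrative p-values only, not data from the study.
example_p = [0.01, 0.04, 0.03, 0.5]
print(significant(example_p))
```

With thousands of SNPs tested, the FDR threshold is usually the binding one: a nominal P below 0.05 is easy to reach by chance, which is why both cut-offs are applied together.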
Epithelial-Mesenchymal Transition (EMT) gene variants and Epithelial Ovarian Cancer (EOC) risk
Amankwah, Ernest K.; Lin, Hui-Yi; Tyrer, Jonathan P.; Lawrenson, Kate; Dennis, Joe; Chornokur, Ganna; Aben, Katja KH.; Anton-Culver, Hoda; Antonenkova, Natalia; Bruinsma, Fiona; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Bisogna, Maria; Bjorge, Line; Bogdanova, Natalia; Brinton, Louise A.; Brooks-Wilson, Angela; Bunker, Clareann H.; Butzow, Ralf; Campbell, Ian G.; Carty, Karen; Chen, Zhihua; Chen, Y. Ann; Chang-Claude, Jenny; Cook, Linda S.; Cramer, Daniel W.; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; du Bois, Andreas; Despierre, Evelyn; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; Dürst, Matthias; Easton, Douglas F.; Eccles, Diana M.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goodman, Marc T.; Gronwald, Jacek; Harrington, Patricia; Harter, Philipp; Hasmad, Hanis N.; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Claus K.; Hogdall, Estrid; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y.; Jim, Heather; Kellar, Melissa; Kiemeney, Lambertus A.; Krakstad, Camilla; Kjaer, Susanne K.; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lim, Boon Kiong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon F.A.G.; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Ian; Menon, Usha; Milne, Roger L.; Modugno, Francesmary; Moysich, Kirsten B.; Ness, Roberta B.; Nevanlinna, Heli; Eilber, Ursula; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Paul, James; Pearce, Celeste L.; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Pike, Malcolm C.; Poole, Elizabeth M.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, 
Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schernhammer, Eva; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B.; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Spiewankiewicz, Beata; Sucheston-Campbell, Lara; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J.; Thomsen, Lotte; Tangen, Ingvild L.; Tworoger, Shelley S.; van Altena, Anne M.; Vierkant, Robert A.; Vergote, Ignace; Walsh, Christine S.; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Wu, Anna H.; Wu, Xifeng; Woo, Yin-Ling; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Kelemen, Linda E.; Berchuck, Andrew; Schildkraut, Joellen M.; Ramus, Susan J.; Goode, Ellen L.; Monteiro, Alvaro N.A.; Gayther, Simon A.; Narod, Steven A.; Pharoah, Paul D. P.; Sellers, Thomas A.; Phelan, Catherine M.
2016-01-01
Introduction Epithelial-mesenchymal transition (EMT) is a process whereby epithelial cells assume mesenchymal characteristics to facilitate cancer metastasis. However, EMT also contributes to the initiation and development of primary tumors. Prior studies that explored the hypothesis that EMT gene variants contribute to EOC risk have been based on small sample sizes and none have sought replication in an independent population. Methods We screened 1254 SNPs in 296 genes in a discovery phase using data from a genome-wide association study of EOC among women of European ancestry (1,947 cases and 2,009 controls) and identified 793 variants in 278 EMT-related genes that were nominally (p<0.05) associated with invasive EOC. These SNPs were then genotyped in a larger study of 14,525 invasive-cancer patients and 23,447 controls. A p-value <0.05 and a false discovery rate (FDR) <0.2 was considered statistically significant. Results In the larger dataset, GPC6/GPC5 rs17702471 was associated with the endometrioid subtype among Caucasians (OR=1.16, 95%CI=1.07–1.25, p=0.0003, FDR=0.19), while F8 rs7053448 (OR=1.69, 95%CI=1.27–2.24, p=0.0003, FDR=0.12), F8 rs7058826 (OR=1.69, 95%CI=1.27–2.24, p=0.0003, FDR=0.12), and CAPN13 rs1983383 (OR=0.79, 95%CI=0.69–0.90, p=0.0005, FDR=0.12) were associated with combined invasive EOC among Asians. In silico functional analyses revealed that GPC6/GPC5 rs17702471 coincided with DNA regulatory elements. Conclusion These results suggest that EMT gene variants do not appear to play a significant role in the susceptibility to EOC. PMID:26399219
Pep2Path: automated mass spectrometry-guided genome mining of peptidic natural products.
Medema, Marnix H; Paalvast, Yared; Nguyen, Don D; Melnik, Alexey; Dorrestein, Pieter C; Takano, Eriko; Breitling, Rainer
2014-09-01
Nonribosomally and ribosomally synthesized bioactive peptides constitute a source of molecules of great biomedical importance, including antibiotics such as penicillin, immunosuppressants such as cyclosporine, and cytostatics such as bleomycin. Recently, an innovative mass-spectrometry-based strategy, peptidogenomics, has been pioneered to effectively mine microbial strains for novel peptidic metabolites. Even though mass-spectrometric peptide detection can be performed quite fast, true high-throughput natural product discovery approaches have still been limited by the inability to rapidly match the identified tandem mass spectra to the gene clusters responsible for the biosynthesis of the corresponding compounds. With Pep2Path, we introduce a software package to fully automate the peptidogenomics approach through the rapid Bayesian probabilistic matching of mass spectra to their corresponding biosynthetic gene clusters. Detailed benchmarking of the method shows that the approach is powerful enough to correctly identify gene clusters even in data sets that consist of hundreds of genomes, which also makes it possible to match compounds from unsequenced organisms to closely related biosynthetic gene clusters in other genomes. Applying Pep2Path to a data set of compounds without known biosynthesis routes, we were able to identify candidate gene clusters for the biosynthesis of five important compounds. Notably, one of these clusters was detected in a genome from a different subphylum of Proteobacteria than that in which the molecule had first been identified. All in all, our approach paves the way towards high-throughput discovery of novel peptidic natural products. Pep2Path is freely available from http://pep2path.sourceforge.net/, implemented in Python, licensed under the GNU General Public License v3 and supported on MS Windows, Linux and Mac OS X.
Bringing RNA Interference (RNAi) into the High School Classroom
ERIC Educational Resources Information Center
Sengupta, Sibani
2013-01-01
RNA interference (abbreviated RNAi) is a relatively new discovery in the field of mechanisms that serve to regulate gene expression (a.k.a. protein synthesis). Gene expression can be regulated at the transcriptional level (mRNA production, processing, or stability) and at the translational level (protein synthesis). RNAi acts in a gene-specific…
USDA-ARS's Scientific Manuscript database
Single nucleotide polymorphisms (SNPs) in immune response genes have been reported as markers of susceptibility to infectious diseases in humans and livestock. The disease caused by cyprinid herpes virus 3 (CyHV-3) is highly contagious and virulent in common carp. With the aim to investigate the gene...
Global gene expression in channel catfish after vaccination with an attenuated Edwardsiella ictaluri
USDA-ARS's Scientific Manuscript database
To understand global gene expression in channel catfish after immersion vaccination with an attenuated Edwardsiella ictaluri (AquaVac ESC™), microarray analysis of 65,182 UniGene transcripts was performed. With a filter of false-discovery rate less than 0.05 and fold change greater than 2, a t...
Genetics of Mitochondrial Disease.
Saneto, Russell P
2017-01-01
Mitochondria are intracellular organelles responsible for adenosine triphosphate production. The strict control of intracellular energy needs requires proper mitochondrial functioning. The mitochondria are under the dual control of mitochondrial DNA (mtDNA) and nuclear DNA (nDNA). Mitochondrial dysfunction can arise from changes in either mtDNA or nDNA genes regulating function. There are an estimated ∼1500 proteins in the mitoproteome, whereas the mtDNA genome has only 37 genes. To date, ∼275 genes have been shown to give rise to disease. The unique physiology of mitochondrial functioning contributes to diverse gene expression. The onset and range of phenotypic expression of disease are diverse, with onset from the neonatal period to the seventh decade of life. The range of dysfunction is heterogeneous, from single-organ to multisystem involvement. The complexity of disease expression has severely limited gene discovery. Combining phenotypes with improvements in gene sequencing strategies is improving the diagnostic process. This chapter focuses on the interplay of the unique physiology and gene discovery in the current knowledge of genetically derived mitochondrial disease. Copyright © 2017 Elsevier Inc. All rights reserved.
Exome Sequencing in Suspected Monogenic Dyslipidemias
Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar
2015-01-01
Background Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026
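The polygenic score mentioned in the abstract above is, in its simplest common form, a weighted sum of effect-allele counts. The sketch below assumes that form; the variant identifiers, effect sizes, and function name are hypothetical and are not taken from the study.

```python
def polygenic_score(genotype, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per variant).

    `genotype` maps variant ID -> allele count for one individual;
    `weights` maps variant ID -> per-allele effect size.
    Both the IDs and the weights used below are hypothetical.
    """
    score = 0.0
    for variant, beta in weights.items():
        # A variant absent from the genotype is treated as zero effect alleles.
        score += beta * genotype.get(variant, 0)
    return score


# Hypothetical common variants with made-up effect sizes.
weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 1.0}
person = {"var_a": 2, "var_b": 1}
print(polygenic_score(person, weights))  # 0.5*2 + (-0.25)*1 + 1.0*0 = 0.75
```

An individual whose score sits in the extreme tail of a reference distribution may have an extreme lipid level for polygenic rather than monogenic reasons, which is the confounder the abstract describes.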
The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities
Chong, Jessica X.; Buckingham, Kati J.; Jhangiani, Shalini N.; Boehm, Corinne; Sobreira, Nara; Smith, Joshua D.; Harrell, Tanya M.; McMillin, Margaret J.; Wiszniewski, Wojciech; Gambin, Tomasz; Coban Akdemir, Zeynep H.; Doheny, Kimberly; Scott, Alan F.; Avramopoulos, Dimitri; Chakravarti, Aravinda; Hoover-Fong, Julie; Mathews, Debra; Witmer, P. Dane; Ling, Hua; Hetrick, Kurt; Watkins, Lee; Patterson, Karynne E.; Reinier, Frederic; Blue, Elizabeth; Muzny, Donna; Kircher, Martin; Bilguvar, Kaya; López-Giráldez, Francesc; Sutton, V. Reid; Tabor, Holly K.; Leal, Suzanne M.; Gunel, Murat; Mane, Shrikant; Gibbs, Richard A.; Boerwinkle, Eric; Hamosh, Ada; Shendure, Jay; Lupski, James R.; Lifton, Richard P.; Valle, David; Nickerson, Deborah A.; Bamshad, Michael J.
2015-01-01
Discovering the genetic basis of a Mendelian phenotype establishes a causal link between genotype and phenotype, making possible carrier and population screening and direct diagnosis. Such discoveries also contribute to our knowledge of gene function, gene regulation, development, and biological mechanisms that can be used for developing new therapeutics. As of February 2015, 2,937 genes underlying 4,163 Mendelian phenotypes have been discovered, but the genes underlying ∼50% (i.e., 3,152) of all known Mendelian phenotypes are still unknown, and many more Mendelian conditions have yet to be recognized. This is a formidable gap in biomedical knowledge. Accordingly, in December 2011, the NIH established the Centers for Mendelian Genomics (CMGs) to provide the collaborative framework and infrastructure necessary for undertaking large-scale whole-exome sequencing and discovery of the genetic variants responsible for Mendelian phenotypes. In partnership with 529 investigators from 261 institutions in 36 countries, the CMGs assessed 18,863 samples from 8,838 families representing 579 known and 470 novel Mendelian phenotypes as of January 2015. This collaborative effort has identified 956 genes, including 375 not previously associated with human health, that underlie a Mendelian phenotype. These results provide insight into study design and analytical strategies, identify novel mechanisms of disease, and reveal the extensive clinical variability of Mendelian phenotypes. Discovering the gene underlying every Mendelian phenotype will require tackling challenges such as worldwide ascertainment and phenotypic characterization of families affected by Mendelian conditions, improvement in sequencing and analytical techniques, and pervasive sharing of phenotypic and genomic data among researchers, clinicians, and families. PMID:26166479
2013-01-01
Background The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. Results We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. Conclusions These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies. PMID:23496952
Francis, Warren R; Christianson, Lynne M; Kiko, Rainer; Powers, Meghan L; Shaner, Nathan C; Haddock, Steven H D
2013-03-12
The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.
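The idea of assembling "at regular increments of reads" and watching gene discovery level off can be sketched as a saturation curve. This is a toy illustration only: it assumes each read already carries a gene label, which stands in for the actual assembly and annotation steps (Velvet/Oases, Trinity) that the study performed, and the function name and data are hypothetical.

```python
import random


def discovery_curve(reads, step, seed=0):
    """Count distinct genes seen in nested subsamples of `reads`.

    `reads` is a list of gene labels, one per read (a stand-in for real
    assembly and annotation). Subsamples are prefixes of one shuffled
    copy, so every larger subsample contains the smaller ones.
    Returns a list of (reads_used, genes_discovered) pairs.
    """
    rng = random.Random(seed)
    shuffled = reads[:]
    rng.shuffle(shuffled)
    seen = set()
    curve = []
    for start in range(0, len(shuffled), step):
        seen.update(shuffled[start:start + step])
        curve.append((min(start + step, len(shuffled)), len(seen)))
    return curve


# Toy data: one highly expressed, one moderate and one rare gene.
toy_reads = ["g1"] * 50 + ["g2"] * 30 + ["g3"] * 5
for depth, genes in discovery_curve(toy_reads, step=20):
    print(depth, genes)
```

A curve that flattens while depth keeps growing corresponds to the abstract's observation that additional reads eventually add few new genes.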
2014-01-01
Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allgaier, M.; Reddy, A.; Park, J. I.
2009-11-15
Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, ~10% were putative cellulases mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 °C and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting conditions and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.
Targeted Discovery of Glycoside Hydrolases from a Switchgrass-Adapted Compost Community
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reddy, Amitha; Allgaier, Martin; Park, Joshua I.
2011-05-11
Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, ~10% were putative cellulases mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 °C and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting conditions and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.
Genetic barcoding with fluorescent proteins for multiplexed applications.
Smurthwaite, Cameron A; Williams, Wesley; Fetsko, Alexandra; Abbadessa, Darin; Stolp, Zachary D; Reed, Connor W; Dharmawan, Andre; Wolkowicz, Roland
2015-04-14
Fluorescent proteins, fluorescent dyes and fluorophores in general have revolutionized the field of molecular cell biology. In particular, the discovery of fluorescent proteins and their genes has enabled the engineering of protein fusions for localization, the analysis of transcriptional activation and translation of proteins of interest, and the general tracking of individual cells and cell populations. The use of fluorescent protein genes in combination with retroviral technology has further allowed the expression of these proteins in mammalian cells in a stable and reliable manner. Shown here is how one can utilize these genes to give cells within a population their own biosignature. As the biosignature is achieved with retroviral technology, cells are barcoded 'indefinitely'. As such, they can be individually tracked within a mixture of barcoded cells and utilized in more complex biological applications. The tracking of distinct populations in a mixture of cells is ideal for multiplexed applications such as the discovery of drugs against a multitude of targets or the activation profiling of different promoters. The protocol describes how to develop and amplify barcoded mammalian cells with distinct genetic fluorescent markers, and how to use several markers at once or one marker at different intensities. Finally, the protocol describes how the cells can be further utilized in combination with cell-based assays to increase the power of analysis through multiplexing.
Structure and evolution of cereal genomes.
Paterson, Andrew H; Bowers, John E; Peterson, Daniel G; Estill, James C; Chapman, Brad A
2003-12-01
The cereal species, of central importance to our diet, began to diverge 50-70 million years ago. For the past few thousand years, these species have undergone largely parallel selection regimes associated with domestication and improvement. The rice genome sequence provides a platform for organizing information about diverse cereals, and together with genetic maps and sequence samples from other cereals is yielding new insights into both the shared and the independent dimensions of cereal evolution. New data and population-based approaches are identifying genes that have been involved in cereal improvement. Reduced-representation sequencing promises to accelerate gene discovery in many large-genome cereals, and to better link the under-explored genomes of 'orphan' cereals with state-of-the-art knowledge.
2009-01-01
Background The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential gene discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performance are those generated from datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes, which was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426
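As a rough illustration of how a decision tree could surface a rule such as "many physical interactions implies essential", the sketch below scores candidate splits by information gain on a tiny invented dataset. It is a minimal sketch under stated assumptions, not the authors' pipeline: the feature names, gene records and labels are hypothetical, and J48 (C4.5) actually uses gain ratio plus pruning rather than the plain information gain shown here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] >= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] < threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] >= threshold]
    if not left or not right:  # split separates nothing
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_split(rows, labels):
    """Return (gain, feature, threshold) for the most informative single split."""
    candidates = []
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            gain = information_gain(rows, labels, feature, threshold)
            candidates.append((gain, feature, threshold))
    return max(candidates)

# Hypothetical toy records (NOT data from the study).
rows = [
    {"interactions": 12, "nuclear": 1, "regulators": 5},
    {"interactions": 9, "nuclear": 1, "regulators": 1},
    {"interactions": 8, "nuclear": 0, "regulators": 4},
    {"interactions": 2, "nuclear": 1, "regulators": 2},
    {"interactions": 1, "nuclear": 0, "regulators": 3},
    {"interactions": 3, "nuclear": 0, "regulators": 1},
]
labels = ["essential"] * 3 + ["nonessential"] * 3

gain, feature, threshold = best_split(rows, labels)
print(f"best root split: {feature} >= {threshold} (gain {gain:.2f} bits)")
```

On this toy data the interaction count separates the two classes completely, so it would become the root of the tree. A full tree learner would apply the same search recursively to each resulting subset.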
Analysis and modelling of septic shock microarray data using Singular Value Decomposition.
Allanki, Srinivas; Dixit, Madhulika; Thangaraj, Paul; Sinha, Nandan Kumar
2017-06-01
Because microarrays are a high-throughput technique, enormous amounts of microarray data have been generated, and there is a need for more efficient analysis techniques in terms of speed and accuracy. Finding the differentially expressed genes based on fold change and p-value alone might not extract all the vital biological signals that occur at a lower gene expression level. In addition, numerous mathematical models have been generated to predict the clinical outcome from microarray data, while very few, if any, aim at predicting the vital genes that are important in disease progression. Such models help a basic researcher narrow down and concentrate on a promising set of genes, which can lead to the discovery of gene-based therapies. In this article, as a first objective, we have used the lesser-known and less-used Singular Value Decomposition (SVD) technique to build a microarray data analysis tool that works with gene expression patterns and the intrinsic structure of the data in an unsupervised manner. We have re-analysed a microarray dataset covering the clinical course of septic shock from Cazalis et al. (2014) and have shown that our proposed analysis provides additional information compared to the conventional method. As a second objective, we developed a novel mathematical model that predicts a set of vital genes in disease progression; it works by generating samples in the continuum between health and disease, using a simple normal-distribution-based random number generator. We also verify that most of the predicted genes are indeed related to septic shock. Copyright © 2017 Elsevier Inc. All rights reserved.
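To make the SVD idea concrete, the sketch below extracts the dominant expression pattern from a small genes-by-samples matrix using power iteration. It is a minimal sketch, not the authors' tool: the matrix and gene names are invented, and a real analysis would use a numerical library's full SVD rather than this hand-rolled routine for the leading component only.

```python
import math

def leading_singular_triplet(matrix, iterations=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Deterministic, non-uniform start vector: row-centred data is orthogonal
    # to the all-ones vector, so a uniform start would collapse to zero.
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]            # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of the matrix")
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical row-centred expression values: rows are genes, columns are time points.
genes = ["geneA", "geneB", "geneC"]
expression = [
    [-3.0, 0.0, 3.0],   # rises over the time course
    [3.0, 0.0, -3.0],   # falls over the time course
    [0.0, 0.0, 0.0],    # unchanged
]

sigma, u, v = leading_singular_triplet(expression)
# Rank genes by how strongly they load on the dominant pattern.
ranked = sorted(zip(genes, u), key=lambda item: abs(item[1]), reverse=True)
print(sigma, ranked)
```

Here `v` is the dominant pattern across samples and `u` gives each gene's loading on it, so genes can be ranked by the pattern they follow rather than by fold change alone.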
Informed walks: whispering hints to gene hunters inside networks' jungle.
Bourdakou, Marilena M; Spyrou, George M
2017-10-11
Systemic approaches offer a different point of view on the analysis of several types of molecular associations as well as on the identification of specific gene communities in several cancer types. However, due to the lack of sufficient data needed to construct networks based on experimental evidence, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to capitalize comprehensively on the prior knowledge encoded in molecular pathway associations and to improve their efficiency regarding the discovery of both exclusive subnetworks as candidate biomarkers and conserved subnetworks that may uncover common origins of several cancer types. In this study we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted sub-networks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights regarding the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types.
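The abstract does not give the exact formulation of Informed Walks, so the following is only a generic sketch of the underlying idea: a random walk with restart on a co-expression network in which edges between genes that share a pathway receive a higher weight. The gene names, correlation values, pathway membership and the `boost` factor are all assumptions made for illustration.

```python
def informed_weights(coexpression, pathways, boost=2.0):
    """Symmetric edge weights: |correlation|, multiplied by `boost` when both genes share a pathway."""
    weights = {}
    for (a, b), corr in coexpression.items():
        shared = any(a in members and b in members for members in pathways.values())
        w = abs(corr) * (boost if shared else 1.0)
        weights.setdefault(a, {})[b] = w
        weights.setdefault(b, {})[a] = w
    return weights

def random_walk_with_restart(weights, seeds, restart=0.3, iterations=100):
    """Iterate p <- (1 - r) * P^T p + r * p0, where P holds weight-proportional transition probabilities."""
    nodes = sorted(weights)
    out_total = {a: sum(weights[a].values()) for a in nodes}
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {n: restart * p0[n] for n in nodes}
        for a in nodes:
            for b, w in weights[a].items():
                nxt[b] += (1.0 - restart) * p[a] * w / out_total[a]
        p = nxt
    return p

# Hypothetical four-gene network with equal co-expression on every edge.
coexpression = {("G1", "G2"): 0.5, ("G1", "G3"): 0.5, ("G2", "G4"): 0.5, ("G3", "G4"): 0.5}
pathways = {"pathway_X": {"G1", "G2"}}  # G1 and G2 share a (made-up) pathway

informed = random_walk_with_restart(informed_weights(coexpression, pathways), seeds={"G1"})
uninformed = random_walk_with_restart(informed_weights(coexpression, pathways, boost=1.0), seeds={"G1"})
print(informed)
print(uninformed)
```

With the pathway boost the walker starting from G1 visits G2 more often than G3, although both have the same co-expression with G1. Without the boost the two candidates are indistinguishable, which is the kind of tie that prior knowledge is meant to break.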
Researchers use Modified CRISPR Systems to Modulate Gene Expression on a Genomic Scale
Cancer Target Discovery and Development Network (CTD2) researchers at the University of California, San Francisco, developed a CRISPR system that can regulate both gene repression and activation with fewer off-target effects.
Genetic manipulation and monitoring of autophagy in Drosophila.
Neufeld, Thomas P
2008-01-01
Drosophila melanogaster provides a model system useful for many aspects of the study of autophagy in vivo. These include testing and validation of genes potentially involved in autophagy, discovery of novel genes through genetic screening for mutations that affect autophagy, and analysis of potential roles of autophagy in specific developmental or physiological processes. In recent years, a number of techniques and transgenic and mutant fly strains have been developed to facilitate autophagy analysis in this system. Here, protocols are described for activating or inhibiting autophagy in Drosophila, and for examining the progression of autophagy in vivo through imaging-based assays. The goal of this chapter is to provide a resource both for autophagy investigators with limited familiarity with fly genetics, as well as for experienced Drosophila biologists who wish to test for connections between autophagy and a given gene, pathway or process.
Bullich, Gemma; Trujillano, Daniel; Santín, Sheila; Ossowski, Stephan; Mendizábal, Santiago; Fraga, Gloria; Madrid, Álvaro; Ariceta, Gema; Ballarín, José; Torra, Roser; Estivill, Xavier; Ars, Elisabet
2015-09-01
Genetic diagnosis of steroid-resistant nephrotic syndrome (SRNS) using Sanger sequencing is complicated by the high genetic heterogeneity and phenotypic variability of this disease. We aimed to improve the genetic diagnosis of SRNS by simultaneously sequencing 26 glomerular genes using massive parallel sequencing and to study whether mutations in multiple genes increase disease severity. High-throughput mutation analysis was performed in 50 SRNS and/or focal segmental glomerulosclerosis (FSGS) patients, a validation cohort of 25 patients with known pathogenic mutations, and a discovery cohort of 25 uncharacterized patients with probable genetic etiology. In the validation cohort, we identified the 42 previously known pathogenic mutations across NPHS1, NPHS2, WT1, TRPC6, and INF2 genes. In the discovery cohort, disease-causing mutations in SRNS/FSGS genes were found in nine patients. We detected three patients with mutations in an SRNS/FSGS gene and COL4A3. Two of them were familial cases and presented a more severe phenotype than family members with mutation in only one gene. In conclusion, our results show that massive parallel sequencing is feasible and robust for genetic diagnosis of SRNS/FSGS. Our results indicate that patients carrying mutations in an SRNS/FSGS gene and also in COL4A3 gene have increased disease severity.
Comparative Oncogenomics for Peripheral Nerve Sheath Cancer Gene Discovery
2015-06-01
neurofibromas and MPNSTs, establish gene signatures defining distinct tumor subtypes and functionally test the role of selected driver mutations ...allografted tumor cells, and a variety of in vitro functional assays. We will validate the relevance of these mutated mouse genes in human neurofibromas...and MPNSTs by determining whether these same genes are mutated in human tumors.
A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.
Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J
2016-02-01
Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from current high-throughput genome-wide applications, but also of allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Python on Linux and released as free software under the GNU General Public Licence. Contact: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
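For readers unfamiliar with information content-based measures, the sketch below computes Resnik similarity, one well-known measure of this family, on a toy ontology. It does not use or reproduce the A-DaGO-Fun API; the term identifiers, the ontology shape and the gene annotations are hypothetical.

```python
import math

def ancestors(term, parents):
    """All ancestors of a term in the ontology DAG, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(parents, annotations):
    """IC(t) = -ln(fraction of annotations that fall on t or any of its descendants)."""
    counts, total = {}, 0
    for gene_terms in annotations.values():
        for term in gene_terms:
            total += 1
            for anc in ancestors(term, parents):
                counts[anc] = counts.get(anc, 0) + 1
    return {t: -math.log(c / total) for t, c in counts.items()}

def resnik(term_a, term_b, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    # Terms that never received an annotation have no IC estimate and are skipped.
    return max((ic[t] for t in common if t in ic), default=0.0)

# Hypothetical mini-ontology (child -> list of parents) and gene annotations.
parents = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
annotations = {"gene1": ["A1"], "gene2": ["A2"], "gene3": ["B"], "gene4": ["B"]}

ic = information_content(parents, annotations)
sim_siblings = resnik("A1", "A2", parents, ic)  # share the specific ancestor "A"
sim_distant = resnik("A1", "B", parents, ic)    # share only the root
print(sim_siblings, sim_distant)
```

Sibling terms score higher because their closest shared ancestor is rarer in the annotation set, while terms that meet only at the root score zero. Other measures in the same family rescale this quantity by the IC of the two terms themselves.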
2016-01-01
Covering: 2003 to 2016 The last decade has seen the first major discoveries regarding the genomic basis of plant natural product biosynthetic pathways. Four key computationally driven strategies have been developed to identify such pathways, which make use of physical clustering, co-expression, evolutionary co-occurrence and epigenomic co-regulation of the genes involved in producing a plant natural product. Here, we discuss how these approaches can be used for the discovery of plant biosynthetic pathways encoded by both chromosomally clustered and non-clustered genes. Additionally, we will discuss opportunities to prioritize plant gene clusters for experimental characterization, and end with a forward-looking perspective on how synthetic biology technologies will allow effective functional reconstitution of candidate pathways using a variety of genetic systems. PMID:27321668
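Of the four strategies named above, co-expression is the simplest to illustrate: genes whose expression tracks a known pathway gene across conditions become candidates for the same pathway. The sketch below ranks candidates by Pearson correlation with a "bait" gene. The gene names and expression values are invented, and the review does not prescribe this particular procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_by_coexpression(bait, expression):
    """Sort all other genes by correlation with the bait gene, highest first."""
    scores = {
        gene: pearson(expression[bait], profile)
        for gene, profile in expression.items()
        if gene != bait
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical expression across four tissues or conditions.
expression = {
    "bait": [1.0, 2.0, 3.0, 4.0],   # known pathway gene
    "cand1": [2.0, 4.0, 6.0, 8.0],  # tracks the bait closely
    "cand2": [4.0, 3.0, 2.0, 1.0],  # anti-correlated
    "cand3": [1.0, 3.0, 2.0, 4.0],  # partially correlated
}

ranked = rank_by_coexpression("bait", expression)
print(ranked)
```

In practice such a ranking would be combined with the other lines of evidence mentioned above, for example physical clustering on the chromosome, before candidates are taken forward for experimental characterization.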
The autoinflammatory diseases: a fashion with blurred boundaries!
Sarrabay, G; Barat-Houari, M; Annakib, S; Touitou, I
2015-07-01
Monogenic autoinflammatory diseases are defined as a group of conditions with a clinical and biological inflammatory syndrome but little or no evidence of autoimmunity. Over 17 years have passed since the discovery of the first autoinflammatory gene, MEFV, responsible for familial Mediterranean fever. Substantive progress has been made since then, highlighting the key role of the inflammasome in the maintenance of cell homeostasis but also unravelling new pathophysiological pathways involved in these diseases. The history of autoinflammatory gene discovery demonstrates the power of next-generation sequencing approaches in linking inflammatory disorders with various overlapping phenotypes. It can easily be anticipated that the number of newly identified genes will grow exponentially in the coming years. Integrating these new concepts should help to promote personalized patient care through novel therapeutic opportunities.
Bowerman, Bruce
2011-10-01
Molecular genetic investigation of the early Caenorhabditis elegans embryo has contributed substantially to the discovery and general understanding of the genes, pathways, and mechanisms that regulate and execute developmental and cell biological processes. Initially, worm geneticists relied exclusively on a classical genetics approach, isolating mutants with interesting phenotypes after mutagenesis and then determining the identity of the affected genes. Subsequently, the discovery of RNA interference (RNAi) led to a much greater reliance on a reverse genetics approach: reducing the function of known genes with RNAi and then observing the phenotypic consequences. Now the advent of next-generation DNA sequencing technologies and the ensuing ease and affordability of whole-genome sequencing are reviving the use of classical genetics to investigate early C. elegans embryogenesis.
Pathway-selective sensitization of Mycobacterium tuberculosis for target-based whole-cell screening
Abrahams, Garth L.; Kumar, Anuradha; Savvi, Suzana; Hung, Alvin W.; Wen, Shijun; Abell, Chris; Barry, Clifton E.; Sherman, David R.; Boshoff, Helena I.M.; Mizrahi, Valerie
2012-01-01
SUMMARY Whole-cell screening of Mycobacterium tuberculosis (Mtb) remains a mainstay of drug discovery but subsequent target elucidation often proves difficult. Conditional mutants that under-express essential genes have been used to identify compounds with known mechanism of action by target-based whole-cell screening (TB-WCS). Here, the feasibility of TB-WCS in Mtb was assessed by generating mutants that conditionally express pantothenate synthetase (panC), diaminopimelate decarboxylase (lysA) and isocitrate lyase (icl1). The essentiality of panC and lysA, and conditional essentiality of icl1 for growth on fatty acids, was confirmed. Depletion of PanC and Icl1 rendered the mutants hypersensitive to target-specific inhibitors. Stable reporter strains were generated for use in high-throughput screening, and their utility demonstrated by identifying compounds that display greater potency against a PanC-depleted strain. These findings illustrate the power of TB-WCS as a tool for tuberculosis drug discovery. PMID:22840772
Shameer, Khader; Dow, Garrett; Glicksberg, Benjamin S; Johnson, Kipp W; Ze, Yi; Tomlinson, Max S; Readhead, Ben; Dudley, Joel T; Kullo, Iftikhar J
2018-01-01
Currently, drug discovery approaches focus on the design of therapies that alleviate an index symptom by reengineering the underlying biological mechanism in agonistic or antagonistic fashion. For example, medicines are routinely developed to target an essential gene that drives the disease mechanism. Therapeutic overloading where patients get multiple medications to reduce the primary and secondary side effect burden is standard practice. This single-symptom based approach may not be scalable, as we understand that diseases are more connected than random and molecular interactions drive disease comorbidities. In this work, we present a proof-of-concept drug discovery strategy by combining network biology, disease comorbidity estimates, and computational drug repositioning, by targeting the risk factors and comorbidities of peripheral artery disease, a vascular disease associated with high morbidity and mortality. Individualized risk estimation and recommending disease sequelae based therapies may help to lower the mortality and morbidity of peripheral artery disease. PMID:29888052
Differentiation and Transplantation of Human Embryonic Stem Cell-Derived Hepatocytes
Basma, Hesham; Soto-Gutiérrez, Alejandro; Yannam, Govardhana Rao; Liu, Liping; Ito, Ryotaro; Yamamoto, Toshiyuki; Ellis, Ewa; Carson, Steven D.; Sato, Shintaro; Chen, Yong; Muirhead, David; Navarro-Álvarez, Nalu; Wong, Ron; Roy-Chowdhury, Jayanta; Platt, Jeffrey L.; Mercer, David F.; Miller, John D.; Strom, Stephen C.; Kobayashi, Noaya; Fox, Ira J.
2009-01-01
Background & Aims The ability to obtain unlimited numbers of human hepatocytes would improve development of cell-based therapies for liver diseases, facilitate the study of liver biology and improve the early stages of drug discovery. Embryonic stem cells are pluripotent, can potentially differentiate into any cell type and could therefore be developed as a source of human hepatocytes. Methods To generate human hepatocytes, human embryonic stem cells were differentiated by sequential culture in fibroblast growth factor 2 and human Activin-A, hepatocyte growth factor, and dexamethasone. Functional hepatocytes were isolated by sorting for surface asialoglycoprotein receptor expression. Characterization was performed by real-time PCR, immunohistochemistry, immunoblot, functional assays and transplantation. Results Embryonic stem cell-derived hepatocytes expressed liver-specific genes but not genes representing other lineages, secreted functional human liver-specific proteins similar to those of primary human hepatocytes and demonstrated human hepatocyte cytochrome P450 metabolic activity. Serum from rodents given injections of embryonic stem cell-derived hepatocytes contained significant amounts of human albumin and alpha-1-antitrypsin. Colonies of cytokeratin-18 and human albumin-expressing cells were present in the livers of recipient animals. Conclusion Human embryonic stem cells can be differentiated into cells with many characteristics of primary human hepatocytes. Hepatocyte-like cells can be enriched and recovered based on asialoglycoprotein receptor expression and could potentially be used in drug discovery research and developed as therapeutics. PMID:19026649