PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources.
Kahanda, Indika; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa
2015-01-01
The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage, over several baseline methods, of the structured SVM approach, which was shown to be highly effective for Gene Ontology term prediction. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large scale literature mining data.
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.
Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio
2017-10-12
The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even if for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover most of the methods proposed in literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, that consists in a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO terms predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
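The two-step structure described in this abstract (flat scores first, hierarchical combination second) can be illustrated with a minimal sketch. It assumes one simple top-down rule, capping each term's score by the lowest corrected score among its parents, which is enough to satisfy the true path rule; it is not claimed to be the authors' exact method or the API of their R package. The term names, the function name top_down_correct, and the scores are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_correct(flat_scores, parents):
    """Make flat per-term scores consistent with a DAG of terms.

    flat_scores: dict term -> score in [0, 1] from any flat learner.
    parents:     dict term -> list of parent terms (roots may be omitted).
    Assumption: every parent also has an entry in flat_scores.
    """
    # TopologicalSorter expects node -> predecessors, so parents come first.
    graph = {term: set(parents.get(term, ())) for term in flat_scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        score = flat_scores[term]
        for parent in graph[term]:
            # A child may never score higher than any of its parents.
            score = min(score, corrected[parent])
        corrected[term] = score
    return corrected


# Hypothetical toy hierarchy: C has two parents, A and B.
flat = {"root": 1.0, "A": 0.6, "B": 0.3, "C": 0.9}
dag = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
print(top_down_correct(flat, dag))
```

Because the correction only reads the flat scores, any flat learner can be plugged in before it, which mirrors the modularity the abstract emphasizes.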
A comparative analysis of soft computing techniques for gene prediction.
Goel, Neelam; Singh, Shailendra; Aseri, Trilok Chand
2013-07-01
The rapid growth of genomic sequence data for both human and nonhuman species has made the analysis of these sequences, especially the prediction of genes in them, very important, and it is currently the focus of many research efforts. Besides its scientific interest in the molecular biology and genomics community, gene prediction is of considerable importance in human health and medicine. A variety of gene prediction techniques have been developed for eukaryotes over the past few years. This article reviews and analyzes the application of certain soft computing techniques in gene prediction. First, the problem of gene prediction and its challenges are described. These are followed by different soft computing techniques along with their application to gene prediction. In addition, a comparative analysis of different soft computing techniques for gene prediction is given. Finally, some limitations of the current research activities and future research directions are provided. Copyright © 2013 Elsevier Inc. All rights reserved.
Probability-based collaborative filtering model for predicting gene-disease associations.
Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan
2017-12-28
Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of the aforementioned model. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-the-art approaches. The results show that PCFM performs better than other advanced approaches. The PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.
Gao, J; Naglich, J G; Laidlaw, J; Whaley, J M; Seizinger, B R; Kley, N
1995-02-15
The human von Hippel-Lindau disease (VHL) gene has recently been identified and, based on the nucleotide sequence of a partial cDNA clone, has been predicted to encode a novel protein with as yet unknown functions [F. Latif et al., Science (Washington DC), 260: 1317-1320, 1993]. The length of the encoded protein and the characteristics of the cellular expressed protein are as yet unclear. Here we report the cloning and characterization of a mouse gene (mVHLh1) that is widely expressed in different mouse tissues and shares high homology with the human VHL gene. It predicts a protein 181 residues long (and/or 162 amino acids, considering a potential alternative start codon), which across a core region of approximately 140 residues displays a high degree of sequence identity (98%) to the predicted human VHL protein. High stringency DNA and RNA hybridization experiments and protein expression analyses indicate that this gene is the most highly VHL-related mouse gene, suggesting that it represents the mouse VHL gene homologue rather than a related gene sharing a conserved functional domain. These findings provide new insights into the potential organization of the VHL gene and nature of its encoded protein.
Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis
Grassi, Elena; Damasco, Christian; Silengo, Lorenzo; Oti, Martin; Provero, Paolo; Di Cunto, Ferdinando
2008-01-01
Background: Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates. Methodology/Principal Findings: We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases. Conclusion: Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes. PMID:18369433
Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data
2013-01-01
Background: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results: In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. Conclusions: A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression. PMID:23369200
Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan
2014-01-01
The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848
Predicting neuroblastoma using developmental signals and a logic-based model.
Kasemeier-Kulesa, Jennifer C; Schnell, Santiago; Woolley, Thomas; Spengler, Jennifer A; Morrison, Jason A; McKinney, Mary C; Pushel, Irina; Wolfe, Lauren A; Kulesa, Paul M
2018-07-01
Genomic information from human patient samples of pediatric neuroblastoma cancers and known outcomes have led to specific gene lists put forward as high risk for disease progression. However, the reliance on gene expression correlations rather than mechanistic insight has shown limited potential and suggests a critical need for molecular network models that better predict neuroblastoma progression. In this study, we construct and simulate a molecular network of developmental genes and downstream signals in a 6-gene input logic model that predicts a favorable/unfavorable outcome based on the outcome of the four cell states including cell differentiation, proliferation, apoptosis, and angiogenesis. We simulate the mis-expression of the tyrosine receptor kinases, trkA and trkB, two prognostic indicators of neuroblastoma, and find differences in the number and probability distribution of steady state outcomes. We validate the mechanistic model assumptions using RNAseq of the SHSY5Y human neuroblastoma cell line to define the input states and confirm the predicted outcome with antibody staining. Lastly, we apply input gene signatures from 77 published human patient samples and show that our model makes more accurate disease outcome predictions for early stage disease than any current neuroblastoma gene list. These findings highlight the predictive strength of a logic-based model based on developmental genes and offer a better understanding of the molecular network interactions during neuroblastoma disease progression. Copyright © 2018. Published by Elsevier B.V.
Factors affecting interactome-based prediction of human genes associated with clinical signs.
González-Pérez, Sara; Pazos, Florencio; Chagoyen, Mónica
2017-07-17
Clinical signs are a fundamental aspect of human pathologies. While disease diagnosis is problematic or impossible in many cases, signs are easier to perceive and categorize. Clinical signs are increasingly used, together with molecular networks, to prioritize detected variants in clinical genomics pipelines, even if the patient is still undiagnosed. Here we analyze the ability of these network-based methods to predict genes that underlie clinical signs from the human interactome. Our analysis reveals that these approaches can locate genes associated with clinical signs with variable performance that depends on the sign and associated disease. We analyzed several clinical and biological factors that explain these variable results, including number of genes involved (mono- vs. oligogenic diseases), mode of inheritance, type of clinical sign and gene product function. Our results indicate that the characteristics of the clinical signs and their related diseases should be considered for interpreting the results of network-prediction methods, such as those aimed at discovering disease-related genes and variants. These results are important due to the increasing use of clinical signs as an alternative to diseases for studying the molecular basis of human pathologies.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A
2006-01-01
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.
Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M
2013-06-21
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
Prediction of gene expression with cis-SNPs using mixed models and regularization methods.
Zeng, Ping; Zhou, Xiang; Huang, Shuiping
2017-05-11
It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data. The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, and also did not see any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures since Lasso, ENET and BSLMM outperformed LMM for these genes; and this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
A critical assessment of Mus musculus gene function prediction using integrated genomic evidence
Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P
2008-01-01
Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946
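The headline figures in this abstract are precision values reported at a fixed recall level. A minimal sketch of how such a number can be computed from ranked predictions follows; the function name precision_at_recall and the toy scores are hypothetical, and tied scores are not treated specially, so this is an illustration rather than the evaluation protocol used in the study.

```python
def precision_at_recall(scores, labels, recall_target):
    """Precision at the first rank where recall reaches recall_target.

    scores: predicted confidence per gene-term pair (higher = more confident).
    labels: 1 for a true annotation, 0 otherwise.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Walk down the ranking from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= recall_target:
            return true_positives / rank
    raise ValueError("recall_target must not exceed 1.0")


# Hypothetical toy ranking with three true annotations among five predictions.
print(precision_at_recall([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 1], 0.6))
```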
Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.
Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L
2016-02-19
Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73 % of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those that didn't (p = 0.0006, Log Rank), and this predictor significantly associated with longer disease free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.
The truth about mouse, human, worms and yeast.
Nelson, David R; Nebert, Daniel W
2004-01-01
Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543
Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.
2013-01-01
Background: Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives: To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods: Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results: Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions: Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
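The performance figures in this abstract are AUC values. As a reminder of what that statistic measures, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name roc_auc and the toy data are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted risk per sample (higher = more likely positive).
    labels: 1 for positive (e.g. carcinogenic), 0 for negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: three of four positive/negative pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.66, 0.74 and 0.91 values reported above.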
An integrative approach to ortholog prediction for disease-focused and other functional studies.
Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E
2011-08-31
Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
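One simple way to integrate several ortholog predictors, and to let a user either "cast a wide net" or keep only high-confidence pairs, is to count how many tools support each pair and filter on that count. The sketch below assumes this vote-counting scheme for illustration; the abstract does not spell out DIOPT's scoring, and the tool names, gene identifiers and the `vote_orthologs` function are hypothetical.

```python
from collections import defaultdict


def vote_orthologs(tool_predictions, min_votes=1):
    """Count how many tools support each (query, ortholog) pair.

    tool_predictions: dict mapping a tool name to a set of
        (query_gene, ortholog_gene) pairs predicted by that tool.
    min_votes: keep only pairs supported by at least this many tools.
    Returns {query_gene: [(ortholog_gene, votes), ...]} with each list
    sorted by descending votes, then by ortholog name.
    """
    votes = defaultdict(int)
    for pairs in tool_predictions.values():
        for pair in pairs:
            votes[pair] += 1

    result = defaultdict(list)
    for (query, ortholog), n in votes.items():
        if n >= min_votes:
            result[query].append((ortholog, n))
    for query in result:
        result[query].sort(key=lambda item: (-item[1], item[0]))
    return dict(result)


# Hypothetical tool outputs (assumption): placeholder names throughout.
toy_predictions = {
    "tool_a": {("Hs1", "DmA"), ("Hs1", "DmB")},
    "tool_b": {("Hs1", "DmA")},
    "tool_c": {("Hs1", "DmA"), ("Hs2", "DmC")},
}
print(vote_orthologs(toy_predictions))               # wide net
print(vote_orthologs(toy_predictions, min_votes=2))  # higher confidence only
```

Lowering `min_votes` trades specificity for sensitivity, which mirrors the trade-off the abstract describes for integrating multiple tools.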
Integrative analyses shed new light on human ribosomal protein gene regulation
Li, Xin; Zheng, Yiyu; Hu, Haiyan; Li, Xiaoman
2016-01-01
Ribosomal protein genes (RPGs) are important house-keeping genes that are well-known for their coordinated expression. Previous studies on RPGs are largely limited to their promoter regions. Recent high-throughput studies provide an unprecedented opportunity to study how human RPGs are transcriptionally modulated and how such transcriptional regulation may contribute to the coordinated gene expression in various tissues and cell types. By analyzing the DNase I hypersensitive sites under 349 experimental conditions, we predicted 217 RPG regulatory regions in the human genome. More than 86.6% of these computationally predicted regulatory regions were partially corroborated by independent experimental measurements. Motif analyses on these predicted regulatory regions identified 31 DNA motifs, including 57.1% of experimentally validated motifs in the literature that regulate RPGs. Interestingly, we observed that the majority of the predicted motifs were shared by the predicted distal and proximal regulatory regions of the same RPGs, a likely general mechanism for enhancer-promoter interactions. We also found that RPGs may be differently regulated in different cells, indicating that condition-specific RPG regulatory regions still need to be discovered and investigated. Our study advances the understanding of how RPGs are coordinately modulated, which sheds light on the general principles of gene transcriptional regulation in mammals. PMID:27346035
Integrative analyses shed new light on human ribosomal protein gene regulation.
Li, Xin; Zheng, Yiyu; Hu, Haiyan; Li, Xiaoman
2016-06-27
Ribosomal protein genes (RPGs) are important house-keeping genes that are well-known for their coordinated expression. Previous studies on RPGs are largely limited to their promoter regions. Recent high-throughput studies provide an unprecedented opportunity to study how human RPGs are transcriptionally modulated and how such transcriptional regulation may contribute to the coordinated gene expression in various tissues and cell types. By analyzing the DNase I hypersensitive sites under 349 experimental conditions, we predicted 217 RPG regulatory regions in the human genome. More than 86.6% of these computationally predicted regulatory regions were partially corroborated by independent experimental measurements. Motif analyses on these predicted regulatory regions identified 31 DNA motifs, including 57.1% of experimentally validated motifs in the literature that regulate RPGs. Interestingly, we observed that the majority of the predicted motifs were shared by the predicted distal and proximal regulatory regions of the same RPGs, a likely general mechanism for enhancer-promoter interactions. We also found that RPGs may be differently regulated in different cells, indicating that condition-specific RPG regulatory regions still need to be discovered and investigated. Our study advances the understanding of how RPGs are coordinately modulated, which sheds light on the general principles of gene transcriptional regulation in mammals.
The prediction of biogenic magnetic nanoparticles biomineralization in human tissues and organs
NASA Astrophysics Data System (ADS)
Medviediev, O.; Gorobets, O. Yu; Gorobets, S. V.; Yadrykhins'ky, V. S.
2017-10-01
In this study, human homologs of magnetosome island proteins were identified by pairwise and multiple alignment of amino acid sequences. The expression levels of genes encoding magnetosome island proteins of M. gryphiswaldense MSR-1, cultured under oxygen-deficient and under microaerobic conditions, were compared with the expression levels of genes encoding the corresponding homologs in humans. On this basis, the possibility of biomineralization of biogenic magnetic nanoparticles (BMN) was predicted for human tissues and organs in which BMN had not previously been found experimentally.
Systematic Characterization and Prediction of Human Hypertension Genes.
Li, Yan-Hui; Zhang, Gai-Gai; Wang, Nanping
2017-02-01
Hypertension is a major cardiovascular risk factor and accounts for a large part of cardiovascular mortality. In this work, we analyzed the properties of hypertension genes and found that, when compared with genes not yet known to be involved in hypertension regulation, known hypertension genes display distinguishing features: (1) hypertension genes tend to be located at the network center; (2) hypertension genes tend to interact with each other; and (3) hypertension genes tend to be enriched in certain biological processes and to show certain phenotypes. Based on these features, we developed a machine-learning algorithm to predict new hypertension genes. One hundred and seventy-seven candidates were predicted with a posterior probability >0.9. Evidence supporting 17 of the predictions has been found. © 2016 American Heart Association, Inc.
EGASP: the human ENCODE Genome Annotation Assessment Project
Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G
2006-01-01
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836
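The coding-nucleotide sensitivity and specificity mentioned in the abstract above can be illustrated with a small calculation over exon coordinates. The sketch assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated coding bases that were predicted and "specificity" is the fraction of predicted coding bases that are annotated; the abstract itself does not define the terms, and the coordinates and function names below are invented for illustration.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals, 1-based inclusive, into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Assumed definitions (gene-finding convention):
      sensitivity = TP / annotated coding bases
      specificity = TP / predicted coding bases
    where TP is the number of bases that are both annotated and predicted.
    """
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Invented toy coordinates (assumption), not EGASP data.
toy_annotation = [(1, 100), (201, 300)]
toy_prediction = [(1, 100), (201, 250), (401, 500)]
print(nucleotide_accuracy(toy_annotation, toy_prediction))
```

Expanding intervals into position sets is only practical for small regions; an interval-arithmetic version would be preferable at genome scale.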
Putnam, Christopher D.; Srivatsan, Anjana; Nene, Rahul V.; Martinez, Sandra L.; Clotfelter, Sarah P.; Bell, Sara N.; Somach, Steven B.; E.S. de Souza, Jorge; Fonseca, André F.; de Souza, Sandro J.; Kolodner, Richard D.
2016-01-01
Gross chromosomal rearrangements (GCRs) play an important role in human diseases, including cancer. The identity of all Genome Instability Suppressing (GIS) genes is not currently known. Here multiple Saccharomyces cerevisiae GCR assays and query mutations were crossed into arrays of mutants to identify progeny with increased GCR rates. One hundred eighty-two GIS genes were identified that suppressed GCR formation. Another 438 cooperatively acting GIS genes were identified that were not GIS genes, but suppressed the increased genome instability caused by individual query mutations. Analysis of TCGA data using the human genes predicted to act in GIS pathways revealed that a minimum of 93% of ovarian and 66% of colorectal cancer cases had defects affecting one or more predicted GIS genes. These defects included loss-of-function mutations, copy-number changes associated with reduced expression, and silencing. In contrast, acute myeloid leukaemia cases did not appear to have defects affecting the predicted GIS genes. PMID:27071721
Zhang, Yaogong; Liu, Jiahui; Liu, Xiaohu; Hong, Yuxiang; Fan, Xin; Huang, Yalou; Wang, Yuan; Xie, Maoqiang
2018-04-24
Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships among phenotypes and among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing methods are limited in prediction accuracy and in their ability to detect potential gene-phenotype associations. In this paper, we propose a novel method that builds on Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes, called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC²NMF). Specifically, we first introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better understanding of phenotypes. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for a better understanding of genes. Such information provides the intuition that genes in a group probably result in similar phenotypes. The model not only achieves high prediction performance, but also learns interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC²NMF obtains superior prediction accuracy and good interpretability for biological explanation compared with other state-of-the-art methods.
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs.
Holmes, Roger S; Goldberg, Erwin
2009-10-01
Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals.
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs
Holmes, Roger S; Goldberg, Erwin
2009-01-01
Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals. PMID:19679512
2011-01-01
Background: Allergic contact dermatitis is an inflammatory skin disease that affects a significant proportion of the population. This disease is caused by an adverse immune response towards chemical haptens, and leads to a substantial economic burden for society. Current tests for sensitizing chemicals rely on animal experimentation. New legislation on the registration and use of chemicals within the pharmaceutical and cosmetic industries has stimulated significant research efforts to develop alternative, human cell-based assays for the prediction of sensitization. The aim is to replace animal experiments with in vitro tests displaying a higher predictive power. Results: We have developed a novel cell-based assay for the prediction of sensitizing chemicals. By analyzing the transcriptome of the human cell line MUTZ-3 after 24 h stimulation, using 20 different sensitizing chemicals, 20 non-sensitizing chemicals and vehicle controls, we have identified a biomarker signature of 200 genes with potent discriminatory ability. Using a Support Vector Machine for supervised classification, the assay achieved an area under the ROC curve of 0.98. In addition, when the chemicals were categorized according to the LLNA assay, this gene signature could also predict sensitizing potency. The identified markers are involved in biological pathways with immunologically relevant functions, which can shed light on the process of human sensitization. Conclusions: A gene signature predicting sensitization, using a human cell line in vitro, has been identified. This simple and robust cell-based assay has the potential to completely replace or drastically reduce the utilization of test systems based on experimental animals. Being based on human biology, the assay is proposed to be more accurate for predicting sensitization in humans than the traditional animal-based tests. PMID:21824406
A polynomial based model for cell fate prediction in human diseases.
Ma, Lichun; Zheng, Jie
2017-12-21
Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, for both gene selection methods considered, the prediction accuracies of polynomials of different degrees differ little. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than the others. When the linear polynomials based on the two gene selection methods are compared, the one that uses correlation analysis outcomes is slightly more accurate (86.62%), whereas the one restricted to genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.
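The 10-fold cross-validation used to evaluate the model above partitions the samples into ten held-out folds, each used once for testing while the rest are used for fitting. A minimal sketch of that partitioning follows; the function name `k_fold_indices` is hypothetical, and the split is sequential without shuffling, which is an assumption made to keep the example deterministic rather than a detail taken from the study.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k (train, test) pairs.

    Folds are contiguous and differ in size by at most one sample.
    No shuffling is done here (assumption); in practice the sample order
    is usually randomized before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    fold_sizes = [base + 1 if i < extra else base for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Example with an arbitrary sample count (assumption).
for train_idx, test_idx in k_fold_indices(23, k=10):
    print(len(train_idx), test_idx)
```

Averaging a performance measure such as accuracy over the ten test folds gives the kind of cross-validated estimate the abstract reports.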
Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.
2009-01-01
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint. PMID:19506001
Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.
2013-01-01
The cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite their wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle genes from non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs and protein-coding genes are cell cycle regulated, suggesting the importance of non-coding RNAs in the cell cycle. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Rohde, Palle Duun; Gaertner, Bryn; Ward, Kirsty; Sørensen, Peter; Mackay, Trudy F C
2017-08-01
Human psychiatric disorders such as schizophrenia, bipolar disorder, and attention-deficit/hyperactivity disorder often include adverse behaviors including increased aggressiveness. Individuals with psychiatric disorders often exhibit social withdrawal, which can further increase the probability of conducting a violent act. Here, we used the inbred, sequenced lines of the Drosophila Genetic Reference Panel (DGRP) to investigate the genetic basis of variation in male aggressive behavior for flies reared in a socialized and socially isolated environment. We identified genetic variation for aggressive behavior, as well as significant genotype-by-social environmental interaction (GSEI); i.e., variation among DGRP genotypes in the degree to which social isolation affected aggression. We performed genome-wide association (GWA) analyses to identify genetic variants associated with aggression within each environment. We used genomic prediction to partition genetic variants into gene ontology (GO) terms and constituent genes, and identified GO terms and genes with high prediction accuracies in both social environments and for GSEI. The top predictive GO terms significantly increased the proportion of variance explained, compared to prediction models based on all segregating variants. We performed genomic prediction across environments, and identified genes in common between the social environments that turned out to be enriched for genome-wide associated variants. A large proportion of the associated genes have previously been associated with aggressive behavior in Drosophila and mice. Further, many of these genes have human orthologs that have been associated with neurological disorders, indicating partially shared genetic mechanisms underlying aggression in animal models and human psychiatric disorders. Copyright © 2017 by the Genetics Society of America.
Schmitz, Ulf; Lai, Xin; Winter, Felix; Wolkenhauer, Olaf; Vera, Julio; Gupta, Shailendra K.
2014-01-01
MicroRNAs (miRNAs) are an integral part of gene regulation at the post-transcriptional level. Recently, it has been shown that pairs of miRNAs can repress the translation of a target mRNA in a cooperative manner, which leads to an enhanced effectiveness and specificity in target repression. However, it remains unclear which miRNA pairs can synergize and which genes are targets of cooperative miRNA regulation. In this paper, we present a computational workflow for the prediction and analysis of cooperating miRNAs and their mutual target genes, which we refer to as RNA triplexes. The workflow integrates methods of miRNA target prediction, triplex structure analysis, molecular dynamics simulations and mathematical modeling for a reliable prediction of functional RNA triplexes and target repression efficiency. In a case study we analyzed the human genome and identified several thousand targets of cooperative gene regulation. Our results suggest that miRNA cooperativity is a frequent mechanism for an enhanced target repression by pairs of miRNAs, facilitating distinctive and fine-tuned target gene expression patterns. Human RNA triplexes predicted and characterized in this study are organized in a web resource at www.sbi.uni-rostock.de/triplexrna/. PMID:24875477
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity
Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.
2015-01-01
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
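The comparison of observed and predicted standing variation behind a score such as ncRVIS can be illustrated as a regression residual. The following is only a minimal sketch, assuming a simple linear regression of common-variant counts on total-variant counts and a plain standardized residual; the function name and the toy counts are hypothetical and are not taken from the study.

```python
import numpy as np

def residual_intolerance(total_variants, common_variants):
    """Standardized residuals from regressing common on total variant counts.

    Negative values mean fewer common variants than the regression predicts,
    i.e. the region looks less tolerant of variation (illustrative only).
    """
    x = np.asarray(total_variants, dtype=float)
    y = np.asarray(common_variants, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)    # ordinary least-squares line
    residuals = y - (slope * x + intercept)   # observed minus predicted
    return residuals / residuals.std(ddof=2)  # ddof=2: two fitted parameters

# Hypothetical counts for the regulatory regions of five genes (made-up numbers).
total = [10, 20, 30, 40, 50]
common = [5, 10, 5, 20, 25]
scores = residual_intolerance(total, common)
```

In this toy input the third gene has far fewer common variants than its total count predicts, so it receives the most negative score.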
Li, Yongsheng; Sahni, Nidhi; Yi, Song
2016-11-29
Comprehensive understanding of human cancer mechanisms requires the identification of a thorough list of cancer-associated genes, which could serve as biomarkers for diagnoses and therapies in various types of cancer. Although substantial progress has been made in functional studies to uncover genes involved in cancer, these efforts are often time-consuming and costly. Therefore, it remains challenging to comprehensively identify cancer candidate genes. Network-based methods have accelerated this process through the analysis of complex molecular interactions in the cell. However, the extent to which various interactome networks can contribute to prediction of candidate genes responsible for cancer is still enigmatic. In this study, we evaluated different human protein-protein interactome networks and compared their application to cancer gene prioritization. Our results indicate that network analyses can increase the power to identify novel cancer genes. In particular, such predictive power can be enhanced with the use of unbiased systematic protein interaction maps for cancer gene prioritization. Functional analysis reveals that the top ranked genes from network predictions co-occur often with cancer-related terms in literature, and further, these candidate genes are indeed frequently mutated across cancers. Finally, our study suggests that integrating interactome networks with other omics datasets could provide novel insights into cancer-associated genes and underlying molecular mechanisms.
Prediction of C. elegans Longevity Genes by Human and Worm Longevity Networks
de Magalhães, João Pedro; Ruvkun, Gary; Fraifeld, Vadim E.; Curran, Sean P.
2012-01-01
Intricate and interconnected pathways modulate longevity, but screens to identify the components of these pathways have not been saturating. Because biological processes are often executed by protein complexes and fine-tuned by regulatory factors, the first-order protein-protein interactors of known longevity genes are likely to participate in the regulation of longevity. Data-rich maps of protein interactions have been established for many cardinal organisms such as yeast, worms, and humans. We propose that these interaction maps could be mined for the identification of new putative regulators of longevity. For this purpose, we have constructed longevity networks in both humans and worms. We reasoned that the essential first-order interactors of known longevity-associated genes in these networks are more likely to have longevity phenotypes than randomly chosen genes. We have used C. elegans to determine whether post-developmental inactivation of these essential genes modulates lifespan. Our results suggest that the worm and human longevity networks are functionally relevant and possess a high predictive power for identifying new longevity regulators. PMID:23144747
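The idea of mining an interaction map for first-order interactors of known longevity genes can be sketched as a simple neighbor lookup. This is an illustrative sketch only: the edge list and gene names are placeholders, and the essentiality filter mentioned in the abstract is not modeled.

```python
def first_order_interactors(edges, seed_genes):
    """Return genes that directly interact with any seed gene (seeds excluded)."""
    seeds = set(seed_genes)
    neighbors = set()
    for a, b in edges:
        if a in seeds and b not in seeds:
            neighbors.add(b)
        if b in seeds and a not in seeds:
            neighbors.add(a)
    return neighbors

# Hypothetical toy interactome; "L1" and "L2" stand in for known longevity genes.
edges = [("L1", "A"), ("A", "B"), ("L2", "C"), ("L1", "L2"), ("B", "D")]
candidates = first_order_interactors(edges, ["L1", "L2"])
```

Only genes one step away from a seed are returned; genes such as "B" and "D" in the toy network lie further out and are ignored.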
Akram, Pakeeza; Liao, Li
2017-12-06
Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem that minimizes the network-based module separation between two subgraphs produced by mapping the genes associated with each disease onto the interactome. Using cross-validation on more than 600 disease pairs, our method achieves a significantly higher average receiver operating characteristic (ROC) score of 0.95, compared with a baseline ROC score of 0.60 obtained using randomized data. Prediction of missing common genes aims to complete the gene set associated with comorbid diseases for a better understanding of biological intervention. It will also be useful for gene-targeted therapeutics related to comorbid diseases. The method can further be considered for predicting missing edges to complete the subgraph associated with a disease pair.
Gryglewski, Gregor; Seiger, René; James, Gregory Miles; Godbersen, Godber Mathis; Komorowski, Arkadiusz; Unterholzner, Jakob; Michenthaler, Paul; Hahn, Andreas; Wadsak, Wolfgang; Mitterhauser, Markus; Kasper, Siegfried; Lanzenberger, Rupert
2018-08-01
The quantification of big pools of diverse molecules provides important insights on brain function, but is often restricted to a limited number of observations, which impairs integration with other modalities. To resolve this issue, a method allowing for the prediction of mRNA expression in the entire brain based on microarray data provided in the Allen Human Brain Atlas was developed. Microarray data of 3702 samples from 6 brain donors was registered to MNI and cortical surface space using FreeSurfer. For each of 18,686 genes, spatial dependence of transcription was assessed using variogram modelling. Variogram models were employed in Gaussian process regression to calculate best linear unbiased predictions for gene expression at all locations represented in well-established imaging atlases for cortex, subcortical structures and cerebellum. For validation, predicted whole-brain transcription of the HTR1A gene was correlated with [carbonyl-11C]WAY-100635 positron emission tomography data collected from 30 healthy subjects. Prediction results showed minimal bias ranging within ±0.016 (cortical surface), ±0.12 (subcortical regions) and ±0.14 (cerebellum) in units of log2 expression intensity for all genes. Across genes, the correlation of predicted and observed mRNA expression in leave-one-out cross-validation correlated with the strength of spatial dependence (cortical surface: r = 0.91, subcortical regions: r = 0.85, cerebellum: r = 0.84). 816 out of 18,686 genes exhibited a high spatial dependence accounting for more than 50% of variance in the difference of gene expression on the cortical surface. In subcortical regions and cerebellum, different sets of genes were implicated by high spatially structured variability.
For the serotonin 1A receptor, correlation between PET binding potentials and predicted comprehensive mRNA expression was markedly higher (Spearman ρ = 0.72 for cortical surface, ρ = 0.84 for subcortical regions) than correlation of PET and discrete samples only (ρ = 0.55 and ρ = 0.63, respectively). Prediction of mRNA expression in the entire human brain allows for intuitive visualization of gene transcription and seamless integration in multimodal analysis without bias arising from non-uniform distribution of available samples. Extension of this methodology promises to facilitate translation of omics research and enable investigation of human brain function at a systems level. Copyright © 2018 Elsevier Inc. All rights reserved.
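The spatial dependence assessed by variogram modelling can be illustrated with an empirical semivariogram, which averages half the squared expression difference over sample pairs in each distance bin. This is a minimal sketch under simple assumptions (Euclidean distances, fixed bins); the function name, coordinates and values are hypothetical and do not reproduce the study's pipeline.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Semivariance per distance bin: 0.5 * mean squared difference of pairs."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)           # all unordered sample pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)        # NaN where a bin is empty
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Hypothetical samples along a line with a smooth expression gradient.
coords = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

Semivariance that rises with distance, as in this toy gradient, is the signature of spatially structured expression; a model fitted to such a curve is what a kriging or Gaussian process predictor would then use.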
Advances in biomarker development have improved our ability to detect early changes at the molecular, cellular and pre-clinical level that are often predictive of adverse health outcomes. Integration of human and animal studies addresses key concerns about animal-human extrapolat...
Suh, Yeunsu; Davis, Michael E.; Lee, Kichoon
2013-01-01
Understanding the tissue-specific pattern of gene expression is critical in elucidating the molecular mechanisms of tissue development, gene function, and transcriptional regulation of biological processes. Although tissue-specific gene expression information is available in several databases, follow-up strategies to integrate and use these data are limited. The objective of the current study was to identify and evaluate novel tissue-specific genes in human and mouse tissues by performing comparative microarray database analysis and semi-quantitative PCR analysis. We developed a powerful approach to predict tissue-specific genes by analyzing existing microarray data from NCBI's Gene Expression Omnibus (GEO) public repository. We investigated and confirmed tissue-specific gene expression in the human and mouse kidney, liver, lung, heart, muscle, and adipose tissue. Applying our novel comparative microarray approach, we confirmed 10 kidney, 11 liver, 11 lung, 11 heart, 8 muscle, and 8 adipose specific genes. The accuracy of this approach was further verified by semi-quantitative PCR and by searching for gene function information in existing publications. Three novel tissue-specific genes were discovered by this approach, including AMDHD1 (amidohydrolase domain containing 1) in the liver, PRUNE2 (prune homolog 2) in the heart, and ACVR1C (activin A receptor, type IC) in adipose tissue. We further confirmed the tissue-specific expression of these 3 novel genes by real-time PCR. Among them, ACVR1C is adipose tissue-specific and adipocyte-specific in adipose tissue, and can be used as an adipocyte developmental marker. From GEO profiles, we predicted the processes in which AMDHD1 and PRUNE2 may participate. Our approach provides a novel way to identify new sets of tissue-specific genes and to predict functions in which they may be involved. PMID:23741331
Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C
2003-01-01
Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences and direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626
Predicting human genetic interactions from cancer genome evolution.
Lu, Xiaowen; Megchelenbrink, Wout; Notebaart, Richard A; Huynen, Martijn A
2015-01-01
Synthetic Lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e. the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an underrepresentation of cases where the genes in an SL pair are both underexpressed, and an overrepresentation of cases where one gene of an SL pair is underexpressed while the other one is overexpressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.
Hunt, C; Morimoto, R I
1985-01-01
We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
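The transcript lengths quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative, and treating the open reading frame as a whole number of codons is an assumption.

```python
leader_5p = 212    # 5' noncoding leader, nucleotides
orf = 1986         # continuous open reading frame, nucleotides
trailer_3p = 242   # 3' noncoding region, nucleotides

transcript_length = leader_5p + orf + trailer_3p  # should equal the stated 2440
codons = orf // 3                                 # assumes the ORF is a whole number of codons
hse_match_fraction = 12 / 14                      # dyad sequence vs. Drosophila consensus
```

The three segments sum to the stated primary transcript length, and the 12 of 14 matching positions correspond to roughly 86% agreement with the consensus element.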
First Pass Annotation of Promoters on Human Chromosome 22
Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas
2001-01-01
The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I
2013-01-01
Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes, enough to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules with complementarity to human genes in plant tissue from crops that have a long history of safe consumption, such as lettuce, tomato, corn, soy and rice, support the conclusion that long dsRNAs do not present a significant dietary risk.
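The criterion of perfect complementarity over at least 21 nucleotides can be sketched as a k-mer lookup: a stretch of one dsRNA strand is complementary to a transcript when its reverse complement occurs in that transcript. This is an illustrative sketch only; the function names and the toy sequences are hypothetical, and each strand of a dsRNA would need to be checked separately.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))

def has_perfect_complementarity(dsrna_strand, transcript, k=21):
    """True if any k-nt window of the strand perfectly pairs with the transcript."""
    transcript_kmers = {transcript[i:i + k] for i in range(len(transcript) - k + 1)}
    antisense = reverse_complement(dsrna_strand)
    return any(antisense[i:i + k] in transcript_kmers
               for i in range(len(antisense) - k + 1))

# Made-up 27-nt "transcript" and two hypothetical strands.
transcript = "AUGGCUAGCUAGGCUAACGUAGCUAGG"
strand_21 = "GG" + reverse_complement(transcript[3:24]) + "AA"  # pairs over 21 nt
strand_20 = reverse_complement(transcript[3:23])                # pairs over only 20 nt
hit = has_perfect_complementarity(strand_21, transcript)
miss = has_perfect_complementarity(strand_20, transcript)
```

Storing the transcript k-mers in a set keeps each lookup constant-time, which matters when many dsRNA sequences are screened against a large transcriptome.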
Yeast Phenomics: An Experimental Approach for Modeling Gene Interaction Networks that Buffer Disease
Hartman, John L.; Stisher, Chandler; Outlaw, Darryl A.; Guo, Jingyu; Shah, Najaf A.; Tian, Dehua; Santos, Sean M.; Rodgers, John W.; White, Richard A.
2015-01-01
The genome project increased appreciation of the genetic complexity underlying disease phenotypes: many genes contribute to each phenotype and each gene contributes to multiple phenotypes. The aspiration of predicting common disease in individuals has evolved from seeking primary loci to marginal risk assignments based on many genes. Genetic interaction, defined as contributions to a phenotype that are dependent upon particular digenic allele combinations, could improve prediction of phenotype from complex genotype, but it is difficult to study in human populations. High throughput, systematic analysis of S. cerevisiae gene knockouts or knockdowns in the context of disease-relevant phenotypic perturbations provides a tractable experimental approach to derive gene interaction networks, in order to deduce by cross-species gene homology how phenotype is buffered against disease-risk genotypes. Yeast gene interaction network analysis to date has revealed biology more complex than previously imagined. This has motivated the development of more powerful yeast cell array phenotyping methods to globally model the role of gene interaction networks in modulating phenotypes (which we call yeast phenomic analysis). The article illustrates yeast phenomic technology, which is applied here to quantify gene × media interaction at higher resolution, and supports the use of human-like media for future applications of yeast phenomics for modeling human disease. PMID:25668739
Inter-species pathway perturbation prediction via data-driven detection of functional homology.
Hafemeister, Christoph; Romero, Roberto; Bilal, Erhan; Meyer, Pablo; Norel, Raquel; Rhrissorrakrai, Kahn; Bonneau, Richard; Tarca, Adi L
2015-02-15
Experiments in animal models are often conducted to infer how humans will respond to stimuli by assuming that the same biological pathways will be affected in both organisms. The limitations of this assumption were tested in the IMPROVER Species Translation Challenge, where 52 stimuli were applied to both human and rat cells and perturbed pathways were identified. In the Inter-species Pathway Perturbation Prediction sub-challenge, multiple teams proposed methods to use rat transcription data from 26 stimuli to predict human gene set and pathway activity under the same perturbations. Submissions were evaluated using three performance metrics on data from the remaining 26 stimuli. We present two approaches, ranked second in this challenge, that do not rely on sequence-based orthology between rat and human genes to translate pathway perturbation state but instead identify transcriptional response orthologs across a set of training conditions. The translation from rat to human accomplished by these so-called direct methods is not dependent on the particular analysis method used to identify perturbed gene sets. In contrast, machine learning-based methods require performing a pathway analysis initially and then mapping the pathway activity between organisms. Unlike most machine learning approaches, direct methods can be used to predict the activation of a human pathway for a new (test) stimulus, even when that pathway was never activated by a training stimulus. Gene expression data are available from ArrayExpress (accession E-MTAB-2091), while software implementations are available from http://bioinformaticsprb.med.wayne.edu?p=50 and http://goo.gl/hJny3h. Contact: christoph.hafemeister@nyu.edu or atarca@med.wayne.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Phenome-driven disease genetics prediction toward drug discovery.
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-06-15
Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
Draft Map of Human Proteome Published | Office of Cancer Clinical Proteomics Research
In a recently published article in the journal Nature, researchers have developed a draft map of the human proteome. Striving for the protein equivalent of the Human Genome Project, an international team of researchers has created an initial catalog of the human proteome. In total, using 30 different human tissues, the researchers identified proteins encoded by 17,294 genes, which is approximately 84 percent of all of the genes in the human genome predicted to encode proteins.
Dowell, Karen G; Simons, Allen K; Bai, Hao; Kell, Braden; Wang, Zack Z; Yun, Kyuson; Hibbs, Matthew A
2014-05-01
Embryonic stem cells (ESCs), characterized by their ability to both self-renew and differentiate into multiple cell lineages, are a powerful model for biomedical research and developmental biology. Human and mouse ESCs share many features, yet have distinctive aspects, including fundamental differences in the signaling pathways and cell cycle controls that support self-renewal. Here, we explore the molecular basis of human ESC self-renewal using Bayesian network machine learning to integrate cell-type-specific, high-throughput data for gene function discovery. We integrated high-throughput ESC data from 83 human studies (~1.8 million data points collected under 1,100 conditions) and 62 mouse studies (~2.4 million data points collected under 1,085 conditions) into separate human and mouse predictive networks focused on ESC self-renewal to analyze shared and distinct functional relationships among protein-coding gene orthologs. Computational evaluations show that these networks are highly accurate, literature validation confirms their biological relevance, and reverse transcriptase polymerase chain reaction (RT-PCR) validation supports our predictions. Our results reflect the importance of key regulatory genes known to be strongly associated with self-renewal and pluripotency in both species (e.g., POU5F1, SOX2, and NANOG), identify metabolic differences between species (e.g., threonine metabolism), clarify differences between human and mouse ESC developmental signaling pathways (e.g., leukemia inhibitory factor (LIF)-activated JAK/STAT in mouse; NODAL/ACTIVIN-A-activated fibroblast growth factor in human), and reveal many novel genes and pathways predicted to be functionally associated with self-renewal in each species. These interactive networks are available online at www.StemSight.org for stem cell researchers to develop new hypotheses, discover potential mechanisms involving sparsely annotated genes, and prioritize genes of interest for experimental validation. 
© 2013 AlphaMed Press.
The Functional Human C-Terminome
Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.
2016-01-01
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
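The search for consensus sequences anchored at the C-terminus can be sketched with a regular expression applied to the last 10 residues. This is a minimal illustration, assuming a notation in which a lowercase "x" marks a degenerate position; the helper names, the notation and the example protein are hypothetical and are not the database's actual format.

```python
import re

def is_valid_consensus(consensus):
    """Limits stated in the abstract: 3 to 10 residues, at most 5 degenerate positions."""
    return 3 <= len(consensus) <= 10 and consensus.count("x") <= 5

def consensus_to_regex(consensus):
    """Build a regex anchored at the C-terminus; 'x' matches any residue (assumed notation)."""
    body = "".join("." if ch == "x" else re.escape(ch) for ch in consensus)
    return re.compile(body + "$")

def matches_c_terminus(protein, consensus, window=10):
    """True if the consensus matches the very end of the protein's last `window` residues."""
    tail = protein[-window:]
    return consensus_to_regex(consensus).search(tail) is not None

# Made-up protein sequence ending in "TSV".
protein = "MADEKLLPQRSTNGETSV"
found = matches_c_terminus(protein, "TxV")
not_found = matches_c_terminus(protein, "TxL")
```

Anchoring with `$` restricts matches to motifs that end exactly at the terminal residue; motifs that sit a few residues upstream would need a looser pattern.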
A cluster of novel serotonin receptor 3-like genes on human chromosome 3.
Karnovsky, Alla M; Gotow, Lisa F; McKinley, Denise D; Piechan, Julie L; Ruble, Cara L; Mills, Cynthia J; Schellin, Kathleen A B; Slightom, Jerry L; Fitzgerald, Laura R; Benjamin, Christopher W; Roberds, Steven L
2003-11-13
The ligand-gated ion channel family includes receptors for serotonin (5-hydroxytryptamine, 5-HT), acetylcholine, GABA, and glutamate. Drugs targeting subtypes of these receptors have proven useful for the treatment of various neuropsychiatric and neurological disorders. To identify new ligand-gated ion channels as potential therapeutic targets, drafts of human genome sequence were interrogated. Portions of four novel genes homologous to 5-HT(3A) and 5-HT(3B) receptors were identified within human sequence databases. We named the genes 5-HT(3C1)-5-HT(3C4). Radiation hybrid (RH) mapping localized these genes to chromosome 3q27-28. All four genes shared similar intron-exon organizations and predicted protein secondary structure with 5-HT(3A) and 5-HT(3B). Orthologous genes were detected by Southern blotting in several species including dog, cow, and chicken, but not in rodents, suggesting that these novel genes are not present in rodents or are very poorly conserved. Two of the novel genes are predicted to be pseudogenes, but two other genes are transcribed and spliced to form appropriate open reading frames. The 5-HT(3C1) transcript is expressed almost exclusively in small intestine and colon, suggesting a possible role in the serotonin-responsiveness of the gut.
Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk
2016-11-16
Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Liu, Bin; Jin, Min; Zeng, Pan
2015-10-01
The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performance. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
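A random walk with restart on a PPI network can be sketched as a fixed-point iteration in which the walker follows edges with probability 1 - r and jumps back to the initial distribution with probability r. The code below is a generic sketch, not the authors' implementation: the toy network, the restart probability and the similarity-derived initial weights are all hypothetical, and every node is assumed to have at least one interaction.

```python
import numpy as np

def random_walk_with_restart(adjacency, p0, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    A = np.asarray(adjacency, dtype=float)
    W = A / A.sum(axis=0, keepdims=True)   # column-normalize (assumes no isolated nodes)
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()                     # initial vector as a probability distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy path network 0-1-2-3; initial weights play the role of semantic similarities.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
initial = [0.9, 0.1, 0.0, 0.0]   # hypothetical similarity-derived weights
scores = random_walk_with_restart(adjacency, initial)
```

Genes are then ranked by their steady-state probability; replacing a uniform seed vector with similarity-weighted entries is the point at which the semantic information enters the walk.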
Mercatanti, Alberto; Lodovichi, Samuele; Cervelli, Tiziana; Galli, Alvaro
2017-12-01
Evaluation of the functional impact of cancer-associated missense variants is more difficult than for protein-truncating mutations and consequently standard guidelines for the interpretation of sequence variants have been recently proposed. A number of algorithms and software products were developed to predict the impact of cancer-associated missense mutations on protein structure and function. Importantly, direct assessment of the variants using high-throughput functional assays using simple genetic systems can help in speeding up the functional evaluation of newly identified cancer-associated variants. We developed the web tool CRIMEtoYHU (CTY) to help geneticists in the evaluation of the functional impact of cancer-associated missense variants. Humans and the yeast Saccharomyces cerevisiae share thousands of protein-coding genes although they have diverged for a billion years. Therefore, yeast humanization can be helpful in deciphering the functional consequences of human genetic variants found in cancer and give information on the pathogenicity of missense variants. To humanize specific positions within yeast genes, human and yeast genes have to share functional homology. If a mutation in a specific residue is associated with a particular phenotype in humans, a similar substitution in the yeast counterpart may reveal its effect at the organism level. CTY simultaneously finds yeast homologous genes, identifies the corresponding variants and determines the transferability of human variants to yeast counterparts by assigning a reliability score (RS) that may be predictive for the validity of a functional assay. CTY analyzes newly identified mutations or retrieves mutations reported in the COSMIC database, provides information about the functional conservation between yeast and human and shows the mutation distribution in human genes. CTY analyzes also newly found mutations and aborts when no yeast homologue is found. 
Then, on the basis of the protein domain localization and functional conservation between yeast and human, the selected variants are ranked by the RS. The RS is assigned by an algorithm that computes functional data, type of mutation, chemistry of amino acid substitution and the degree of mutation transferability between human and yeast protein. Mutations giving a positive RS are highly transferable to yeast and, therefore, yeast functional assays will be more predictable. To validate the web application, we have analyzed 8078 cancer-associated variants located in 31 genes that have a yeast homologue. More than 50% of variants are transferable to yeast. Incidentally, 88% of all transferable mutations have a reliability score >0. Moreover, we analyzed by CTY 72 functionally validated missense variants located in yeast genes at positions corresponding to the human cancer-associated variants. All these variants gave a positive RS. To further validate CTY, we analyzed 3949 protein variants (with positive RS) by the predictive algorithm PROVEAN. This analysis shows that yeast-based functional assays will be more predictable for the variants with positive RS. We believe that CTY could be an important resource for the cancer research community by providing information concerning the functional impact of specific mutations, as well as for the design of functional assays useful for decision support in precision medicine. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Osato, Naoki
2018-01-19
Transcriptional target genes show functional enrichment of genes. However, how many and how significantly transcriptional target genes include functional enrichments are still unclear. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined functional enrichment and gene expression level of putative transcriptional target genes. Gene Ontology annotations showed four times larger numbers of functional enrichments in putative transcriptional target genes than gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichment by calculating its ratios in the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5-60% of randomly selected genes. The normalized number of functional enrichments was changed according to the criteria of enhancer-promoter interactions such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed significantly higher normalized number of functional enrichments than the other orientations. Journal papers showed that the top five frequent functional enrichments were related to the cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes of the normalized number of functional enrichments of transcriptional target genes. Human putative transcriptional target genes showed significant functional enrichments. 
Functional enrichments were related to the cellular functions. The normalized number of functional enrichments of human putative transcriptional target genes changed according to the criteria of enhancer-promoter assignments and correlated with the median expression level of the target genes. These analyses and characters of human putative transcriptional target genes would be useful to examine the criteria of enhancer-promoter assignments and to predict the novel mechanisms and factors such as DNA binding proteins and DNA sequences of enhancer-promoter interactions.
Weil, D; Levy, G; Sahly, I; Levi-Acobas, F; Blanchard, S; El-Amraoui, A; Crozet, F; Philippe, H; Abitbol, M; Petit, C
1996-04-16
The gene encoding human myosin VIIA is responsible for Usher syndrome type 1B (USH1B), a disease which associates profound congenital sensorineural deafness, vestibular dysfunction, and retinitis pigmentosa. The reconstituted cDNA sequence presented here predicts a 2215 amino acid protein with a typical unconventional myosin structure. This protein is expected to dimerize into a two-headed molecule. The C terminus of its tail shares homology with the membrane-binding domain of the band 4.1 protein superfamily. The gene consists of 48 coding exons. It encodes several alternatively spliced forms. In situ hybridization analysis in human embryos demonstrates that the myosin VIIA gene is expressed in the pigment epithelium and the photoreceptor cells of the retina, thus indicating that both cell types may be involved in the USH1B retinal degenerative process. In addition, the gene is expressed in the human embryonic cochlear and vestibular neuroepithelia. We suggest that deafness and vestibular dysfunction in USH1B patients result from a defect in the morphogenesis of the inner ear sensory cell stereocilia.
GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.
Wong, Aaron K; Krishnan, Arjun; Troyanskaya, Olga G
2018-05-25
GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS studies and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from now over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.
The human genome and sport, including epigenetics, gene doping, and athleticogenomics.
Sharp, N C Craig
2010-03-01
Hugh Montgomery's discovery of the first of more than 239 fitness genes together with rapid advances in human gene therapy have created a prospect of using genes, genetic elements, and cells that have the capacity to enhance athletic performance (to paraphrase the World Anti-Doping Agency's definition of gene doping). This brief overview covers the main areas of interface between genetics and sport, attempts to provide a context against which gene doping may be viewed, and predicts a futuristic legitimate use of genomic (and possibly epigenetic) information in sport. Copyright 2010 Elsevier Inc. All rights reserved.
Yu, Liang; Wang, Bingbo; Ma, Xiaoke; Gao, Lin
2016-12-23
Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, together with their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrates that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.
Cheung, Connie; Gonzalez, Frank J
2008-01-01
Cytochrome P450s (P450s) are important enzymes involved in the metabolism of xenobiotics, particularly clinically used drugs, and are also responsible for metabolic activation of chemical carcinogens and toxins. Many xenobiotics can activate nuclear receptors that in turn induce the expression of genes encoding xenobiotic metabolizing enzymes and drug transporters. Marked species differences in the expression and regulation of cytochromes P450 and xenobiotic nuclear receptors exist. Thus, the ability of rodent models to accurately reflect human drug and carcinogen metabolism is severely limited. Humanized transgenic mice were developed in an effort to create more reliable in vivo systems to study and predict human responses to xenobiotics. Human P450s or human xenobiotic-activated nuclear receptors were introduced directly or replaced the corresponding mouse gene, thus creating “humanized” transgenic mice. Mice expressing human CYP1A1/CYP1A2, CYP2E1, CYP2D6, CYP3A4, CYP3A7, PXR, and PPARα were generated and characterized. These humanized mouse models offer broad utility in the evaluation and prediction of toxicological risk that may aid in the development of safer drugs. PMID:18682571
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
A signature inferred from Drosophila mitotic genes predicts survival of breast cancer patients.
Damasco, Christian; Lembo, Antonio; Somma, Maria Patrizia; Gatti, Maurizio; Di Cunto, Ferdinando; Provero, Paolo
2011-02-28
The classification of breast cancer patients into risk groups provides a powerful tool for the identification of patients who will benefit from aggressive systemic therapy. The analysis of microarray data has generated several gene expression signatures that improve diagnosis and allow risk assessment. There is also evidence that cell proliferation-related genes have a high predictive power within these signatures. We thus constructed a gene expression signature (the DM signature) using the human orthologues of 108 Drosophila melanogaster genes required for either the maintenance of chromosome integrity (36 genes) or mitotic division (72 genes). The DM signature has minimal overlap with the extant signatures and is highly predictive of survival in 5 large breast cancer datasets. In addition, we show that the DM signature outperforms many widely used breast cancer signatures in predictive power, and performs comparably to other proliferation-based signatures. For most genes of the DM signature, an increased expression is negatively correlated with patient survival. The genes that provide the highest contribution to the predictive power of the DM signature are those involved in cytokinesis. This finding highlights cytokinesis as an important marker in breast cancer prognosis and as a possible target for antimitotic therapies.
The role of gene-gene interaction in the prediction of criminal behavior.
Boutwell, Brian B; Menard, Scott; Barnes, J C; Beaver, Kevin M; Armstrong, Todd A; Boisvert, Danielle
2014-04-01
A host of research has examined the possibility that environmental risk factors might condition the influence of genes on various outcomes. Less research, however, has been aimed at exploring the possibility that genetic factors might interact to impact the emergence of human traits. Even fewer studies exist examining the interaction of genes in the prediction of behavioral outcomes. The current study expands this body of research by testing the interaction between genes involved in neural transmission. Our findings suggest that certain dopamine genes interact to increase the odds of criminogenic outcomes in a national sample of Americans. Copyright © 2014 Elsevier Inc. All rights reserved.
Defining a Cancer Dependency Map.
Tsherniak, Aviad; Vazquez, Francisca; Montgomery, Phil G; Weir, Barbara A; Kryukov, Gregory; Cowley, Glenn S; Gill, Stanley; Harrington, William F; Pantel, Sasha; Krill-Burger, John M; Meyers, Robin M; Ali, Levi; Goodale, Amy; Lee, Yenarae; Jiang, Guozhi; Hsiao, Jessica; Gerath, William F J; Howell, Sara; Merkel, Erin; Ghandi, Mahmoud; Garraway, Levi A; Root, David E; Golub, Todd R; Boehm, Jesse S; Hahn, William C
2017-07-27
Most human epithelial tumors harbor numerous alterations, making it difficult to predict which genes are required for tumor survival. To systematically identify cancer dependencies, we analyzed 501 genome-scale loss-of-function screens performed in diverse human cancer cell lines. We developed DEMETER, an analytical framework that segregates on- from off-target effects of RNAi. 769 genes were differentially required in subsets of these cell lines at a threshold of six SDs from the mean. We found predictive models for 426 dependencies (55%) by nonlinear regression modeling considering 66,646 molecular features. Many dependencies fall into a limited number of classes, and unexpectedly, in 82% of models, the top biomarkers were expression based. We demonstrated the basis behind one such predictive model linking hypermethylation of the UBB ubiquitin gene to a dependency on UBC. Together, these observations provide a foundation for a cancer dependency map that facilitates the prioritization of therapeutic targets. Copyright © 2017 Elsevier Inc. All rights reserved.
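The "six SDs from the mean" criterion in this abstract can be made concrete with a small sketch. The abstract does not say how the mean and standard deviation were computed, so this assumes a per-gene z-score across cell lines using the population standard deviation; the function name and the toy scores are hypothetical and are not DEMETER output.

```python
from statistics import mean, pstdev


def outlier_cell_lines(scores, n_sd=6.0):
    """Return cell lines whose score lies more than n_sd SDs from the mean.

    scores: dict mapping cell-line name to a dependency score for one gene.
    """
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All scores identical: nothing can be an outlier.
        return []
    return sorted(name for name, v in scores.items() if abs(v - mu) > n_sd * sd)


# Hypothetical gene: 99 cell lines are unaffected, one is strongly dependent.
toy_scores = {f"line_{i:03d}": 0.0 for i in range(99)}
toy_scores["line_dep"] = -10.0
print(outlier_cell_lines(toy_scores))
```

Because a single extreme value inflates the standard deviation itself, a six-SD cutoff of this kind can only be reached when many cell lines are screened, which is consistent with the hundreds of screens the abstract describes.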
Mouse Models as Predictors of Human Responses: Evolutionary Medicine.
Uhl, Elizabeth W; Warner, Natalie J
Mice offer a number of advantages and are extensively used to model human diseases and drug responses. Selective breeding and genetic manipulation of mice have made many different genotypes and phenotypes available for research. However, in many cases, mouse models have failed to be predictive. Important sources of the prediction problem have been the failure to consider the evolutionary basis for species differences, especially in drug metabolism, and disease definitions that do not reflect the complexity of gene expression underlying disease phenotypes. Incorporating evolutionary insights into mouse models allows for unique opportunities to characterize the effects of diet, different gene expression profiles, and microbiomics underlying human drug responses and disease phenotypes.
NASA Technical Reports Server (NTRS)
Sundaresan, A.; Pellis, N. R.
2005-01-01
Genetic response suites in human lymphocytes in response to microgravity are important to identify and further study in order to augment human physiological adaptation to novel environments. Emerging technologies, such as DNA microarray profiling, have the potential to identify novel genes that are involved in mediating adaptation to these environments. These genes may prove to be therapeutically valuable as new targets for countermeasures, or as predictive biomarkers of response to these new environments. Human lymphocytes cultured in 1g and microgravity analog culture were analyzed for their differential gene expression response. Different groups of genes related to the immune response, cardiovascular system and stress response were then analyzed. Analysis of cells from multiple donors reveals a small shared set that are likely to be essential to adaptation. These three groups focus on human adaptation to new environments. The shared set contains genes related to T cell activation, immune response and stress response to analog microgravity.
Inductive reasoning about causally transmitted properties.
Shafto, Patrick; Kemp, Charles; Bonawitz, Elizabeth Baraff; Coley, John D; Tenenbaum, Joshua B
2008-11-01
Different intuitive theories constrain and guide inferences in different contexts. Formalizing simple intuitive theories as probabilistic processes operating over structured representations, we present a new computational model of category-based induction about causally transmitted properties. A first experiment demonstrates undergraduates' context-sensitive use of taxonomic and food web knowledge to guide reasoning about causal transmission and shows good qualitative agreement between model predictions and human inferences. A second experiment demonstrates strong quantitative and qualitative fits to inferences about a more complex artificial food web. A third experiment investigates human reasoning about complex novel food webs where species have known taxonomic relations. Results demonstrate a double-dissociation between the predictions of our causal model and a related taxonomic model [Kemp, C., & Tenenbaum, J. B. (2003). Learning domain structures. In Proceedings of the 25th annual conference of the cognitive science society]: the causal model predicts human inferences about diseases but not genes, while the taxonomic model predicts human inferences about genes but not diseases. We contrast our framework with previous models of category-based induction and previous formal instantiations of intuitive theories, and outline challenges in developing a complete model of context-sensitive reasoning.
Functional analysis and transcriptional output of the Göttingen minipig genome.
Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich
2015-11-14
In the past decade the Göttingen minipig has gained increasing recognition as animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models thereby improving drug attrition which is an important challenge in the process of drug development. Here we present a new chromosome level based version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten-times less pseudogenized genes compared to other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs. 
We underline this relationship for minipig and human liver where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparative age categories and further supports the minipig as model for pediatric drug safety studies. Genome based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as model for biomedical research or commercial breeding. Potential impact of our data for comparative genomics, translational research, and experimental medicine are discussed.
Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning
2017-01-01
Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). It is reported that humans can be infected by EBOV with a high fatality rate. However, the factors associating EBOV with its host remain ambiguous. According to the "guilt by association" (GBA) principle, proteins interacting with each other are very likely to have similar or identical functions. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using Dijkstra's algorithm. We hope this could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
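As an illustration of how Dijkstra's algorithm can surface candidate genes in a protein-protein interaction network, the sketch below collects the interior nodes of shortest paths between known seed genes. The abstract does not describe the edge weighting or the selection procedure, so the weights (lower meaning a stronger interaction), the gene labels, and the helper names are all assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances and predecessors from source.

    graph: dict mapping node -> dict of neighbour -> non-negative edge weight.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def shortest_path(graph, source, target):
    """Return the node list of one shortest path, or None if unreachable."""
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def interior_genes(graph, seeds):
    """Genes lying on shortest paths between pairs of seed genes."""
    seeds = sorted(seeds)
    found = set()
    for i, s in enumerate(seeds):
        for t in seeds[i + 1:]:
            path = shortest_path(graph, s, t)
            if path:
                found.update(path[1:-1])
    return found - set(seeds)


# Hypothetical network: SEED1 and SEED2 are known genes, X and Y are candidates.
ppi = {
    "SEED1": {"X": 0.1, "Y": 0.5, "SEED2": 0.9},
    "SEED2": {"X": 0.1, "Y": 0.5, "SEED1": 0.9},
    "X": {"SEED1": 0.1, "SEED2": 0.1},
    "Y": {"SEED1": 0.5, "SEED2": 0.5},
}
print(interior_genes(ppi, {"SEED1", "SEED2"}))
```

Under the guilt-by-association assumption, genes that repeatedly appear on such paths would be the ones prioritized for follow-up.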
Designing oligo libraries taking alternative splicing into account
NASA Astrophysics Data System (ADS)
Shoshan, Avi; Grebinskiy, Vladimir; Magen, Avner; Scolnicov, Ariel; Fink, Eyal; Lehavi, David; Wasserman, Alon
2001-06-01
We have designed sequences for DNA microarrays and oligo libraries, taking alternative splicing into account. Alternative splicing is a common phenomenon, occurring in more than 25% of the human genes. In many cases, different splice variants have different functions, are expressed in different tissues or may indicate different stages of disease. When designing sequences for DNA microarrays or oligo libraries, it is very important to take into account the sequence information of all the mRNA transcripts. Therefore, when a gene has more than one transcript (as a result of alternative splicing, alternative promoter sites or alternative poly-adenylation sites), it is very important to take all of them into account in the design. We have used the LEADS transcriptome prediction system to cluster and assemble the human sequences in GenBank and design optimal oligonucleotides for all the human genes with a known mRNA sequence based on the LEADS predictions.
Gene Signature for Predicting Solid Tumors Patient Prognosis | NCI Technology Transfer Center | TTC
The National Cancer Institute’s Laboratory of Human Carcinogenesis seeks parties to license or co-develop a method of predicting the prognosis of a patient diagnosed with hepatocellular carcinoma (HCC) or breast cancer by detecting expression of one or more cancer-associated genes, and a method of identifying an agent for use in treating HCC.
Liang, Ping; Nair, Jayakumar R; Song, Lei; McGuire, John J; Dolnick, Bruce J
2005-01-01
Background: The rTS gene (ENOSF1), first identified in Homo sapiens as a gene complementary to the thymidylate synthase (TYMS) mRNA, is known to encode two protein isoforms, rTSα and rTSβ. The rTSβ isoform appears to be an enzyme responsible for the synthesis of signaling molecules involved in the down-regulation of thymidylate synthase, but the exact cellular functions of rTS genes are largely unknown. Results: Through comparative genomic sequence analysis, we predicted the existence of a novel protein isoform, rTSγ, which has a 27 residue longer N-terminus by virtue of utilizing an alternative start codon located upstream of the start codon in rTSβ. We observed that a similar extended N-terminus could be predicted in all rTS genes for which genomic sequences are available and the extended regions are conserved from bacteria to human. Therefore, we reasoned that the protein with the extended N-terminus might represent an ancestral form of the rTS protein. Sequence analysis strongly predicts a mitochondrial signal sequence in the extended N-terminal of human rTSγ, which is absent in rTSβ. We confirmed the existence of rTS in human mitochondria experimentally by demonstrating the presence of both rTSγ and rTSβ proteins in mitochondria isolated by subcellular fractionation. In addition, our comprehensive analysis of rTS orthologous sequences reveals an unusual phylogenetic distribution of this gene, which suggests the occurrence of one or more horizontal gene transfer events. Conclusion: The presence of two rTS isoforms in mitochondria suggests that the rTS signaling pathway may be active within mitochondria. Our report also presents an example of identifying novel protein isoforms and for improving gene annotation through comparative genomic analysis. PMID:16162288
nGASP - the nematode genome annotation assessment project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coghlan, A; Fiedler, T J; McKay, S J
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
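The gene-level sensitivity and specificity quoted above can be illustrated with a short sketch. It assumes the convention common in gene-prediction assessments, where a prediction counts as correct only if its exon structure matches a reference model exactly, sensitivity is the fraction of reference models recovered, and specificity is the fraction of predictions that are correct (TP / (TP + FP)). The abstract does not spell out the exact nGASP definitions, so the function name, the data layout, and the coordinates below are hypothetical.

```python
def gene_level_accuracy(reference, predicted):
    """Return (sensitivity, specificity) for exact gene-model matches.

    reference, predicted: iterables of gene models, each a hashable tuple
    (chromosome, strand, ((exon_start, exon_end), ...)).
    """
    ref = set(reference)
    pred = set(predicted)
    true_pos = len(ref & pred)
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical coordinates for four reference models and three predictions.
reference_models = [
    ("I", "+", ((100, 200), (300, 400))),
    ("I", "-", ((900, 1000),)),
    ("II", "+", ((50, 150), (250, 350), (450, 500))),
    ("II", "-", ((700, 800),)),
]
predicted_models = [
    ("I", "+", ((100, 200), (300, 400))),   # exact match
    ("I", "-", ((900, 1000),)),             # exact match
    ("II", "+", ((50, 150), (250, 360))),   # wrong exon boundary
]
sens, spec = gene_level_accuracy(reference_models, predicted_models)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under this convention a gene-finder that over-predicts can show high sensitivity together with low specificity, which is the pattern the abstract reports for combiners.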
Phenome-driven disease genetics prediction toward drug discovery
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-01-01
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Liu, Zhongliang; Hui, Yi; Shi, Lei; Chen, Zhenyu; Xu, Xiangjie; Chi, Liankai; Fan, Beibei; Fang, Yujiang; Liu, Yang; Ma, Lin; Wang, Yiran; Xiao, Lei; Zhang, Quanbin; Jin, Guohua; Liu, Ling; Zhang, Xiaoqing
2016-09-13
Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methodologies for lesion of genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), and thus makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNA over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies for both coding and non-coding genes in hPSCs. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
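The "precise ligation" outcome described in this abstract makes the deletion product easy to predict in silico. The sketch below is a minimal illustration under stated assumptions: both guides target the forward strand, and the nuclease cuts bluntly 3 bp upstream of the PAM (the usual SpCas9 behaviour). The function names and the toy sequence are hypothetical and do not come from the paper.

```python
def cut_site(sequence, protospacer):
    """Index of the blunt cut for a forward-strand protospacer.

    Assumption: the cut falls 3 bp upstream of the PAM, i.e. after
    base 17 of a 20-nt protospacer.
    """
    start = sequence.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found on the forward strand")
    return start + len(protospacer) - 3


def precise_ligation_product(sequence, guide_a, guide_b):
    """Join the two outer fragments with no inserted or deleted bases."""
    left, right = sorted(
        (cut_site(sequence, guide_a), cut_site(sequence, guide_b))
    )
    return sequence[:left] + sequence[right:]


# Toy locus (hypothetical): flank + guide 1 + PAM + spacer + guide 2 + PAM + flank.
g1 = "ACGTACGTACGTACGTACGT"
g2 = "GATTACAGATTACAGATTAC"
locus = "AAAA" + g1 + "TGG" + "CCCCCCCCCC" + g2 + "AGG" + "TTTT"

product = precise_ligation_product(locus, g1, g2)
print(product)
print("deleted bp:", len(locus) - len(product))
```

Comparing a sequenced allele against `product` is one simple way to separate precise ligation events from alleles that carry additional insertions or deletions at the junction.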
A translatable predictor of human radiation exposure.
Lucas, Joseph; Dressman, Holly K; Suchindran, Sunil; Nakamura, Mai; Chao, Nelson J; Himburg, Heather; Minor, Kerry; Phillips, Gary; Ross, Joel; Abedi, Majid; Terbrueggen, Robert; Chute, John P
2014-01-01
Terrorism using radiological dirty bombs or improvised nuclear devices is recognized as a major threat to both public health and national security. In the event of a radiological or nuclear disaster, rapid and accurate biodosimetry of thousands of potentially affected individuals will be essential for effective medical management to occur. Currently, health care providers lack an accurate, high-throughput biodosimetric assay which is suitable for the triage of large numbers of radiation injury victims. Here, we describe the development of a biodosimetric assay based on the analysis of irradiated mice, ex vivo-irradiated human peripheral blood (PB) and humans treated with total body irradiation (TBI). Interestingly, a gene expression profile developed via analysis of murine PB radiation response alone was inaccurate in predicting human radiation injury. In contrast, generation of a gene expression profile which incorporated data from ex vivo irradiated human PB and human TBI patients yielded an 18-gene radiation classifier which was highly accurate at predicting human radiation status and discriminating medically relevant radiation dose levels in human samples. Although the patient population was relatively small, the accuracy of this classifier in discriminating radiation dose levels in human TBI patients was not substantially confounded by gender, diagnosis or prior exposure to chemotherapy. We have further incorporated genes from this human radiation signature into a rapid and high-throughput chemical ligation-dependent probe amplification assay (CLPA) which was able to discriminate radiation dose levels in a pilot study of ex vivo irradiated human blood and samples from human TBI patients. Our results illustrate the potential for translation of a human genetic signature for the diagnosis of human radiation exposure and suggest the basis for further testing of CLPA as a candidate biodosimetric assay.
Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M
2013-10-14
The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.
Guo, Nancy L; Wan, Ying-Wooi; Denvir, James; Porter, Dale W; Pacurari, Maricica; Wolfarth, Michael G; Castranova, Vincent; Qian, Yong
2012-01-01
Concerns over the potential for multi-walled carbon nanotubes (MWCNT) to induce lung carcinogenesis have emerged. This study sought to (1) identify gene expression signatures in the mouse lungs following pharyngeal aspiration of well-dispersed MWCNT and (2) determine if these genes were associated with human lung cancer risk and progression. Genome-wide mRNA expression profiles were analyzed in mouse lungs (n=160) exposed to 0, 10, 20, 40, or 80 µg of MWCNT by pharyngeal aspiration at 1, 7, 28, and 56 days post-exposure. By using pairwise-Statistical Analysis of Microarray (SAM) and linear modeling, 24 genes were selected, which have significant changes in at least two time points, have a more than 1.5 fold change at all doses, and are significant in the linear model for the dose or the interaction of time and dose. Additionally, a 38-gene set was identified as related to cancer from 330 genes differentially expressed at day 56 post-exposure in functional pathway analysis. Using the expression profiles of the cancer-related gene set in 8 mice at day 56 post-exposure to 10 µg of MWCNT, a nearest centroid classification accurately predicts human lung cancer survival with a significant hazard ratio in training set (n=256) and test set (n=186). Furthermore, both gene signatures were associated with human lung cancer risk (n=164) with significant odds ratios. These results may lead to development of a surveillance approach for early detection of lung cancer and prognosis associated with MWCNT in the workplace. PMID:22891886
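Nearest centroid classification, as named in this abstract, assigns a sample to the class whose mean expression profile is closest. The sketch below shows the basic idea in plain Python with squared Euclidean distance. The distance measure, the class labels and the two-gene toy profiles are assumptions made for illustration; the abstract does not specify how the centroids were built or compared.

```python
from statistics import fmean


def fit_centroids(samples, labels):
    """Mean expression vector per class label."""
    grouped = {}
    for vector, label in zip(samples, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Label of the centroid closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy two-gene expression profiles (hypothetical values and labels).
training = [[1.0, 0.2], [0.8, 0.0], [-1.0, 0.1], [-0.9, -0.1]]
labels = ["high_risk", "high_risk", "low_risk", "low_risk"]

centroids = fit_centroids(training, labels)
print(predict(centroids, [0.7, 0.3]))
print(predict(centroids, [-0.5, 0.0]))
```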
A genomic lifespan program that reorganises the young adult brain is targeted in schizophrenia.
Skene, Nathan G; Roy, Marcia; Grant, Seth Gn
2017-09-12
The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose that a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain, and that mutations that interrupt the program in young adults cause schizophrenia.
Alternative splicing of the tyrosinase gene transcript in normal human melanocytes and lymphocytes.
Fryer, J P; Oetting, W S; Brott, M J; King, R A
2001-11-01
We have identified and isolated ectopically expressed tyrosinase transcripts in normal human melanocytes and lymphocytes and in a human melanoma (MNT-1) cell line to establish a baseline for the expression pattern of this gene in normal tissue. Tyrosinase mRNA from human lymphoblastoid cell lines was reverse transcribed and amplified using specific "nested" primers. This amplification yielded eight identifiable transcripts; five that resulted from alternative splicing patterns arising from the utilization of normal and alternative splice sequences. Identical splicing patterns were found in transcripts from human primary melanocytes in culture and a melanoma cell line, indicating that lymphoblastoid cell lines provide an accurate reflection of transcript processing in melanocytes. Similar splicing patterns have also been found with murine melanocyte tyrosinase transcripts. Our results demonstrate that alternative splicing of human tyrosinase gene transcript produces a number of predictable and identifiable transcripts, and that human lymphoblastoid cell lines provide a source of ectopically expressed transcripts that can be used to study the biology of tyrosinase gene expression in humans.
Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J
2016-02-01
Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
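The combination of per-gene rules through a voting algorithm, and the sensitivity and specificity figures reported in this abstract, can be illustrated with a small sketch. Everything concrete here is hypothetical: the gene names, thresholds, rules and samples are invented, and simple majority voting is an assumption, since the abstract does not describe how votes were tallied.

```python
def vote(rules, sample):
    """Predict recurrence (True) when more than half of the rules fire."""
    votes = sum(1 for rule in rules if rule(sample))
    return votes * 2 > len(rules)


def sensitivity_specificity(predictions, outcomes):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(predictions, outcomes))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical expression rules; not the rules evolved in the study.
rules = [
    lambda s: s["GENE_A"] > 1.0,
    lambda s: s["GENE_B"] < 0.5,
    lambda s: s["GENE_A"] > s["GENE_C"],
]

samples = [
    {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 1.0},
    {"GENE_A": 0.4, "GENE_B": 0.9, "GENE_C": 0.8},
    {"GENE_A": 1.2, "GENE_B": 0.7, "GENE_C": 1.4},
    {"GENE_A": 0.6, "GENE_B": 0.3, "GENE_C": 0.2},
    {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 1.1},
]
recurred = [True, False, True, False, False]

predictions = [vote(rules, s) for s in samples]
sens, spec = sensitivity_specificity(predictions, recurred)
print(predictions, round(sens, 2), round(spec, 2))
```

Scoring the same rules separately on training and held-out samples, as the study does, is what exposes the drop from training-set to test-set performance.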
Genome editing for human gene therapy.
Meissner, Torsten B; Mandal, Pankaj K; Ferreira, Leonardo M R; Rossi, Derrick J; Cowan, Chad A
2014-01-01
The rapid advancement of genome-editing techniques holds much promise for the field of human gene therapy. From bacteria to model organisms and human cells, genome editing tools such as zinc-finger nucleases (ZFNs), TALENs, and CRISPR/Cas9 have been successfully used to manipulate the respective genomes with unprecedented precision. With regard to human gene therapy, it is of great interest to test the feasibility of genome editing in primary human hematopoietic cells that could potentially be used to treat a variety of human genetic disorders such as hemoglobinopathies, primary immunodeficiencies, and cancer. In this chapter, we explore the use of the CRISPR/Cas9 system for the efficient ablation of genes in two clinically relevant primary human cell types, CD4+ T cells and CD34+ hematopoietic stem and progenitor cells. By using two guide RNAs directed at a single locus, we achieve highly efficient and predictable deletions that ablate gene function. The use of a Cas9-2A-GFP fusion protein allows FACS-based enrichment of the transfected cells. The ease of designing, constructing, and testing guide RNAs makes this dual guide strategy an attractive approach for the efficient deletion of clinically relevant genes in primary human hematopoietic stem and effector cells and enables the use of CRISPR/Cas9 for gene therapy.
Gene Expression Analysis to Assess the Relevance of Rodent Models to Human Lung Injury.
Sweeney, Timothy E; Lofgren, Shane; Khatri, Purvesh; Rogers, Angela J
2017-08-01
The relevance of animal models to human diseases is an area of intense scientific debate. The degree to which mouse models of lung injury recapitulate human lung injury has never been assessed. Integrating data from both human and animal expression studies allows for increased statistical power and identification of conserved differential gene expression across organisms and conditions. We sought comprehensive integration of gene expression data in experimental acute lung injury (ALI) in rodents compared with humans. We performed two separate gene expression multicohort analyses to determine differential gene expression in experimental animal and human lung injury. We used correlational and pathway analyses combined with external in vitro gene expression data to identify both potential drivers of underlying inflammation and therapeutic drug candidates. We identified 21 animal lung tissue datasets and three human lung injury bronchoalveolar lavage datasets. We show that the metasignatures of animal and human experimental ALI are significantly correlated despite these widely varying experimental conditions. The gene expression changes among mice and rats across diverse injury models (ozone, ventilator-induced lung injury, LPS) are significantly correlated with human models of lung injury (Pearson r = 0.33-0.45, P < 1E-16). Neutrophil signatures are enriched in both animal and human lung injury. Predicted therapeutic targets, peptide ligand signatures, and pathway analyses are also all highly overlapping. Gene expression changes are similar in animal and human experimental ALI, and provide several physiologic and therapeutic insights to the disease.
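The cross-species comparison in this abstract rests on Pearson correlation of gene expression changes. A minimal sketch of that calculation follows, assuming that per-gene effect sizes are available for both species and that only genes measured in both are compared. The gene names and values are hypothetical, and ortholog mapping is ignored for brevity.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-gene effect sizes (injured vs. control).
animal = {"GENE_A": 1.2, "GENE_B": -0.4, "GENE_C": 0.9, "GENE_D": 0.1}
human = {"GENE_A": 0.8, "GENE_B": -0.1, "GENE_C": 0.2, "GENE_E": 1.5}

# Correlate only genes present in both signatures.
shared = sorted(animal.keys() & human.keys())
r = pearson_r([animal[g] for g in shared], [human[g] for g in shared])
print(shared, round(r, 2))
```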
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments
Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic
2001-01-01
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
Lipovich, Leonard; Hou, Zhuo-Cheng; Jia, Hui; Sinkler, Christopher; McGowen, Michael; Sterner, Kirstin N; Weckle, Amy; Sugalski, Amara B; Pipes, Lenore; Gatti, Domenico L; Mason, Christopher E; Sherwood, Chet C; Hof, Patrick R; Kuzawa, Christopher W; Grossman, Lawrence I; Goodman, Morris; Wildman, Derek E
2016-02-01
The human brain and human cognitive abilities are strikingly different from those of other great apes despite relatively modest genome sequence divergence. However, little is presently known about the interspecies divergence in gene structure and transcription that might contribute to these phenotypic differences. To date, most comparative studies of gene structure in the brain have examined humans, chimpanzees, and macaque monkeys. To add to this body of knowledge, we analyze here the brain transcriptome of the western lowland gorilla (Gorilla gorilla gorilla), an African great ape species that is phylogenetically closely related to humans, but with a brain that is approximately one-third the size. Manual transcriptome curation from a sample of the planum temporale region of the neocortex revealed 12 protein-coding genes and one noncoding-RNA gene with exons in the gorilla unmatched by public transcriptome data from the orthologous human loci. These interspecies gene structure differences accounted for a total of 134 amino acids in proteins found in the gorilla that were absent from protein products of the orthologous human genes. Proteins varying in structure between human and gorilla were involved in immunity and energy metabolism, suggesting their relevance to phenotypic differences. This gorilla neocortical transcriptome comprises an empirical, not homology- or prediction-driven, resource for orthologous gene comparisons between human and gorilla. These findings provide a unique repository of the sequences and structures of thousands of genes transcribed in the gorilla brain, pointing to candidate genes that may contribute to the traits distinguishing humans from other closely related great apes. © 2015 Wiley Periodicals, Inc.
Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia
2012-01-01
Objectives: To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods: We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results: Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single study-based differential analysis.
Conclusion: Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
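The random effects integration of per-experiment effect sizes mentioned in this abstract can be sketched for a single gene as follows. The abstract does not name the estimator, so the DerSimonian-Laird method used here is an assumption, as are the function name and the toy numbers.

```python
from math import sqrt


def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random effects model.

    Returns (pooled_effect, standard_error, tau_squared), where
    tau_squared is the estimated between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, sqrt(1.0 / sum(w_star)), tau2


# Two hypothetical experiments that disagree strongly about one gene.
pooled, se, tau2 = random_effects_pool([0.0, 1.0], [0.01, 0.01])
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```

When the experiments disagree, the between-study variance widens the standard error of the pooled estimate, so a gene must change consistently across injury models to rank highly.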
Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert
2011-10-01
MicroRNAs are small, non-coding RNA molecules that can complementarily bind to the mRNA 3'-UTR region to regulate the gene expression by transcriptional repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites, information on all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed by using a newly developed algorithm named 'miRWalk' as well as with eight already established programs for putative miRNA binding sites. An automated and extensive text-mining search was performed on PubMed database to extract validated information on miRNAs. Combined information was put into a MySQL database. miRWalk presents predicted and validated information on miRNA-target interaction. Such a resource enables researchers to validate new targets of miRNA not only on 3'-UTR, but also on the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/. Copyright © 2011 Elsevier Inc. All rights reserved.
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P
2015-04-14
Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for choline fermentation (the cut gene cluster) have been recently identified, there has been no characterization of these genes in human gut isolates and microbial communities. In this work, we use multiple approaches to demonstrate that the pathway encoded by the cut genes is present and functional in a diverse range of human gut bacteria and is also widespread in stool metagenomes. We also developed a PCR-based strategy to detect a key functional gene (cutC) involved in this pathway and applied it to characterize newly isolated choline-utilizing strains. Both our analyses of the cut gene cluster and this molecular tool will aid efforts to further understand the role of choline metabolism in the human gut microbiota and its link to disease. Copyright © 2015 Martínez-del Campo et al.
Elliot, Michael G.; Crespi, Bernard J.
2015-01-01
The relationship between phenotypic variation arising through individual development and phenotypic variation arising through diversification of species has long been a central question in evolutionary biology. Among humans, reduced placental invasion into endometrial tissues is associated with diseases of pregnancy, especially pre-eclampsia, and reduced placental invasiveness has also evolved, convergently, in at least 10 lineages of eutherian mammals. We tested the hypothesis that a common genetic basis underlies both reduced placental invasion arising through a developmental process in human placental disease and reduced placental invasion found as a derived trait in the diversification of Euarchontoglires (rodents, lagomorphs, tree shrews, colugos and primates). Based on whole-genome analyses across 18 taxa, we identified 1254 genes as having evolved adaptively across all three lineages exhibiting independent evolutionary transitions towards reduced placental invasion. These genes showed strong evidence of enrichment for associations with pre-eclampsia, based on genetic-association studies, gene-expression analyses and gene ontology. We further used in silico prediction to identify a subset of 199 genes that are likely targets of natural selection during transitions in placental invasiveness and which are predicted to also underlie human placental disorders. Our results indicate that abnormal ontogenies can recapitulate major phylogenetic shifts in mammalian evolution, identify new candidate genes for involvement in pre-eclampsia, imply that study of species with less-invasive placentation will provide useful insights into the regulation of placental invasion and pre-eclampsia, and recommend a novel comparative functional-evolutionary approach to the study of genetically based human disease and mammalian diversification. PMID:25602073
Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans
Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo
2012-01-01
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122
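The "in-fold feature selection" scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes toy binary presence/absence vectors, a simple frequency-difference score for ranking genes, and a nearest-centroid classifier; none of these choices is stated in the abstract, and all function names are hypothetical. The point it shows is only that gene selection is repeated inside each training fold, so the held-out genomes never influence which genes are kept.

```python
from typing import List

Vector = List[int]  # presence (1) / absence (0) of each gene in one genome


def select_genes(X: List[Vector], y: List[int], k: int) -> List[int]:
    """Keep the k genes whose frequency differs most between the two classes."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    scores = []
    for g in range(len(X[0])):
        f_pos = sum(x[g] for x in pos) / len(pos)
        f_neg = sum(x[g] for x in neg) / len(neg)
        scores.append((abs(f_pos - f_neg), g))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [g for _, g in scores[:k]]


def centroid(rows: List[Vector], genes: List[int]) -> List[float]:
    """Mean presence of each selected gene over a group of genomes."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def predict(x: Vector, genes: List[int], c_pos: List[float], c_neg: List[float]) -> int:
    """Assign the class whose centroid is closer (squared Euclidean distance)."""
    d_pos = sum((x[g] - c) ** 2 for g, c in zip(genes, c_pos))
    d_neg = sum((x[g] - c) ** 2 for g, c in zip(genes, c_neg))
    return 1 if d_pos < d_neg else 0


def cross_validate(X: List[Vector], y: List[int], n_folds: int, k: int) -> float:
    """Cross-validated accuracy with feature selection redone in every fold."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = select_genes(X_tr, y_tr, k)  # training genomes only
        c_pos = centroid([x for x, l in zip(X_tr, y_tr) if l == 1], genes)
        c_neg = centroid([x for x, l in zip(X_tr, y_tr) if l == 0], genes)
        correct += sum(predict(X[i], genes, c_pos, c_neg) == y[i] for i in test_idx)
    return correct / len(X)
```

Selecting genes once on the full data set and then cross-validating would let the test genomes leak into the model, which tends to inflate the reported accuracy; the loop above avoids that by construction.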
Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P
2014-06-01
With rapidly changing technology, prediction of candidate genes has become an indispensable task in biological research. Empirical methods for candidate gene prioritization, which help to explore the potential pathway between genetic determinants and complex diseases, are cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The wide availability of protein interaction data, coupled with gene annotation, eases the accurate determination of disease-specific candidate genes. In our work we prioritized cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph-theoretical centrality measures and gene ontology. Using human protein interaction data, cervical cancer gene sets and ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the predicted candidate genes was corroborated through a literature survey. The presence of drugs for these candidates was also checked in the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which suggests that they may serve as potential drug targets for cervical cancer.
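As a rough illustration of prioritizing genes by a graph-theoretical centrality measure, the sketch below ranks genes in a toy interaction network by degree centrality. The abstract does not say which centrality measures were used, so degree centrality is an assumption here, and the gene labels and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node interacts with."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def rank_candidates(edges: List[Tuple[str, str]], known: Set[str]) -> List[str]:
    """Rank genes not already known to be disease genes, most central first."""
    scores = degree_centrality(edges)
    novel = [g for g in scores if g not in known]
    return sorted(novel, key=lambda g: (-scores[g], g))
```

In practice a centrality ranking like this would be combined with other evidence (the abstract mentions gene ontology terms) before any gene is called a candidate.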
Identification of Viscum album L. miRNAs and prediction of their medicinal values
Adolf, Jacob; Melzig, Matthias F.
2017-01-01
MicroRNAs (miRNAs) are a class of approximately 22 nucleotides single-stranded non-coding RNA molecules that play crucial roles in gene expression. It has been reported that the plant miRNAs might enter mammalian bloodstream and have a functional role in human metabolism, indicating that miRNAs might be one of the hidden bioactive ingredients in medicinal plants. Viscum album L. (Loranthaceae, European mistletoe) has been widely used for the treatment of cancer and cardiovascular diseases, but its functional compounds have not been well characterized. We considered that miRNAs might be involved in the pharmacological activities of V. album. High-throughput Illumina sequencing was performed to identify the novel and conserved miRNAs of V. album. The putative human targets were predicted. In total, 699 conserved miRNAs and 1373 novel miRNAs have been identified from V. album. Based on the combined use of TargetScan, miRanda, PITA, and RNAhybrid methods, the intersection of 30697 potential human genes have been predicted as putative targets of 29 novel miRNAs, while 14559 putative targets were highly enriched in 33 KEGG pathways. Interestingly, these highly enriched KEGG pathways were associated with some human diseases, especially cancer, cardiovascular diseases and neurological disorders, which might explain the clinical use as well as folk medicine use of mistletoe. However, further experimental validation is necessary to confirm these human targets of mistletoe miRNAs. Additionally, target genes involved in bioactive components synthesis in V. album were predicted as well. A total of 68 miRNAs were predicted to be involved in terpenoid biosynthesis, while two miRNAs including val-miR152 and miR9738 were predicted to target viscotoxins and lectins, respectively, which increased the knowledge regarding miRNA-based regulation of terpenoid biosynthesis, lectin and viscotoxin expressions in V. album. PMID:29112983
NASA Astrophysics Data System (ADS)
Sundaresan, A.; Pellis, N. R.
2005-08-01
Genetic response suites in human lymphocytes in response to microgravity are important to identify and further study in order to augment human physiological adaptation to novel environments. Emerging technologies, such as DNA micro array profiling, have the potential to identify novel genes that are involved in mediating adaptation to these environments. These genes may prove to be therapeutically valuable as new targets for countermeasures, or as predictive biomarkers of response to these new environments. Human lymphocytes cultured in 1g and microgravity analog culture were analyzed for their differential gene expression response. Different groups of genes related to the immune response, cardiovascular system and stress response were then analyzed. Analysis of cells from multiple donors reveals a small shared set that are likely to be essential to adaptation. These three groups focus on human adaptation to new environments. The shared set contains genes related to T cell activation, immune response and stress response to analog microgravity.
Rrp1b, a New Candidate Susceptibility Gene for Breast Cancer Progression and Metastasis
Crawford, Nigel P. S; Qian, Xiaolan; Ziogas, Argyrios; Papageorge, Alex G; Boersma, Brenda J; Walker, Renard C; Lukes, Luanne; Rowe, William L; Zhang, Jinghui; Ambs, Stefan; Lowy, Douglas R; Anton-Culver, Hoda; Hunter, Kent W
2007-01-01
A novel candidate metastasis modifier, ribosomal RNA processing 1 homolog B (Rrp1b), was identified through two independent approaches. First, yeast two-hybrid, immunoprecipitation, and functional assays demonstrated a physical and functional interaction between Rrp1b and the previously identified metastasis modifier Sipa1. In parallel, using mouse and human metastasis gene expression data it was observed that extracellular matrix (ECM) genes are common components of metastasis predictive signatures, suggesting that ECM genes are either important markers or causal factors in metastasis. To investigate the relationship between ECM genes and poor prognosis in breast cancer, expression quantitative trait locus analysis of polyoma middle-T transgene-induced mammary tumor was performed. ECM gene expression was found to be consistently associated with Rrp1b expression. In vitro expression of Rrp1b significantly altered ECM gene expression, tumor growth, and dissemination in metastasis assays. Furthermore, a gene signature induced by ectopic expression of Rrp1b in tumor cells predicted survival in a human breast cancer gene expression dataset. Finally, constitutional polymorphism within RRP1B was found to be significantly associated with tumor progression in two independent breast cancer cohorts. These data suggest that RRP1B may be a novel susceptibility gene for breast cancer progression and metastasis. PMID:18081427
New support vector machine-based method for microRNA target prediction.
Li, L; Gao, Q; Mao, X; Cao, Y
2014-06-09
MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.
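For readers unfamiliar with the radial basis function (RBF) kernel mentioned in this abstract, the standard definition is K(x, y) = exp(-gamma * ||x - y||^2). The sketch below computes it for two feature vectors; the vectors, the `gamma` default and the function name are illustrative assumptions, not the features or settings used by the authors.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The kernel equals 1 for identical feature vectors and decays toward 0 as they move apart, which is why it can serve as a similarity measure between candidate binding sites.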
In vitro transcriptomic prediction of hepatotoxicity for early drug discovery
Cheng, Feng; Theodorescu, Dan; Schulman, Ira G.; Lee, Jae K.
2012-01-01
Liver toxicity (hepatotoxicity) is a critical issue in drug discovery and development. Standard preclinical evaluation of drug hepatotoxicity is generally performed using in vivo animal systems. However, only a small number of preselected compounds can be examined in vivo due to high experimental costs. A more efficient yet accurate screening technique which can identify potentially hepatotoxic compounds in the early stages of drug development would thus be valuable. Here, we develop and apply a novel genomic prediction technique for screening hepatotoxic compounds based on in vitro human liver cell tests. Using a training set of in vivo rodent experiments for drug hepatotoxicity evaluation, we discovered common biomarkers of drug-induced liver toxicity among six heterogeneous compounds. This gene set was further triaged to a subset of 32 genes that can be used as a multi-gene expression signature to predict hepatotoxicity. This multi-gene predictor was independently validated and showed consistently high prediction performance on five test sets of in vitro human liver cell and in vivo animal toxicity experiments. The predictor also demonstrated utility in evaluating different degrees of toxicity in response to drug concentrations which may be useful not only for discerning a compound’s general hepatotoxicity but also for determining its toxic concentration. PMID:21884709
Accurate and sensitive quantification of protein-DNA binding affinity.
Rastogi, Chaitanya; Rube, H Tomas; Kribelbauer, Judith F; Crocker, Justin; Loker, Ryan E; Martini, Gabriella D; Laptenko, Oleg; Freed-Pastor, William A; Prives, Carol; Stern, David L; Mann, Richard S; Bussemaker, Harmen J
2018-04-17
Transcription factors (TFs) control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in TF binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here, we developed a versatile maximum likelihood framework named No Read Left Behind (NRLB) that infers a biophysical model of protein-DNA recognition across the full affinity range from a library of in vitro selected DNA binding sites. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. It can capture the specificity of the p53 tetramer and distinguish multiple binding modes within a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes. Copyright © 2018 the Author(s). Published by PNAS. PMID:29610332
Comprehensive Human Transcription Factor Binding Site Map for Combinatory Binding Motifs Discovery
Müller-Molina, Arnoldo J.; Schöler, Hans R.; Araúzo-Bravo, Marcos J.
2012-01-01
To know the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%–20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits never used TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory “DNA words.” From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs confirming that our system can reliably predict ab initio motifs with an accuracy of 81%—far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that to control in an independent manner smaller gene sets, supplementary regulatory mechanisms are required. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of “DNA words,” newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus, are likely to lock/unlock cellular functional clusters. PMID:23209563
Genomic analysis of a new mammalian distal-less gene: Dlx7.
Nakamura, S; Stock, D W; Wydner, K L; Bollekens, J A; Takeshita, K; Nagai, B M; Chiba, S; Kitamura, T; Freeland, T M; Zhao, Z; Minowada, J; Lawrence, J B; Weiss, K M; Ruddle, F H
1996-12-15
We have cloned a new Dlx gene (Dlx7) from human and mouse that may represent the mammalian orthologue of the newt gene NvHBox-5. The homeodomains of these genes are highly similar to all other vertebrate Dlx genes, and regions of similarity also exist between mammalian Dlx7 and a subset of vertebrate Dlx genes downstream of the homeodomain. The sequence divergence between human and mouse Dlx7 in these regions is greater than that predicted from comparisons of other vertebrate Dlx genes, however, and there is little sequence similarity upstream of the homeodomain both between these two genes and with other Dlx genes. We present evidence for alternative splicing of mouse Dlx7 upstream of the homeodomain that may account for some of this divergence. We have mapped human DLX7 distal to the 5' end of the HOXB cluster at an estimated distance of between 1 and 2 Mb by FISH. Both the human and the mouse Dlx7 are shown to be closely linked to Dlx3 in a convergently transcribed orientation. These mapping results support the possibility that vertebrate distal-less genes have been duplicated in concert with the Hox clusters.
A computational method for predicting regulation of human microRNAs on the influenza virus genome
2013-01-01
Background: While it has been suggested that host microRNAs (miRNAs) may downregulate viral gene expression as an antiviral defense mechanism, such a mechanism has not been explored in the influenza virus for human flu studies. As it is difficult to conduct related experiments on humans, computational studies can provide some insight. Although many computational tools have been designed for miRNA target prediction, there is a need for cross-species prediction, especially for predicting viral targets of human miRNAs. However, finding putative human miRNAs targeting the influenza virus genome is still challenging. Results: We developed machine-learning features and conducted comprehensive data training for predicting interactions between H1N1 genome segments and host miRNA. We defined our seed region as the first ten nucleotides of the miRNA, counted from its 5' end, and integrated various features including the number of consecutive matching bases in the 10-base seed region, a triplet feature in seed regions, thermodynamic energy, penalty of bulges and wobbles at binding sites, and the secondary structure of viral RNA for the prediction. Conclusions: Compared to general predictive models, our model fully takes into account the conservation patterns and features of viral RNA secondary structures, and greatly improves the prediction accuracy. Our model identified some key miRNAs including hsa-miR-489, hsa-miR-325, hsa-miR-876-3p and hsa-miR-2117, which target HA, PB2, MP and NS of H1N1, respectively. Our study provided an interesting hypothesis concerning the miRNA-based antiviral defense mechanism against influenza virus in humans, i.e., the binding between human miRNA and viral RNAs may not result in gene silencing but rather may block viral RNA replication. PMID:24565017
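One of the features named in this abstract, the number of consecutive matching bases in the 10-nucleotide seed region, can be sketched as follows. This is a simplified illustration under stated assumptions: the target site is supplied already written 3'→5' so that it lines up position by position with the miRNA, only Watson-Crick pairs count as matches (G-U wobbles break a run), and the sequences and function name are hypothetical toy values rather than anything from the study.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_seed_match(mirna_5to3: str, target_3to5: str, seed_len: int = 10) -> int:
    """Longest run of consecutive Watson-Crick pairs within the seed region.

    mirna_5to3  -- miRNA sequence written 5'->3'
    target_3to5 -- target site written 3'->5', aligned to the miRNA 5' end
    """
    seed = mirna_5to3[:seed_len].upper()
    site = target_3to5[:seed_len].upper()
    best = run = 0
    for m, t in zip(seed, site):
        if (m, t) in WATSON_CRICK:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a mismatch or wobble ends the current run
    return best
```

A full predictor along the lines described would combine this count with the other listed features (triplet composition, thermodynamic energy, bulge and wobble penalties, viral RNA secondary structure); the sketch covers only the first one.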
Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cells
Freire-Pritchett, Paula; Schoenfelder, Stefan; Várnai, Csilla; Wingett, Steven W; Cairns, Jonathan; Collier, Amanda J; García-Vílchez, Raquel; Furlan-Magaril, Mayra; Osborne, Cameron S; Fraser, Peter; Rugg-Gunn, Peter J; Spivakov, Mikhail
2017-01-01
Long-range cis-regulatory elements such as enhancers coordinate cell-specific transcriptional programmes by engaging in DNA looping interactions with target promoters. Deciphering the interplay between the promoter connectivity and activity of cis-regulatory elements during lineage commitment is crucial for understanding developmental transcriptional control. Here, we use Promoter Capture Hi-C to generate a high-resolution atlas of chromosomal interactions involving ~22,000 gene promoters in human pluripotent and lineage-committed cells, identifying putative target genes for known and predicted enhancer elements. We reveal extensive dynamics of cis-regulatory contacts upon lineage commitment, including the acquisition and loss of promoter interactions. This spatial rewiring occurs preferentially with predicted changes in the activity of cis-regulatory elements and is associated with changes in target gene expression. Our results provide a global and integrated view of promoter interactome dynamics during lineage commitment of human pluripotent cells. DOI: http://dx.doi.org/10.7554/eLife.21926.001 PMID:28332981
Human-specific features of spatial gene expression and regulation in eight brain regions.
Xu, Chuan; Li, Qian; Efimova, Olga; He, Liu; Tatsumoto, Shoji; Stepanova, Vita; Oishi, Takao; Udono, Toshifumi; Yamaguchi, Katsushi; Shigenobu, Shuji; Kakita, Akiyoshi; Nawa, Hiroyuki; Khaitovich, Philipp; Go, Yasuhiro
2018-06-13
Molecular maps of the human brain alone do not inform us of the features unique to humans. Yet, the identification of these features is important for understanding both the evolution and nature of human cognition. Here, we approached this question by analyzing gene expression and H3K27ac chromatin modification data collected in eight brain regions of humans, chimpanzees, gorillas, a gibbon and macaques. An analysis of spatial transcriptome trajectories across eight brain regions in four primate species revealed 1,851 genes showing human-specific transcriptome differences in one or multiple brain regions, in contrast to 240 chimpanzee-specific ones. More than half of these human-specific differences represented elevated expression of genes enriched in neuronal and astrocytic markers in the human hippocampus, while the rest were enriched in microglial markers and displayed human-specific expression in several frontal cortical regions and the cerebellum. An analysis of the predicted regulatory interactions driving these differences revealed the role of transcription factors in species-specific transcriptome changes, while epigenetic modifications were linked to spatial expression differences conserved across species. Published by Cold Spring Harbor Laboratory Press.
Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.
2015-01-01
BACKGROUND: There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since it uses genomic coordinates. STUDY DESIGN AND METHODS: The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS: Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens was in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION: Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genomic sequencing data. PMID:26634332
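The core conversion described here, from a cDNA nucleotide position to a genomic coordinate, can be sketched with a simple exon table. This is an illustration only: the study derived its mapping from sequence alignments, whereas the sketch assumes a gap-free list of exon intervals (1-based, inclusive, in ascending genomic order) and assumes that cDNA position 1 is the first transcribed base of that list. The function name and coordinates are hypothetical.

```python
from typing import List, Tuple


def cdna_to_genomic(cdna_pos: int, exons: List[Tuple[int, int]], strand: str = "+") -> int:
    """Map a 1-based cDNA position onto a 1-based genomic coordinate.

    exons  -- (start, end) genomic intervals, inclusive, ascending order
    strand -- "+" if the gene is on the forward strand, "-" otherwise
    """
    if cdna_pos < 1:
        raise ValueError("cDNA positions are 1-based")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand the transcript starts in the last genomic exon.
    ordered = exons if strand == "+" else list(reversed(exons))
    remaining = cdna_pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length  # skip this exon and keep counting
    raise ValueError("cDNA position lies beyond the end of the transcript")
```

Real reference sequences can differ from the genome by insertions, deletions or alternative alleles, which is consistent with the alignment issues the abstract reports for genes such as ABO and RHD; a plain interval table like this one would not capture those cases.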
Roles for text mining in protein function prediction.
Verspoor, Karin M
2014-01-01
The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013). Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.
NASA Astrophysics Data System (ADS)
Park, Solip; Yang, Jae-Seong; Kim, Jinho; Shin, Young-Eun; Hwang, Jihye; Park, Juyong; Jang, Sung Key; Kim, Sanguk
2012-10-01
The extent to which evolutionary changes have impacted the phenotypic relationships among human diseases remains unclear. In this work, we report that phenotypically similar diseases are connected by the evolutionary constraints on human disease genes. Human disease groups can be classified into slowly or rapidly evolving classes, where the diseases in the slowly evolving class are enriched with morphological phenotypes and those in the rapidly evolving class are enriched with physiological phenotypes. Our findings establish a clear evolutionary connection between disease classes and disease phenotypes for the first time. Furthermore, the high comorbidity found between diseases connected by similar evolutionary constraints enables us to improve the predictability of the relative risk of human diseases. We find the evolutionary constraints on disease genes are a new layer of molecular connection in the network-based exploration of human diseases.
Many human accelerated regions are developmental enhancers
Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.
2013-01-01
The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Predicting effects of structural stress in a genome-reduced model bacterial metabolism
NASA Astrophysics Data System (ADS)
Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles
2012-08-01
Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.
Defining the optimal animal model for translational research using gene set enrichment analysis.
Weidner, Christopher; Steinfath, Matthias; Opitz, Elisa; Oelgeschläger, Michael; Schönfelder, Gilbert
2016-08-01
The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.
Identification and characterization of human GUKH2 gene in silico.
Katoh, Masuko; Katoh, Masaru
2004-04-01
Drosophila Guanylate-kinase holder (Gukh) is an adaptor molecule bridging Discs large (Dlg) and Scribble (Scrib), which are implicated in the establishment and maintenance of epithelial polarity. Here, we searched for human homologs of Drosophila gukh by using bioinformatics, and identified GUKH1 and GUKH2 genes. GUKH1 was identical to Nance-Horan syndrome (NHS) gene, while GUKH2 was a novel gene. FLJ35425 (AK092744.1), DKFZp686P1949 (BX647246.1) and KIAA1357 (AB037778.1) cDNAs were derived from human GUKH2 gene. Nucleotide sequence of GUKH2 cDNA was determined by assembling 5'-part of FLJ35425 cDNA and entire region of DKFZp686P1949 cDNA. Human GUKH2 gene consists of 8 exons. Exon 5 (132 bp) of GUKH2 gene was spliced out in GUKH2 cDNA due to alternative splicing. GUKH2-REPS1 locus at human chromosome 6q24.1 and GUKH1-REPS2 locus at human chromosome Xp22.22-p22.13 are paralogous regions within the human genome. Mouse Gukh2 and zebrafish gukh2 genes were also identified. N-terminal part of human GUKH2, mouse Gukh2 and zebrafish gukh2 proteins were completely divergent from human GUKH1 protein. Human GUKH2 and GUKH1, consisting of eight GUKH homology (GKH1-GKH8) domains and Proline-rich domain, showed 28.5% total-amino-acid identity. GKH1, GKH4, GKH5, GKH7 and GKH8 domains were conserved among human GUKH1, human GUKH2 and Drosophila Gukh. Because human homologs of Drosophila dlg (DLG1-DLG7) as well as human homologs of Drosophila scrib (SCRIB, ERBB2IP and Densin-180) are cancer-associated genes, human homologs of Drosophila gukh (GUKH1 and GUKH2) are predicted cancer-associated genes.
Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.
2010-01-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513
Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F
2010-12-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.
Microarray analysis in rat liver slices correctly predicts in vivo hepatotoxicity.
Elferink, M G L; Olinga, P; Draaisma, A L; Merema, M T; Bauerschmidt, S; Polman, J; Schoonen, W G; Groothuis, G M M
2008-06-15
The microarray technology, developed for the simultaneous analysis of a large number of genes, may be useful for the detection of toxicity in an early stage of the development of new drugs. The effect of different hepatotoxins was analyzed at the gene expression level in the rat liver both in vivo and in vitro. As in vitro model system the precision-cut liver slice model was used, in which all liver cell types are present in their natural architecture. This is important since drug-induced toxicity often is a multi-cellular process involving not only hepatocytes but also other cell types such as Kupffer and stellate cells. As model toxic compounds lipopolysaccharide (LPS, inducing inflammation), paracetamol (necrosis), carbon tetrachloride (CCl(4), fibrosis and necrosis) and gliotoxin (apoptosis) were used. The aim of this study was to validate the rat liver slice system as in vitro model system for drug-induced toxicity studies. The results of the microarray studies show that the in vitro profiles of gene expression cluster per compound and incubation time, and when analyzed in a commercial gene expression database, can predict the toxicity and pathology observed in vivo. Each toxic compound induces a specific pattern of gene expression changes. In addition, some common genes were up- or down-regulated with all toxic compounds. These data show that the rat liver slice system can be an appropriate tool for the prediction of multi-cellular liver toxicity. The same experiments and analyses are currently performed for the prediction of human specific toxicity using human liver slices.
Microarray analysis in rat liver slices correctly predicts in vivo hepatotoxicity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elferink, M.G.L.; Olinga, P.; Draaisma, A.L.
2008-06-15
The microarray technology, developed for the simultaneous analysis of a large number of genes, may be useful for the detection of toxicity in an early stage of the development of new drugs. The effect of different hepatotoxins was analyzed at the gene expression level in the rat liver both in vivo and in vitro. As in vitro model system the precision-cut liver slice model was used, in which all liver cell types are present in their natural architecture. This is important since drug-induced toxicity often is a multi-cellular process involving not only hepatocytes but also other cell types such as Kupffer and stellate cells. As model toxic compounds lipopolysaccharide (LPS, inducing inflammation), paracetamol (necrosis), carbon tetrachloride (CCl4, fibrosis and necrosis) and gliotoxin (apoptosis) were used. The aim of this study was to validate the rat liver slice system as in vitro model system for drug-induced toxicity studies. The results of the microarray studies show that the in vitro profiles of gene expression cluster per compound and incubation time, and when analyzed in a commercial gene expression database, can predict the toxicity and pathology observed in vivo. Each toxic compound induces a specific pattern of gene expression changes. In addition, some common genes were up- or down-regulated with all toxic compounds. These data show that the rat liver slice system can be an appropriate tool for the prediction of multi-cellular liver toxicity. The same experiments and analyses are currently performed for the prediction of human specific toxicity using human liver slices.
APADB: a database for alternative polyadenylation and microRNA regulation events
Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn
2014-01-01
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing to further improve prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
Fatakia, Sarosh N.; Mehta, Ishita S.; Rao, Basuthkar J.
2016-01-01
Forty-six chromosome territories (CTs) are positioned uniquely in human interphase nuclei, wherein each of their positions can range from the centre of the nucleus to its periphery. A non-empirical basis for their non-random arrangement remains unreported. Here, we derive a suprachromosomal basis of that overall arrangement (which we refer to as a CT constellation), and report a hierarchical nature of the same. Using matrix algebra, we unify intrinsic chromosomal parameters (e.g., chromosomal length, gene density, the number of genes per chromosome), to derive an extrinsic effective gene density matrix, the hierarchy of which is dominated largely by extrinsic mathematical coupling of HSA19, followed by HSA17 (human chromosome 19 and 17, both preferentially interior CTs) with all CTs. We corroborate predicted constellations and effective gene density hierarchy with published reports from fluorescent in situ hybridization based microscopy and Hi-C techniques, and delineate analogous hierarchy in disparate vertebrates. Our theory accurately predicts CTs localised to the nuclear interior, which interestingly share conserved synteny with HSA19 and/or HSA17. Finally, the effective gene density hierarchy dictates how permutations among CT positions represent the plasticity within its constellations, based on which we suggest that a differential mix of coding with noncoding genome modulates the same. PMID:27845379
Co-clustering phenome–genome for phenotype classification and disease gene discovery
Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui
2012-01-01
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708
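The factorization idea in the abstract above can be illustrated with a minimal sketch of plain non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates. This is not the authors' R-NMTF: the network regularizers and the label supervision described in the abstract are omitted, and the function names, the toy matrix, and the parameters (`k_rows`, `k_cols`, `n_iter`) are assumptions made for illustration only.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Unregularized tri-factorization X ~ F @ S @ G.T with non-negative factors.

    F (rows x k_rows) holds soft row-cluster memberships (e.g. phenotypes),
    G (cols x k_cols) holds soft column-cluster memberships (e.g. genes),
    S (k_rows x k_cols) links row clusters to column clusters.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return float(np.linalg.norm(X - F @ S @ G.T))

# Toy phenotype-by-gene association matrix with two obvious blocks (made-up data).
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

F0, S0, G0 = nmtf(X, 2, 2, n_iter=0)   # random initialization only
F, S, G = nmtf(X, 2, 2, n_iter=200)    # fitted factors
row_clusters = F.argmax(axis=1)        # hard phenotype cluster labels
col_clusters = G.argmax(axis=1)        # hard gene cluster labels
```

Reading the hard cluster labels off the largest entry in each row of F and G is one simple convention. The entries of S then indicate which phenotype cluster is associated with which gene cluster.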
nGASP--the nematode genome annotation assessment project.
Coghlan, Avril; Fiedler, Tristan J; McKay, Sheldon J; Flicek, Paul; Harris, Todd W; Blasiar, Darin; Stein, Lincoln D
2008-12-19
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
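The gene-level sensitivity and specificity figures quoted above can be made concrete with a small sketch. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of reference genes predicted correctly and specificity is the fraction of predicted genes that are correct. It also assumes that a prediction counts as correct only when its exon structure matches a reference model exactly. The abstract does not state the exact scoring rules used in nGASP, and the gene models below are invented for illustration.

```python
def gene_level_accuracy(reference, predicted):
    """Return (sensitivity, specificity) at the gene level.

    Each gene model is a hashable description of its structure, here a tuple
    (sequence_name, strand, exons) where exons is a tuple of (start, end) pairs.
    A predicted gene is a true positive only if it equals a reference model.
    """
    ref = set(reference)
    pred = set(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity

# Hypothetical gene models (coordinates are made up).
reference = [
    ("chrI", "+", ((100, 200), (300, 400))),
    ("chrI", "-", ((900, 1000),)),
    ("chrII", "+", ((50, 150), (250, 350), (450, 500))),
    ("chrII", "-", ((700, 800),)),
]
predicted = [
    ("chrI", "+", ((100, 200), (300, 400))),               # exact match
    ("chrI", "-", ((900, 1000),)),                          # exact match
    ("chrII", "+", ((50, 150), (250, 350), (450, 500))),   # exact match
    ("chrII", "-", ((700, 790),)),                          # wrong exon end
    ("chrIII", "+", ((10, 90),)),                           # no reference gene
]

sn, sp = gene_level_accuracy(reference, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under this convention a gene-finder that over-predicts keeps its sensitivity but loses specificity, which is one way to read a result such as 78% sensitivity with 42% specificity.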
Nayak, Renuka R.; Kearns, Michael; Spielman, Richard S.; Cheung, Vivian G.
2009-01-01
Genes interact in networks to orchestrate cellular processes. Analysis of these networks provides insights into gene interactions and functions. Here, we took advantage of normal variation in human gene expression to infer gene networks, which we constructed using correlations in expression levels of more than 8.5 million gene pairs in immortalized B cells from three independent samples. The resulting networks allowed us to identify biological processes and gene functions. Among the biological pathways, we found processes such as translation and glycolysis that co-occur in the same subnetworks. We predicted the functions of poorly characterized genes, including CHCHD2 and TMEM111, and provided experimental evidence that TMEM111 is part of the endoplasmic reticulum-associated secretory pathway. We also found that IFIH1, a susceptibility gene of type 1 diabetes, interacts with YES1, which plays a role in glucose transport. Furthermore, genes that predispose to the same diseases are clustered nonrandomly in the coexpression network, suggesting that networks can provide candidate genes that influence disease susceptibility. Therefore, our analysis of gene coexpression networks offers information on the role of human genes in normal and disease processes. PMID:19797678
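As a rough illustration of building a network from correlations in expression levels, the sketch below computes pairwise Pearson correlations between genes and keeps the pairs whose absolute correlation reaches a cutoff. The cutoff value, the function name, and the toy expression matrix are assumptions. The abstract does not say which correlation measure or threshold the authors used.

```python
import numpy as np

def coexpression_edges(expr, gene_names, threshold=0.8):
    """List gene pairs whose expression profiles are strongly correlated.

    expr: 2-D array with one row per gene and one column per sample.
    Returns (gene_i, gene_j, r) tuples with |r| >= threshold.
    """
    corr = np.corrcoef(expr)  # Pearson correlation between rows (genes)
    edges = []
    n = len(gene_names)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                edges.append((gene_names[i], gene_names[j], float(corr[i, j])))
    return edges

# Toy expression matrix: 3 hypothetical genes measured in 5 samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # geneA
    [2.0, 4.0, 6.0, 8.0, 10.0],   # geneB, proportional to geneA
    [5.0, 1.0, 4.0, 2.0, 3.0],    # geneC, weakly related to both
])
edges = coexpression_edges(expr, ["geneA", "geneB", "geneC"])
print(edges)
```

With millions of gene pairs, the double loop would normally be replaced by vectorized selection on the upper triangle of the correlation matrix. The loop is kept here for readability.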
Major Shifts in Glial Regional Identity Are a Transcriptional Hallmark of Human Brain Aging.
Soreq, Lilach; Rose, Jamie; Soreq, Eyal; Hardy, John; Trabzuni, Daniah; Cookson, Mark R; Smith, Colin; Ryten, Mina; Patani, Rickie; Ule, Jernej
2017-01-10
Gene expression studies suggest that aging of the human brain is determined by a complex interplay of molecular events, although both its region- and cell-type-specific consequences remain poorly understood. Here, we extensively characterized aging-altered gene expression changes across ten human brain regions from 480 individuals ranging in age from 16 to 106 years. We show that astrocyte- and oligodendrocyte-specific genes, but not neuron-specific genes, shift their regional expression patterns upon aging, particularly in the hippocampus and substantia nigra, while the expression of microglia- and endothelial-specific genes increase in all brain regions. In line with these changes, high-resolution immunohistochemistry demonstrated decreased numbers of oligodendrocytes and of neuronal subpopulations in the aging brain cortex. Finally, glial-specific genes predict age with greater precision than neuron-specific genes, thus highlighting the need for greater mechanistic understanding of neuron-glia interactions in aging and late-life diseases. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders
Mutch, David M.; Temanni, M. Ramzi; Henegar, Corneliu; Combes, Florence; Pelloux, Véronique; Holst, Claus; Sørensen, Thorkild I. A.; Astrup, Arne; Martinez, J. Alfredo; Saris, Wim H. M.; Viguerie, Nathalie; Langin, Dominique; Zucker, Jean-Daniel; Clément, Karine
2007-01-01
Background: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. Methodology/Principal Findings: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8–12 kg weight loss) could always be differentiated from non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%±8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9%±2.2%. Conclusion: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition. PMID:18094752
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.
Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen
2016-02-02
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A.; Marks, Jonathan A.; Haiser, Henry J.; Turnbaugh, Peter J.
2015-01-01
Elucidation of the molecular mechanisms underlying the human gut microbiota’s effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. PMID:25873372
Rella, Monika; Elliot, Joann L; Revett, Timothy J; Lanfear, Jerry; Phelan, Anne; Jackson, Richard M; Turner, Anthony J; Hooper, Nigel M
2007-01-01
Background: Mammalian angiotensin converting enzyme (ACE) plays a key role in blood pressure regulation. Although multiple ACE-like proteins exist in non-mammalian organisms, to date only one other ACE homologue, ACE2, has been identified in mammals. Results: Here we report the identification and characterisation of the gene encoding a third homologue of ACE, termed ACE3, in several mammalian genomes. The ACE3 gene is located on the same chromosome downstream of the ACE gene. Multiple sequence alignment and molecular modelling have been employed to characterise the predicted ACE3 protein. In mouse, rat, cow and dog, the predicted protein has mutations in some of the critical residues involved in catalysis, including the catalytic Glu in the HEXXH zinc binding motif which is Gln, and ESTs or reverse-transcription PCR indicate that the gene is expressed. In humans, the predicted ACE3 protein has an intact HEXXH motif, but there are other deletions and insertions in the gene and no ESTs have been identified. Conclusion: In the genomes of several mammalian species there is a gene that encodes a novel, single domain ACE-like protein, ACE3. In mouse, rat, cow and dog ACE3, the catalytic Glu is replaced by Gln in the putative zinc binding motif, indicating that in these species ACE3 would lack catalytic activity as a zinc metalloprotease. In humans, no evidence was found that the ACE3 gene is expressed and the presence of deletions and insertions in the sequence indicates that ACE3 is a pseudogene. PMID:17597519
Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes
Kurokawa, Ken; Itoh, Takehiko; Kuwahara, Tomomi; Oshima, Kenshiro; Toh, Hidehiro; Toyoda, Atsushi; Takami, Hideto; Morita, Hidetoshi; Sharma, Vineet K.; Srivastava, Tulika P.; Taylor, Todd D.; Noguchi, Hideki; Mori, Hiroshi; Ogura, Yoshitoshi; Ehrlich, Dusko S.; Itoh, Kikuji; Takagi, Toshihisa; Sakaki, Yoshiyuki; Hayashi, Tetsuya; Hattori, Masahira
2007-01-01
Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes. PMID:17916580
Ensemble positive unlabeled learning for disease gene identification.
Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong
2014-01-01
An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data and a single machine learning predictor prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.
Ritchie, Marylyn D; White, Bill C; Parker, Joel S; Hahn, Lance W; Moore, Jason H
2003-01-01
Background Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. Results Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. Conclusion This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases. PMID:12846935
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags
de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.
2000-01-01
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084
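The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper; the dictionary keys and the helper name `percent` are illustrative, and only the numbers themselves are copied from the abstract.

```python
# Counts copied from the abstract: (numerator, denominator, percentage as printed).
reported = {
    "contigs matching chromosome 22": (1181, 81429, 1.45),
    "known genes hit": (162, 247, 65.6),
    "related genes hit": (67, 150, 44.6),
    "EST-predicted genes hit": (45, 148, 30.4),
}

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

recomputed = {name: percent(p, w) for name, (p, w, _) in reported.items()}

for name, (p, w, stated) in reported.items():
    print(f"{name}: {p}/{w} = {recomputed[name]:.2f}% (abstract: {stated}%)")

# 219 newly identified sequences, 171 of them also supported by GenBank data.
orestes_only = 219 - 171
print("defined by ORESTES alone:", orestes_only)
```

All four percentages agree with the printed values to within 0.1 percentage points, and 219 minus 171 gives the 48 ORESTES-only sequences stated in the text. The related-genes figure recomputes to 44.67%, so the printed 44.6% looks truncated rather than rounded.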
HEMATOPOIETIC STEM CELL GENE THERAPY: ASSESSING THE RELEVANCE OF PRE-CLINICAL MODELS
Larochelle, Andre; Dunbar, Cynthia E.
2013-01-01
The modern laboratory mouse has become a central tool for biomedical research with a notable influence in the field of hematopoiesis. Application of retroviral-based gene transfer approaches to mouse hematopoietic stem cells (HSCs) has led to a sophisticated understanding of the hematopoietic hierarchy in this model. However, the assumption that gene transfer methodologies developed in the mouse could be similarly applied to human HSCs for the treatment of human diseases left the field of gene therapy in a decade-long quandary. It was not until more relevant humanized xenograft mouse models and phylogenetically related large animal species were used to optimize gene transfer methodologies that unequivocal clinical successes were achieved. However, the subsequent reporting of severe adverse events in these clinical trials cast doubt on the predictive value of conventional pre-clinical testing, and encouraged the development of new assays for assessing the relative genotoxicity of various vector designs. PMID:24014892
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Regulation of human genome expression and RNA splicing by human papillomavirus 16 E2 protein.
Gauson, Elaine J; Windle, Brad; Donaldson, Mary M; Caffarel, Maria M; Dornan, Edward S; Coleman, Nicholas; Herzyk, Pawel; Henderson, Scott C; Wang, Xu; Morgan, Iain M
2014-11-01
Human papillomavirus 16 (HPV16) is causative in human cancer. The E2 protein regulates transcription from and replication of the viral genome; the role of E2 in regulating the host genome has been less well studied. We have expressed HPV16 E2 (E2) stably in U2OS cells; these cells tolerate E2 expression well and gene expression analysis identified 74 genes showing differential expression specific to E2. Analysis of published gene expression data sets during cervical cancer progression identified 20 of the genes as being altered in a similar direction as the E2 specific genes. In addition, E2 altered the splicing of many genes implicated in cancer and cell motility. The E2 expressing cells showed no alteration in cell growth but were altered in cell motility, consistent with the E2 induced altered splicing predicted to affect this cellular function. The results present a model system for investigating E2 regulation of the host genome.
Zhou, Hang; Yang, Yang; Shen, Hong-Bin
2017-03-15
Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
Zhao, Dejian; Lin, Mingyan; Pedrosa, Erika; Lachman, Herbert M; Zheng, Deyou
2017-11-10
Monoallelic expression of autosomal genes has been implicated in human psychiatric disorders. However, there is a paucity of allelic expression studies in human brain cells at the single cell and genome wide levels. In this report, we reanalyzed a previously published single-cell RNA-seq dataset from several postmortem human brains and observed pervasive monoallelic expression in individual cells, largely in a random manner. Examining single nucleotide variants with a predicted functional disruption, we found that the "damaged" alleles were overall expressed in fewer brain cells than their counterparts, and at a lower level in cells where their expression was detected. We also identified many brain cell type-specific monoallelically expressed genes. Interestingly, many of these cell type-specific monoallelically expressed genes were enriched for functions important for those brain cell types. In addition, function analysis showed that genes displaying monoallelic expression and correlated expression across neuronal cells from different individual brains were implicated in the regulation of synaptic function. Our findings suggest that monoallelic gene expression is prevalent in human brain cells, which may play a role in generating cellular identity and neuronal diversity and thus increasing the complexity and diversity of brain cell functions.
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, and apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinases only after computational studies. Results A systematic in silico study performed on the human genome led to the identification of 5 genes, on chromosomes 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases by NCBI but are absent from other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicate that a large number of kinases may code for alternatively spliced forms or be incorrectly annotated. An InterProScan automated analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices and the propensity of cysteines to participate in a disulfide bridge. Conclusion The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase protein domains as defined in multiple sources, together with transmembrane alpha helix and signal peptide prediction, provides hints for function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various levels of magnification up to gene organization on the full chromosome set. PMID:16351747
PRISM offers a comprehensive genomic approach to transcription factor function prediction
Wenger, Aaron M.; Clarke, Shoa L.; Guturu, Harendra; Chen, Jenny; Schaar, Bruce T.; McLean, Cory Y.; Bejerano, Gill
2013-01-01
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells. PMID:23382538
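The abstract describes searching for enrichment of a motif next to groups of genes sharing a function, but it does not spell out the statistic. As a generic stand-in (not PRISM's actual framework, whose null model is described only as "harsh"), a one-sided hypergeometric test is a common way to score this kind of overlap. The function name and all numbers below are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes among n motif
    targets, when K of the N genes in the background carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy example (made-up numbers): 10 genes in total, 5 annotated with a function,
# 5 genes next to a motif, and all 5 of those are annotated.
p_all_five = hypergeom_sf(5, N=10, K=5, n=5)
print(p_all_five)  # 1/252
```

A real analysis would run such a test over many motif and function pairs and then control the false discovery rate, which is where the 16% figure quoted in the abstract would enter.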
Kohonen, Pekka; Parkkinen, Juuso A.; Willighagen, Egon L.; Ceder, Rebecca; Wennerberg, Krister; Kaski, Samuel; Grafström, Roland C.
2017-01-01
Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a ‘big data compacting and data fusion’ concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates, from a transcriptomics data set, a ‘predictive toxicogenomics space’ (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ∼2.5 × 10^8 data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hitherto unseen level of DILI prediction accuracy. PMID:28671182
Evaluating the evaluation of cancer driver genes
Tokheim, Collin J.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Vogelstein, Bert; Karchin, Rachel
2016-01-01
Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning–based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future. PMID:27911828
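The remark that reported P values were "inconsistent with the uniform values expected" rests on a standard fact: under a correct null model, P values for non-driver genes should be roughly uniform on [0, 1]. The abstract does not say which diagnostic the authors used, so the sketch below substitutes a plain Kolmogorov-Smirnov distance from the uniform distribution; the function name and the toy inputs are assumptions.

```python
def ks_statistic_uniform(p_values):
    """Largest gap between the empirical CDF of p_values and the
    Uniform(0, 1) CDF (the one-sample Kolmogorov-Smirnov statistic)."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Compare the uniform CDF value x with the ECDF just after and just before x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

# Toy inputs: evenly spread P values versus P values piled up near zero.
well_calibrated = [(i + 0.5) / 10 for i in range(10)]
inflated = [0.001] * 10

d_good = ks_statistic_uniform(well_calibrated)
d_bad = ks_statistic_uniform(inflated)
print(d_good, d_bad)
```

A small distance is what a well-calibrated method would produce; a value near 1, as in the second toy list, signals an excess of small P values and hence likely false-positive driver calls.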
Current understanding of mdig/MINA in human cancers
Thakur, Chitra; Chen, Fei
2015-01-01
The mineral dust-induced gene, mdig, has recently been identified; it is known to be overexpressed in a majority of human cancers and holds predictive power for poor prognosis of the disease. Mdig is an environmentally expressed gene that is involved in cell proliferation, neoplastic transformation and immune regulation. Despite advances in deciphering the prognostic role of mdig in human cancers, our understanding of how mdig causes a normal cell to undergo malignant transformation is still very limited. This article reviews the current knowledge of the mdig gene in the context of human neoplasias and its relation to the clinico-pathologic factors predicting the outcome of the disease in patients. It also emphasizes the promising role of mdig as a potential candidate for biomarker discovery and as a therapeutic target in inflammation and cancers. Considering the recent advances in understanding the underlying mechanisms of tumor formation, more preclinical and clinical research is required to validate the potential of using mdig as a novel biological target of therapeutic and diagnostic value. Summary The expression level of mdig influences the prognosis of several human cancers, especially cancers of the breast and lung. Evaluation of mdig in cancers can offer a novel biomarker with potential therapeutic interventions for the early assessment of cancer development in patients. PMID:26413213
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I; Collette, N
2007-02-26
Generating the sequence of the human genome represents a colossal achievement for science and mankind. The technical use for the human genome project information holds great promise to cure disease, prevent bioterror threats, as well as to learn about human origins. Yet converting the sequence data into biologically meaningful information has not been immediately obvious, and we are still in the preliminary stages of understanding how the genome is organized, what the functional building blocks are and how these sequences mediate complex biological processes. The overarching goal of this program was to develop novel methods and high throughput strategies for determining the functions of ''anonymous'' human genes that are evolutionarily deeply conserved in other vertebrates. We coupled analytical tool development and computational predictions regarding gene function with novel high throughput experimental strategies and tested biological predictions in the laboratory. The tools required for comparative genomic data-mining are fundamentally the same whether they are applied to scientific studies of related microbes or the search for functions of novel human genes. For this reason the tools, conceptual framework and the coupled informatics-experimental biology paradigm we developed in this LDRD have many potential scientific applications relevant to LLNL multidisciplinary research in bio-defense, bioengineering, bionanosciences and microbial and environmental genomics.
Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy
2018-05-09
Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes the ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.
Hwang, Shin-Rong; Garza, Christina Z; Wegrzyn, Jill; Hook, Vivian Y H
2004-08-16
This study demonstrates utilization of the novel GTG initiation codon for translation of a human mRNA transcript that encodes the serpin endopin 2B, a protease inhibitor. Molecular cloning revealed the nucleotide sequence of the human endopin 2B cDNA. Its deduced primary sequence shows high homology to bovine endopin 2A that possesses cross-class protease inhibition of elastase and papain. Notably, the human endopin 2B cDNA sequence revealed GTG as the predicted translation initiation codon; the predicted translation product of 46 kDa endopin 2B was produced by in vitro translation of 35S-endopin 2B with mammalian (rabbit) protein translation components. Importantly, bioinformatic studies demonstrated the presence of the entire human endopin 2B cDNA sequence with GTG as initiation codon within the human genome on chromosome 14. Further evidence for GTG as a functional initiation codon was illustrated by GTG-mediated in vitro translation of the heterologous protein EGFP, and by GTG-mediated expression of EGFP in mammalian PC12 cells. Mutagenesis of GTG to GTC resulted in the absence of EGFP expression in PC12 cells, indicating the function of GTG as an initiation codon. In addition, it was apparent that the GTG initiation codon produces lower levels of translated protein compared to ATG as initiation codon. Significantly, GTG-mediated translation of endopin 2B demonstrates a functional human gene product not previously predicted from initial analyses of the human genome. Further analyses based on GTG as an alternative initiation codon may predict new candidate genes of the human genome.
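The closing suggestion, that allowing GTG as an alternative initiation codon may reveal candidate genes missed by ATG-only analyses, can be illustrated with a toy open reading frame scan. This is a minimal forward-strand sketch on a made-up sequence; `find_orfs`, its parameters and the example DNA are hypothetical, and real gene prediction involves far more than start and stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, start_codons=("ATG",), min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs that begin
    at one of start_codons and run to the first in-frame stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop; keep ORFs that are long enough.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Made-up sequence containing GTG AAA CCC TTT TAG but no ATG.
toy = "CCGTGAAACCCTTTTAGCC"
atg_only = find_orfs(toy)
atg_or_gtg = find_orfs(toy, start_codons=("ATG", "GTG"))
print(atg_only, atg_or_gtg)
```

With ATG as the only permitted start the toy sequence yields no ORF, while admitting GTG recovers one, which mirrors in miniature why an ATG-only annotation pass could overlook a transcript like endopin 2B.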
Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R
2008-01-01
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
Diop, Awa; Diop, Khoudia; Tomei, Enora; Raoult, Didier; Fenollar, Florence; Fournier, Pierre-Edouard
2018-03-01
We report here the draft genome sequence of Ezakiella peruensis strain M6.X2^T. The draft genome is 1,672,788 bp long and harbors 1,589 predicted protein-encoding genes, including 26 antibiotic resistance genes with 1 gene encoding vancomycin resistance. The genome also exhibits 1 clustered regularly interspaced short palindromic repeat region and 333 genes acquired by horizontal gene transfer.
Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature
Pozzoli, Uberto; Elgar, Greg; Cagliani, Rachele; Riva, Laura; Comi, Giacomo P.; Bresolin, Nereo; Bardoni, Alessandra; Sironi, Manuela
2003-01-01
The human DMD gene is the largest known to date, spanning > 2000 kb on the X chromosome. The gene size is mainly accounted for by huge intronic regions. We sequenced 190 kb of Fugu rubripes (pufferfish) genomic DNA corresponding to the complete dystrophin gene (FrDMD) and provide the first report of gene structure and sequence comparison among dystrophin genomic sequences from different vertebrate organisms. Almost all intron positions and phases are conserved between FrDMD and its mammalian counterparts, and the predicted protein product of the Fugu gene displays 55% identity and 71% similarity to human dystrophin. In analogy to the human gene, FrDMD presents several-fold longer than average intronic regions. Analysis of intron sequences of the human and murine genes revealed that they are extremely conserved in size and that a similar fraction of total intron length is represented by repetitive elements; moreover, our data indicate that intron expansion through repeat accumulation in the two orthologs is the result of independent insertional events. The hypothesis that intron length might be functionally relevant to the DMD gene regulation is proposed and substantiated by the finding that dystrophin intron gigantism is common to the three vertebrate genes. [Supplemental material is available online at www.genome.org.] PMID:12727896
Defining a Cancer Dependency Map | Office of Cancer Genomics
Most human epithelial tumors harbor numerous alterations, making it difficult to predict which genes are required for tumor survival. To systematically identify cancer dependencies, we analyzed 501 genome-scale loss-of-function screens performed in diverse human cancer cell lines. We developed DEMETER, an analytical framework that segregates on- from off-target effects of RNAi. 769 genes were differentially required in subsets of these cell lines at a threshold of six SDs from the mean.
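The "six SDs from the mean" criterion can be read as a simple outlier rule on dependency scores. The summary does not state exactly which distribution the mean and SD are taken over, so the sketch below assumes per-gene scores across cell lines; the function name, the cell-line labels and the scores are all invented for illustration and this is not DEMETER itself.

```python
from statistics import mean, stdev

def differentially_required(scores, n_sd=6.0):
    """Return the cell lines whose score for one gene lies more than
    n_sd sample standard deviations from the mean across all lines."""
    values = list(scores.values())
    mu = mean(values)
    sd = stdev(values)
    return {line for line, s in scores.items() if abs(s - mu) > n_sd * sd}

# Invented data: 99 cell lines with no dependency and one strongly dependent line.
toy_scores = {f"line_{i}": 0.0 for i in range(99)}
toy_scores["dependent_line"] = -10.0

flagged = differentially_required(toy_scores)
print(flagged)
```

Because an outlier inflates the SD it is measured against, a six-SD cutoff can only be reached when many screens are available, which fits the scale of 501 screens described above.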
Dopamine Receptor D4 Gene Variation Predicts Preschoolers' Developing Theory of Mind
ERIC Educational Resources Information Center
Lackner, Christine; Sabbagh, Mark A.; Hallinan, Elizabeth; Liu, Xudong; Holden, Jeanette J. A.
2012-01-01
Individual differences in preschoolers' understanding that human action is caused by internal mental states, or representational theory of mind (RTM), are heritable, as are developmental disorders such as autism in which RTM is particularly impaired. We investigated whether polymorphisms of genes affecting dopamine (DA) utilization and metabolism…
The EPA’s vision for the Endocrine Disruptor Screening Program (EDSP) in the 21st Century (EDSP21) includes utilization of high-throughput screening (HTS) assays coupled with computational modeling to prioritize chemicals with the goal of eventually replacing current Tier 1...
Linel, Patrice; Wu, Shuang; Deng, Nan; Wu, Hulin
2014-10-01
Recent studies demonstrate that human blood transcriptional signatures may be used to support diagnosis and clinical decisions for acute respiratory viral infections such as influenza. In this article, we propose to use a newly developed systems biology approach for time course gene expression data to identify significant dynamic response genes and dynamic gene network responses to viral infection. We illustrate the methodological pipeline by reanalyzing the time course gene expression data from a study with healthy human subjects challenged by live influenza virus. We observed clear differences in the number of significant dynamic response genes (DRGs) between the symptomatic and asymptomatic subjects and also identified DRG signatures for symptomatic subjects with influenza infection. The 505 common DRGs shared by the symptomatic subjects have high consistency with the signature genes for predicting viral infection identified in previous works. The temporal response patterns and network response features were carefully analyzed and investigated.
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
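The figures quoted in the abstract above imply a genome size and a gene density that are easy to check. The short calculation below uses only numbers from the abstract; because the 17% figure is stated as approximate, the derived values are rough estimates rather than reported results.

```python
# Figures quoted in the abstract above.
sequenced_mb = 17.38          # unique sequence analysed, in megabases
genome_fraction = 0.17        # "about 17% of the genome" (approximate)
protein_coding_genes = 3744

# Derived quantities (estimates only, since the fraction is approximate).
implied_genome_mb = sequenced_mb / genome_fraction
genes_per_mb = protein_coding_genes / sequenced_mb
kb_per_gene = sequenced_mb * 1000 / protein_coding_genes

print(f"implied genome size: about {implied_genome_mb:.0f} Mb")
print(f"gene density: about {genes_per_mb:.0f} genes per Mb")
print(f"average spacing: about {kb_per_gene:.1f} kb per gene")
```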
Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia
2009-01-01
Background: microRNAs (miRNAs) are single-stranded RNA molecules of about 20–23 nucleotides in length found in a wide variety of organisms. miRNAs regulate gene expression by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. Results: GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools with the experimentally supported targets from TarBase, and it also provides a full gene description and functional analysis for each target gene. The TAGGO application, on the other hand, is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. Conclusion: GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can be freely downloaded from BRFAA. PMID:19534746
Li, Chengzhe; Ai, Rizi; Wang, Mengchi; Firestein, Gary S.; Wang, Wei
2016-01-01
Motivation: DNA methylation signatures in rheumatoid arthritis (RA) have been identified in fibroblast-like synoviocytes (FLS) with the Illumina HumanMethylation450 array. Since <2% of CpG sites are covered by the Illumina 450K array and whole genome bisulfite sequencing is still too expensive for many samples, computationally predicting DNA methylation levels based on 450K data would be valuable to discover more RA-related genes. Results: We developed a computational model that is trained on 14 tissues with both whole genome bisulfite sequencing and 450K array data. This model integrates information derived from the similarity of local methylation patterns between tissues, the methylation information of flanking CpG sites and the methylation tendency of flanking DNA sequences. The predicted and measured methylation values were highly correlated, with a Pearson correlation coefficient of 0.9 in leave-one-tissue-out cross-validations. Importantly, the majority (76%) of the top 10% differentially methylated loci among the 14 tissues were correctly detected using the predicted methylation values. Applying this model to 450K data of RA, osteoarthritis and normal FLS, we expanded the coverage of CpG sites 18.5-fold, to about 30% of all the CpGs in the human genome. By integrative omics study, we identified genes and pathways tightly related to RA pathogenesis, among which 12 genes were supported by three lines of evidence, including 6 genes already known to perform specific roles in RA and 6 genes as new potential therapeutic targets. Availability and implementation: The source code, required data for prediction, and demo data for test are freely available at: http://wanglab.ucsd.edu/star/LR450K/. Contact: wei-wang@ucsd.edu or gfirestein@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26883487
Yanez, Livia Z.; Han, Jinnuo; Behr, Barry B.; Pera, Renee A. Reijo; Camarillo, David B.
2016-01-01
The causes of embryonic arrest during pre-implantation development are poorly understood. Attempts to correlate patterns of oocyte gene expression with successful embryo development have been hampered by the lack of reliable and nondestructive predictors of viability at such an early stage. Here we report that zygote viscoelastic properties can predict blastocyst formation in humans and mice within hours after fertilization, with >90% precision, 95% specificity and 75% sensitivity. We demonstrate that there are significant differences between the transcriptomes of viable and non-viable zygotes, especially in expression of genes important for oocyte maturation. In addition, we show that low-quality oocytes may undergo insufficient cortical granule release and zona-hardening, causing altered mechanics after fertilization. Our results suggest that embryo potential is largely determined by the quality and maturation of the oocyte before fertilization, and can be predicted through a minimally invasive mechanical measurement at the zygote stage. PMID:26904963
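The abstract above reports precision, specificity and sensitivity together, and the three are easy to confuse. The sketch below shows how each follows from a confusion matrix. The counts are hypothetical, chosen only so that the resulting values are consistent with the reported figures; they are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity and specificity from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),    # of predicted-viable zygotes, fraction truly viable
        "sensitivity": tp / (tp + fn),  # of truly viable zygotes, fraction detected
        "specificity": tn / (tn + fp),  # of non-viable zygotes, fraction correctly rejected
    }

# Hypothetical counts (assumption): 100 viable and 100 non-viable zygotes.
metrics = classification_metrics(tp=75, fp=5, tn=95, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note that precision depends on the balance between viable and non-viable samples, whereas sensitivity and specificity do not, so the same classifier would show a different precision in a cohort with a different viability rate.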
Pettigrew, Christopher; Wayte, Nicola; Lovelock, Paul K; Tavtigian, Sean V; Chenevix-Trench, Georgia; Spurdle, Amanda B; Brown, Melissa A
2005-01-01
Introduction: Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge. Methods: As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants. Results: Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral.
Conclusion: In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE prediction tool. This is the first report on the prediction of the frequency and distribution of ESEs in the BRCA1 gene, and it is the first reported attempt to predict which ESEs are most likely to be functional and therefore which sequence variants in ESEs are most likely to be pathogenic. PMID:16280041
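The successive filters and percentages reported in the abstract above can be checked with a few lines of arithmetic. The counts come from the abstract; the step labels are paraphrases, and the variable names are illustrative only.

```python
# Number of predicted ESEs remaining after each filtering step (from the abstract).
filter_counts = [
    ("default ESEfinder settings", 669),
    ("increased threshold score", 464),
    ("proximity to splice donor/acceptor sites", 211),
]
conserved_or_shared = 23   # ESEs conserved or shared across species
affected_by_variants = 14  # of those, predicted to be affected by reported variants

pct_conserved = 100 * conserved_or_shared / filter_counts[-1][1]
pct_affected = 100 * affected_by_variants / conserved_or_shared

for step, count in filter_counts:
    print(f"{step}: {count} ESEs ({100 * count / filter_counts[0][1]:.0f}% of initial)")
print(f"conserved or shared: {pct_conserved:.0f}% (23/211)")
print(f"affected by reported variants: {pct_affected:.0f}% (14/23)")
```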
Identification of cis-suppression of human disease mutations by comparative genomics.
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle; Cassa, Christopher A; Kurtzberg, Joanne; Davis, Erica E; Sunyaev, Shamil R; Katsanis, Nicholas
2015-08-13
Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
Wu, Qiuli; Li, Yiping; Tang, Meng; Ye, Boping; Wang, Dayong
2012-01-01
With growing concerns about the safety of nanotechnology, the in vivo toxicity of nanoparticles (NPs) at environmentally relevant concentrations has drawn increasing attention. We investigated the possible molecular mechanisms of titanium nanoparticles (Ti-NPs) in the induction of toxicity at predicted environmentally relevant concentrations. In nematodes, small sizes (4 nm and 10 nm) of TiO2-NPs induced more severe toxicity than large sizes (60 nm and 90 nm) of TiO2-NPs, using lethality, growth, reproduction, locomotion behavior, intestinal autofluorescence, and reactive oxygen species (ROS) production as endpoints. Locomotion behavior was significantly decreased in nematodes by exposure to 4-nm and 10-nm TiO2-NPs at a concentration of 1 ng/L. Among genes required for the control of oxidative stress, only the expression patterns of the sod-2 and sod-3 genes encoding Mn-SODs in animals exposed to small sizes of TiO2-NPs were significantly different from those in animals exposed to large sizes of TiO2-NPs. sod-2 and sod-3 gene expression was closely correlated with lethality, growth, reproduction, locomotion behavior, intestinal autofluorescence, and ROS production in TiO2-NPs-exposed animals. Ectopic expression of human and nematode Mn-SOD genes effectively prevented the induction of ROS production and the development of toxicity of TiO2-NPs. Therefore, the altered expression patterns of Mn-SODs may explain the development of toxicity for different sizes of TiO2-NPs at predicted environmentally relevant concentrations. In addition, we demonstrated here a strategy to investigate the toxicological effects of exposure to NPs upon humans by generating transgenic nematode strains for specific human genes. PMID:22973466
Klarenbeek, Alex; Mazouari, Khalil El; Desmyter, Aline; Blanchetot, Christophe; Hultberg, Anna; de Jonge, Natalie; Roovers, Rob C; Cambillau, Christian; Spinelli, Sylvia; Del-Favero, Jurgen; Verrips, Theo; de Haard, Hans J; Achour, Ikbel
2015-01-01
Camelid immunoglobulin variable (IGV) regions were found homologous to their human counterparts; however, the germline V repertoires of camelid heavy and light chains are still incomplete and their therapeutic potential is only beginning to be appreciated. We therefore leveraged the publicly available HTG and WGS databases of Lama pacos and Camelus ferus to retrieve the germline repertoire of V genes using human IGV genes as reference. In addition, we amplified IGKV and IGLV genes to uncover the V germline repertoire of Lama glama and sequenced BAC clones covering part of the Lama pacos IGK and IGL loci. Our in silico analysis showed that camelid counterparts of all human IGKV and IGLV families and most IGHV families could be identified, based on canonical structure and sequence homology. Interestingly, this sequence homology seemed largely restricted to the Ig V genes and was far less apparent in other genes: 6 therapeutically relevant target genes differed significantly from their human orthologs. This contributed to efficient immunization of llamas with the human proteins CD70, MET, interleukin (IL)-1β and IL-6, resulting in large panels of functional antibodies. The in silico predicted human-homologous canonical folds of camelid-derived antibodies were confirmed by X-ray crystallography solving the structure of 2 selected camelid anti-CD70 and anti-MET antibodies. These antibodies showed identical fold combinations as found in the corresponding human germline V families, yielding binding site structures closely similar to those occurring in human antibodies. In conclusion, our results indicate that active immunization of camelids can be a powerful therapeutic antibody platform. PMID:26018625
The Prehistory of Antibiotic Resistance.
Perry, Julie; Waglechner, Nicholas; Wright, Gerard
2016-06-01
Antibiotic resistance is a global problem that is reaching crisis levels. The global collection of resistance genes in clinical and environmental samples is the antibiotic "resistome," and is subject to the selective pressure of human activity. The origin of many modern resistance genes in pathogens is likely environmental bacteria, including antibiotic producing organisms that have existed for millennia. Recent work has uncovered resistance in ancient permafrost, isolated caves, and in human specimens preserved for hundreds of years. Together with bioinformatic analyses on modern-day sequences, these studies predict an ancient origin of resistance that long precedes the use of antibiotics in the clinic. Understanding the history of antibiotic resistance is important in predicting its future evolution. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.
Cross-organism learning method to discover new gene functionalities.
Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro
2016-04-01
Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). 
Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Identification of HMX1 target genes: A predictive promoter model approach
Boulling, Arnaud; Wicht, Linda
2013-01-01
Purpose: A homozygous mutation in the H6 family homeobox 1 (HMX1) gene is responsible for a new oculoauricular defect leading to eye and auricular developmental abnormalities as well as early retinal degeneration (MIM 612109). However, the HMX1 pathway remains poorly understood, and as a first approach to better understand the pathway’s function, we sought to identify the target genes. Methods: We developed a predictive promoter model (PPM) approach using a comparative transcriptomic analysis in the retina at P15 of a mouse model lacking functional Hmx1 (dmbo mouse) and its respective wild-type. This PPM was based on the hypothesis that HMX1 binding site (HMX1-BS) clusters should be overrepresented in promoters of HMX1 target genes. The most differentially expressed genes in the microarray experiment that contained HMX1-BS clusters were used to generate the PPM, which was then statistically validated. Finally, we developed two genome-wide target prediction methods: one that focused on conserving PPM features in human and mouse and one that was based on the co-occurrence of HMX1-BS pairs fitting the PPM, in human or in mouse, independently. Results: The PPM construction revealed that sarcoglycan, gamma (35kDa dystrophin-associated glycoprotein) (Sgcg), teashirt zinc finger homeobox 2 (Tshz2), and solute carrier family 6 (neurotransmitter transporter, glycine) (Slc6a9) genes represented Hmx1 targets in the mouse retina at P15. Moreover, the genome-wide target prediction revealed that mouse genes belonging to the retinal axon guidance pathway were targeted by Hmx1. Expression of these three genes was experimentally validated using a quantitative reverse transcription PCR approach. The inhibitory activity of Hmx1 on Sgcg, as well as on protein tyrosine phosphatase, receptor type, O (Ptpro) and Sema3f, two targets identified by the PPM, was validated with a luciferase assay.
Conclusions: Gene expression analysis between wild-type and dmbo mice allowed us to develop a PPM that identified the first target genes of Hmx1. PMID:23946633
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Chai Ling; Yew, Su Mei; Ngeow, Yun Fong; ...
2015-11-18
Background: Daldinia eschscholtzii is a wood-inhabiting fungus that causes wood decay under certain conditions. It has a broad host range and produces a large repertoire of potentially bioactive compounds. However, there is no extensive genome analysis of this fungal species. Results: Two fungal isolates (UM 1400 and UM 1020) from human specimens were identified as Daldinia eschscholtzii by morphological features and ITS-based phylogenetic analysis. Both genomes were similar in size, with 10,822 predicted genes in UM 1400 (35.8 Mb) and 11,120 predicted genes in UM 1020 (35.5 Mb). A total of 751 gene families were shared between the two UM isolates, including gene families associated with fungus-host interactions. In the CAZyme comparative analysis, both genomes were found to contain arrays of CAZymes related to plant cell wall degradation. Genes encoding secreted peptidases involved in the degradation of structural proteins in the plant cell wall were found in the genomes. In addition, arrays of secondary metabolite backbone genes were identified in both genomes, indicating their potential to produce bioactive secondary metabolites. Both genomes also contained an abundance of genes encoding signaling components, with three proposed MAPK cascades involved in cell wall integrity, osmoregulation, and mating/filamentation. Besides genomic evidence for degrading capability, both isolates also harbored an array of genes encoding stress response proteins that are potentially significant for adaptation to living in hostile environments. Conclusions: Our genomic studies provide further information for the biological understanding of D. eschscholtzii and suggest that these wood-decaying fungi are also equipped for adaptation to adverse environments in the human host.
Loboda, Andrey; Nebozhyn, Michael; Klinghoffer, Rich; Frazier, Jason; Chastain, Michael; Arthur, William; Roberts, Brian; Zhang, Theresa; Chenard, Melissa; Haines, Brian; Andersen, Jannik; Nagashima, Kumiko; Paweletz, Cloud; Lynch, Bethany; Feldman, Igor; Dai, Hongyue; Huang, Pearl; Watters, James
2010-06-30
Hyperactivation of the Ras signaling pathway is a driver of many cancers, and RAS pathway activation can predict response to targeted therapies. Therefore, optimal methods for measuring Ras pathway activation are critical. The main focus of our work was to develop a gene expression signature that is predictive of RAS pathway dependence. We used the coherent expression of RAS pathway-related genes across multiple datasets to derive a RAS pathway gene expression signature and generate RAS pathway activation scores in pre-clinical cancer models and human tumors. We then related this signature to KRAS mutation status and drug response data in pre-clinical and clinical datasets. The RAS signature score is predictive of KRAS mutation status in lung tumors and cell lines with high (> 90%) sensitivity but relatively low (50%) specificity due to samples that have apparent RAS pathway activation in the absence of a KRAS mutation. In lung and breast cancer cell line panels, the RAS pathway signature score correlates with pMEK and pERK expression, and predicts resistance to AKT inhibition and sensitivity to MEK inhibition within both KRAS mutant and KRAS wild-type groups. The RAS pathway signature is upregulated in breast cancer cell lines that have acquired resistance to AKT inhibition, and is downregulated by inhibition of MEK. In lung cancer cell lines knockdown of KRAS using siRNA demonstrates that the RAS pathway signature is a better measure of dependence on RAS compared to KRAS mutation status. In human tumors, the RAS pathway signature is elevated in ER negative breast tumors and lung adenocarcinomas, and predicts resistance to cetuximab in metastatic colorectal cancer. These data demonstrate that the RAS pathway signature is superior to KRAS mutation status for the prediction of dependence on RAS signaling, can predict response to PI3K and RAS pathway inhibitors, and is likely to have the most clinical utility in lung and breast tumors.
STOPGAP: a database for systematic target opportunity assessment by genetic association predictions.
Shen, Judong; Song, Kijoung; Slater, Andrew J; Ferrero, Enrico; Nelson, Matthew R
2017-09-01
We developed the STOPGAP (Systematic Target OPportunity assessment by Genetic Association Predictions) database, an extensive catalog of human genetic associations mapped to effector gene candidates. STOPGAP draws on a variety of publicly available GWAS associations, linkage disequilibrium (LD) measures, functional genomic and variant annotation sources. Algorithms were developed to merge the association data, partition associations into non-overlapping LD clusters, map variants to genes and produce a variant-to-gene score used to rank the relative confidence among potential effector genes. This database can be used for a multitude of investigations into the genes and genetic mechanisms underlying inter-individual variation in human traits, as well as supporting drug discovery applications. Shell, R, Perl and Python scripts and STOPGAP R data files (version 2.5.1 at publication) are available at https://github.com/StatGenPRD/STOPGAP. Some of the most useful STOPGAP fields can be queried through an R Shiny web application at http://stopgapwebapp.com. Contact: matthew.r.nelson@gsk.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
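The final step described in the abstract above, ranking candidate effector genes by a variant-to-gene score within each LD cluster, can be sketched as follows. The record layout, cluster identifiers, gene names and score values are hypothetical; they do not reflect the actual STOPGAP schema or scoring scale, which is distributed as R data files.

```python
from collections import defaultdict

# Hypothetical records: (LD cluster id, candidate gene, variant-to-gene score).
records = [
    ("cluster1", "GENE_A", 0.42),
    ("cluster1", "GENE_B", 0.87),
    ("cluster2", "GENE_C", 0.55),
    ("cluster1", "GENE_C", 0.10),
]

def rank_genes_by_cluster(rows):
    """Group candidate genes by LD cluster and sort each group by descending score."""
    by_cluster = defaultdict(list)
    for cluster, gene, score in rows:
        by_cluster[cluster].append((gene, score))
    return {
        cluster: sorted(genes, key=lambda gs: gs[1], reverse=True)
        for cluster, genes in by_cluster.items()
    }

ranked = rank_genes_by_cluster(records)
for cluster, genes in ranked.items():
    print(cluster, genes)
```

Ranking within a cluster rather than across the whole table reflects the idea that scores express relative confidence among the candidate genes for one association signal.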
PLAU inferred from a correlation network is critical for suppressor function of regulatory T cells
He, Feng; Chen, Hairong; Probst-Kepper, Michael; Geffers, Robert; Eifes, Serge; del Sol, Antonio; Schughart, Klaus; Zeng, An-Ping; Balling, Rudi
2012-01-01
Human FOXP3+CD25+CD4+ regulatory T cells (Tregs) are essential to the maintenance of immune homeostasis. Several genes are known to be important for murine Tregs, but for human Tregs the genes and underlying molecular networks controlling the suppressor function still largely remain unclear. Here, we describe a strategy to identify the key genes directly from an undirected correlation network which we reconstruct from a very high time-resolution (HTR) transcriptome during the activation of human Tregs/CD4+ T-effector cells. We show that a predicted top-ranked new key gene PLAU (the plasminogen activator urokinase) is important for the suppressor function of both human and murine Tregs. Further analysis unveils that PLAU is particularly important for memory Tregs and that PLAU mediates Treg suppressor function via STAT5 and ERK signaling pathways. Our study demonstrates the potential for identifying novel key genes for complex dynamic biological processes using a network strategy based on HTR data, and reveals a critical role for PLAU in Treg suppressor function. PMID:23169000
Nguyen, Quan; Lukowski, Samuel; Chiu, Han; Senabouth, Anne; Bruxner, Timothy; Christ, Angelika; Palpant, Nathan; Powell, Joseph
2018-05-11
Heterogeneity of cell states represented in pluripotent cultures has not been described at the transcriptional level. Since gene expression is highly heterogeneous between cells, single-cell RNA sequencing can be used to identify how individual pluripotent cells function. Here, we present results from the analysis of single-cell RNA sequencing data from 18,787 individual WTC CRISPRi human induced pluripotent stem cells. We developed an unsupervised clustering method, and through this identified four subpopulations distinguishable on the basis of their pluripotent state: a core pluripotent population (48.3%), proliferative (47.8%), early-primed for differentiation (2.8%) and late-primed for differentiation (1.1%). For each subpopulation we were able to identify the genes and pathways that define differences in pluripotent cell states. Our method identified four discrete predictor gene sets comprising 165 unique genes that denote the specific pluripotency states; and using these sets, we developed a multigenic machine learning prediction method to accurately classify single cells into each of the subpopulations. Compared against a set of established pluripotency markers, our method increases prediction accuracy by 10%, specificity by 20%, and explains a substantially larger proportion of deviance (up to 3-fold) from the prediction model. Finally, we developed an innovative method to predict cells transitioning between subpopulations, and support our conclusions with results from two orthogonal pseudotime trajectory methods. Published by Cold Spring Harbor Laboratory Press.
Using the methylome to identify aggressive Barrett’s esophagus — EDRN Public Portal
OVERALL STRATEGY: Our strategy will consist of using HumanMethylation450 arrays to identify methylation profiles and/or candidate methylated genes that distinguish BE from BE+LGD, BE+HGD and EAC (Aim 1). We will then assess whether these genes are predictive markers for aggressive BE (Aim 2).
Fang, H; Tong, W; Perkins, R; Shi, L; Hong, H; Cao, X; Xie, Q; Yim, SH; Ward, JM; Pitot, HC; Dragan, YP
2005-01-01
Background The completion of the sequencing of human, mouse and rat genomes and knowledge of cross-species gene homologies enables studies of differential gene expression in animal models. These types of studies have the potential to greatly enhance our understanding of diseases such as liver cancer in humans. Genes co-expressed across multiple species are most likely to have conserved functions. We have used various bioinformatics approaches to examine microarray expression profiles from liver neoplasms that arise in albumin-SV40 transgenic rats to elucidate genes, chromosome aberrations and pathways that might be associated with human liver cancer. Results In this study, we first identified 2223 differentially expressed genes by comparing gene expression profiles for two control, two adenoma and two carcinoma samples using an F-test. These genes were subsequently mapped to the rat chromosomes using a novel visualization tool, the Chromosome Plot. Using the same plot, we further mapped the significant genes to orthologous chromosomal locations in human and mouse. Many genes expressed in rat 1q that are amplified in rat liver cancer map to the human chromosomes 10, 11 and 19 and to the mouse chromosomes 7, 17 and 19, which have been implicated in studies of human and mouse liver cancer. Using Comparative Genomics Microarray Analysis (CGMA), we identified regions of potential aberrations in human. Lastly, a pathway analysis was conducted to predict altered human pathways based on statistical analysis and extrapolation from the rat data. All of the identified pathways have been known to be important in the etiology of human liver cancer, including cell cycle control, cell growth and differentiation, apoptosis, transcriptional regulation, and protein metabolism. 
Conclusion The study demonstrates that the hepatic gene expression profiles from the albumin-SV40 transgenic rat model revealed genes, pathways and chromosome alterations consistent with experimental and clinical research in human liver cancer. The bioinformatics tools presented in this paper are essential for cross species extrapolation and mapping of microarray data, its analysis and interpretation. PMID:16026603
Eronen, Lauri; Toivonen, Hannu
2012-06-06
Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
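As an illustration of link prediction from a proximity measure on a weighted graph, the sketch below scores a node pair by its best path, taking the product of edge weights along a path and maximizing over paths. This is one possible measure under stated assumptions, not necessarily the one Biomine uses: edge weights are assumed to lie in (0, 1], and the node names and weights are hypothetical.

```python
import heapq
import math


def best_path_proximity(graph, source, target):
    """Return the maximum, over all paths, of the product of edge weights.

    Maximizing a product of weights in (0, 1] is equivalent to minimizing the
    sum of -log(weight), so Dijkstra's algorithm applies directly.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d - math.log(weight)
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return 0.0  # target not reachable from source


# Hypothetical integrated graph with mixed edge types (undirected, stored both ways).
graph = {
    "geneA": {"proteinA": 0.9, "GO:term": 0.5},
    "proteinA": {"geneA": 0.9, "diseaseD": 0.8},
    "GO:term": {"geneA": 0.5, "diseaseD": 0.9},
    "diseaseD": {"proteinA": 0.8, "GO:term": 0.9},
}
proximity = best_path_proximity(graph, "geneA", "diseaseD")
```

Candidate links can then be ranked by this score; reweighting edge types, as the abstract describes, would amount to rescaling the weights before the search.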
Predictive and therapeutic markers in ovarian cancer
Gray, Joe W.; Guan, Yinghui; Kuo, Wen-Lin; Fridlyand, Jane; Mills, Gordon B.
2013-03-26
Cancer markers may be developed to detect diseases characterized by increased expression of apoptosis-suppressing genes, such as aggressive cancers. Genes in the human chromosomal regions 8q24, 11q13 and 20q11-q13 were found to be amplified, indicating in vivo drug resistance in diseases such as ovarian cancer. Diagnosis and assessment of the amplification levels of certain genes shown to be amplified, including PVT1, can be useful in predicting a poor patient response and drug resistance in ovarian cancer patients with low survival rates. Certain genes were identified as high-priority therapeutic targets through recurrent aberrations involving genome sequence, copy number and/or gene expression that are associated with reduced survival duration in certain diseases and cancers, specifically ovarian cancer. Therapeutics that inhibit amplification, and inhibitors of one of these genes, PVT1, that target drug resistance in ovarian cancer patients with low survival rates, are described.
Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity
Saleheen, Danish; Natarajan, Pradeep; Armean, Irina M.; Zhao, Wei; Rasheed, Asif; Khetarpal, Sumeet; Won, Hong-Hee; Karczewski, Konrad J.; O’Donnell-Luria, Anne H.; Samocha, Kaitlin E.; Weisburd, Benjamin; Gupta, Namrata; Zaidi, Mozzam; Samuel, Maria; Imran, Atif; Abbas, Shahid; Majeed, Faisal; Ishaq, Madiha; Akhtar, Saba; Trindade, Kevin; Mucksavage, Megan; Qamar, Nadeem; Zaman, Khan Shah; Yaqoob, Zia; Saghir, Tahir; Rizvi, Syed Nadeem Hasan; Memon, Anis; Mallick, Nadeem Hayyat; Ishaq, Mohammad; Rasheed, Syed Zahed; Memon, Fazal-ur-Rehman; Mahmood, Khalid; Ahmed, Naveeduddin; Do, Ron; Krauss, Ronald M.; MacArthur, Daniel G.; Gabriel, Stacey; Lander, Eric S.; Daly, Mark J.; Frossard, Philippe; Danesh, John; Rader, Daniel J.; Kathiresan, Sekar
2017-01-01
A major goal of biomedicine is to understand the function of every gene in the human genome.1 Loss-of-function (LoF) mutations can disrupt both copies of a given gene in humans and phenotypic analysis of such ‘human knockouts’ can provide insight into gene function. Consanguineous unions are more likely to result in offspring who carry LoF mutations in a homozygous state. In Pakistan, consanguinity rates are notably high.2 Here, we sequenced the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS) designed to understand the determinants of cardiometabolic diseases in South Asians.3 We identified individuals carrying predicted LoF (pLoF) mutations in the homozygous state, and performed phenotypic analysis involving >200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are predicted to knock out 1,317 genes in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signaling. Finally, APOC3 is a gene which retards clearance of plasma triglyceride-rich lipoproteins and where heterozygous deficiency confers protection against coronary heart disease.4,5 In Pakistan, we now observe APOC3 homozygous pLoF carriers; we recalled these knockout humans and challenged them with an oral fat load. Compared with wild-type family members, APOC3 knockouts displayed marked blunting of the usual post-prandial rise in plasma triglycerides.
Overall, these observations provide a roadmap for a ‘human knockout project’, a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans. PMID:28406212
Neighboring Genes Show Correlated Evolution in Gene Expression
Ghanbarian, Avazeh T.; Hurst, Laurence D.
2015-01-01
When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. PMID:25743543
Mouse homologues of human hereditary disease.
Searle, A G; Edwards, J H; Hall, J G
1994-01-01
Details are given of 214 loci known to be associated with human hereditary disease, which have been mapped on both human and mouse chromosomes. Forty two of these have pathological variants in both species; in general the mouse variants are similar in their effects to the corresponding human ones, but exceptions include the Dmd/DMD and Hprt/HPRT mutations which cause little, if any, harm in mice. Possible reasons for phenotypic differences are discussed. In most pathological variants the gene product seems to be absent or greatly reduced in both species. The extensive data on conserved segments between human and mouse chromosomes are used to predict locations in the mouse of over 50 loci of medical interest which are mapped so far only on human chromosomes. In about 80% of these a fairly confident prediction can be made. Some likely homologies between mapped mouse loci and unmapped human ones are also given. Sixty six human and mouse proto-oncogene and growth factor gene homologies are also listed; those of confirmed location are all in known conserved segments. A survey of 18 mapped human disease loci and chromosome regions in which the manifestation or severity of pathological effects is thought to be the result of genomic imprinting shows that most of the homologous regions in the mouse are also associated with imprinting, especially those with homologues on human chromosomes 11p and 15q. Useful methods of accelerating the production of mouse models of human hereditary disease include (1) use of a supermutagen, such as ethylnitrosourea (ENU), (2) targeted mutagenesis involving ES cells, and (3) use of gene transfer techniques, with production of 'knockout mutations'. PMID:8151633
Medical Sequencing at the extremes of Human Body Mass
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy
2006-09-01
Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing based approach to detect them.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen
2016-01-01
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814
General statistics of stochastic process of gene expression in eukaryotic cells.
Kuznetsov, V A; Knott, G D; Bonner, R F
2002-01-01
Thousands of genes are expressed at such very low levels (≤1 copy per cell) that global gene expression analysis of rarer transcripts remains problematic. Ambiguity in identification of rarer transcripts creates considerable uncertainty in fundamental questions such as the total number of genes expressed in an organism and the biological significance of rarer transcripts. Knowing the distribution of the true number of genes expressed at each level and the corresponding gene expression level probability function (GELPF) could help resolve these uncertainties. We found that all observed large-scale gene expression data sets in yeast, mouse, and human cells follow a Pareto-like distribution model skewed by many low-abundance transcripts. A novel stochastic model of the gene expression process predicts the universality of the GELPF both across different cell types within a multicellular organism and across different organisms. This model allows us to predict the frequency distribution of all gene expression levels within a single cell and to estimate the number of expressed genes in a single cell and in a population of cells. A random "basal" transcription mechanism for protein-coding genes in all or almost all eukaryotic cell types is predicted. This fundamental mechanism might enhance the expression of rarely expressed genes and, thus, provide a basic level of phenotypic diversity, adaptability, and random monoallelic expression in cell populations. PMID:12136033
Comprehensive gene expression analysis of canine invasive urothelial bladder carcinoma by RNA-Seq.
Maeda, Shingo; Tomiyasu, Hirotaka; Tsuboi, Masaya; Inoue, Akiko; Ishihara, Genki; Uchikai, Takao; Chambers, James K; Uchida, Kazuyuki; Yonezawa, Tomohiro; Matsuki, Naoaki
2018-04-27
Invasive urothelial carcinoma (iUC) is a major cause of death in humans, and approximately 165,000 individuals succumb to this cancer annually worldwide. Comparative oncology using relevant animal models is necessary to improve our understanding of progression, diagnosis, and treatment of iUC. Companion canines are a preferred animal model of iUC due to spontaneous tumor development and similarity to human disease in terms of histopathology, metastatic behavior, and treatment response. However, the comprehensive molecular characterization of canine iUC is not well documented. In this study, we performed transcriptome analysis of tissue samples from canine iUC and normal bladders using an RNA sequencing (RNA-Seq) approach to identify key molecular pathways in canine iUC. Total RNA was extracted from bladder tissues of 11 dogs with iUC and five healthy dogs, and RNA-Seq was conducted. Ingenuity Pathway Analysis (IPA) was used to assign differentially expressed genes to known upstream regulators and functional networks. Differential gene expression analysis of the RNA-Seq data revealed 2531 differentially expressed genes, comprising 1007 upregulated and 1524 downregulated genes, in canine iUC. IPA revealed that the most activated upstream regulator was PTGER2 (encoding the prostaglandin E 2 receptor EP2), which is consistent with the therapeutic efficiency of cyclooxygenase inhibitors in canine iUC. Similar to human iUC, canine iUC exhibited upregulated ERBB2 and downregulated TP53 pathways. Biological functions associated with cancer, cell proliferation, and leukocyte migration were predicted to be activated, while muscle functions were predicted to be inhibited, indicating muscle-invasive tumor property. 
Our data confirmed similarities in gene expression patterns between canine and human iUC and identified potential therapeutic targets (PTGER2, ERBB2, CCND1, Vegf, and EGFR), suggesting the value of naturally occurring canine iUC as a relevant animal model for human iUC.
Broca's arrow: evolution, prediction, and language in the brain.
Cooper, David L
2006-01-01
Brodmann's areas 44 and 45 in the human brain, also known as Broca's area, have long been associated with language functions, especially in the left hemisphere. However, the precise role Broca's area plays in human language has not been established with certainty. Broca's area has homologs in the great apes and in area F5 in monkeys, which suggests that its original function was not linguistic at all. In fact, great ape and hominid brains show very similar left-over-right asymmetries in Broca's area homologs as well as in other areas, such as homologs to Wernicke's area, that are normally associated with language in modern humans. Moreover, the so-called mirror neurons are located in Broca's area in great apes and area F5 in monkeys, which seem to provide a representation of cause and effect in a primate's environment, particularly its social environment. Humans appear to have these mirror neurons in Broca's area as well. Similarly, genetic evidence related to the FOXP2 gene implicates Broca's area in linguistic function and dysfunction, but the gene itself is a highly conserved developmental gene in vertebrates and is shared with only two or three differences between humans and great apes, five between humans and mice, and eight between humans and songbirds. Taking neurons and portions of the brain as discrete computational segments in the sense of constituting specific Turing machines, this evidence points to a predictive motor and conceptual function for Broca's area in primates, especially for social concepts. In human language, this is consistent with evidence from typological and cognitive linguistics. (c) 2006 Wiley-Liss, Inc.
Exploring the Transcriptome of Ciliated Cells Using In Silico Dissection of Human Tissues
Ivliev, Alexander E.; 't Hoen, Peter A. C.; van Roon-Mom, Willeke M. C.; Peters, Dorien J. M.; Sergeeva, Marina G.
2012-01-01
Cilia are cell organelles that play important roles in cell motility, sensory and developmental functions and are involved in a range of human diseases, known as ciliopathies. Here, we search for novel human genes related to cilia using a strategy that exploits the previously reported tendency of cell type-specific genes to be coexpressed in the transcriptome of complex tissues. Gene coexpression networks were constructed using the noise-resistant WGCNA algorithm in 12 publicly available microarray datasets from human tissues rich in motile cilia: airways, fallopian tubes and brain. A cilia-related coexpression module was detected in 10 out of the 12 datasets. A consensus analysis of this module's gene composition recapitulated 297 known and predicted 74 novel cilia-related genes. 82% of the novel candidates were supported by tissue-specificity expression data from GEO and/or proteomic data from the Human Protein Atlas. The novel findings included a set of genes (DCDC2, DYX1C1, KIAA0319) related to a neurological disease dyslexia suggesting their potential involvement in ciliary functions. Furthermore, we searched for differences in gene composition of the ciliary module between the tissues. A multidrug-and-toxin extrusion transporter MATE2 (SLC47A2) was found as a brain-specific central gene in the ciliary module. We confirm the localization of MATE2 in cilia by immunofluorescence staining using MDCK cells as a model. While MATE2 has previously gained attention as a pharmacologically relevant transporter, its potential relation to cilia is suggested for the first time. Taken together, our large-scale analysis of gene coexpression networks identifies novel genes related to human cell cilia. PMID:22558177
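The coexpression step in this abstract can be sketched with the soft-thresholded adjacency commonly associated with WGCNA, a_ij = |cor(x_i, x_j)|^β. The sketch assumes an unsigned network and β = 6; the abstract does not state which settings were used, and the gene names and expression values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expression, beta=6):
    """Unsigned adjacency |cor|^beta for every ordered pair of distinct genes.

    Raising |cor| to a power suppresses weak correlations, which is why this
    construction is considered robust to noise. beta=6 is an assumed setting.
    """
    genes = list(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across four samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene1
    "gene3": [1.0, 3.0, 2.0, 4.0],   # correlation 0.8 with gene1
}
adjacency = soft_threshold_adjacency(expression)
```

Modules such as the cilia-related one would then be found by clustering genes on this adjacency (or a derived similarity), a step not shown here.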
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feder, J.N.; Jan, L.Y.; Jan, Y.N.
The Drosophila hairy gene encodes a basic helix-loop-helix protein that functions in at least two steps during Drosophila development: (1) during embryogenesis, when it partakes in the establishment of segments, and (2) during the larval stage, when it functions negatively in determining the pattern of sensory bristles on the adult fly. In the rat, a structurally homologous gene (RHL) behaves as an immediate-early gene in its response to growth factors and can, like that in Drosophila, suppress neuronal differentiation events. Here, the authors report the genomic cloning of the human hairy gene homolog (HRY). The coding region of the gene is contained within four exons. The predicted amino acid sequence reveals only four amino acid differences between the human and rat genes. Analysis of the DNA sequence 5′ to the coding region reveals a putative untranslated exon. To increase the value of the HRY gene as a genetic marker and to assess its potential involvement in genetic disorders, they sublocalized the locus to chromosome 3q28-q29 by fluorescence in situ hybridization. 34 refs., 4 figs., 1 tab.
2014-01-01
Background Alternative splicing is an important process in higher eukaryotes that allows several transcripts to be obtained from one gene. A specific case of alternative splicing is mutually exclusive splicing, in which exactly one exon out of a cluster of neighbouring exons is spliced into the mature transcript. Recently, a new algorithm for the prediction of these exons has been developed based on the preconditions that the exons of the cluster have similar lengths, sequence homology, and conserved splice sites, and that they are translated in the same reading frame. Description In this contribution we introduce Kassiopeia, a database and web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exomes. Currently, Kassiopeia provides access to the mutually exclusive exomes of twelve Drosophila species, the thale cress Arabidopsis thaliana, the nematode Caenorhabditis elegans, and human. Mutually exclusive spliced exons (MXEs) were predicted based on gene reconstructions from Scipio. Based on the standard prediction values, with which 83.5% of the annotated MXEs of Drosophila melanogaster were reconstructed, the exomes contain considerably more MXEs than previously supposed and identified. The user can search Kassiopeia using BLAST or browse the genes of each species, optionally adjusting the parameters used for the prediction to reveal more divergent or only very similar exon candidates. Conclusions We developed a pipeline to predict MXEs in the genomes of several model organisms and a web interface, Kassiopeia, for their visualization. For each gene Kassiopeia provides a comprehensive gene structure scheme, the sequences and predicted secondary structures of the MXEs, and, if available, further evidence for MXE candidates from cDNA/EST data, predictions of MXEs in homologous genes of closely related species, and RNA secondary structure predictions. Kassiopeia can be accessed at http://www.motorprotein.de/kassiopeia.
PMID:24507667
Munfus, Delicia L; Haga, Christopher L; Burrows, Peter D; Cooper, Max D
2007-01-01
Background In mouse the cytokine interleukin-7 (IL-7) is required for generation of B lymphocytes, but human IL-7 does not appear to have this function. A bioinformatics approach was therefore used to identify IL-7 receptor related genes in the hope of identifying the elusive human cytokine. Results Our database search identified a family of nine gene candidates, which we have provisionally named fibronectin immunoglobulin leucine-rich repeat (FIGLER). The FIGLER 1–9 genes are predicted to encode type I transmembrane glycoproteins with 6–12 leucine-rich repeats (LRR), a C2 type Ig domain, a fibronectin type III domain, a hydrophobic transmembrane domain, and a cytoplasmic domain containing one to four tyrosine residues. Members of this multichromosomal gene family possess 20–47% overall amino acid identity and are differentially expressed in cell lines and primary hematopoietic lineage cells. Genes for FIGLER homologs were identified in macaque, orangutan, chimpanzee, mouse, rat, dog, chicken, toad, and puffer fish databases. The non-human FIGLER homologs share 38–99% overall amino acid identity with their human counterpart. Conclusion The extracellular domain structure and absence of recognizable cytoplasmic signaling motifs in members of the highly conserved FIGLER gene family suggest a trophic or cell adhesion function for these molecules. PMID:17854505
WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.
van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xi, T; Jones, I M; Mohrenweiser, H W
2003-11-03
Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for ~62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
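As a quick arithmetic check on the proportions quoted in the abstract above, the short sketch below recomputes the two percentages from the stated counts. The helper name `percent` is hypothetical and used only for illustration; the counts are taken directly from the abstract.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)

# Counts as stated in the abstract.
sift_intolerant = percent(226, 508)     # SIFT "Intolerant"
polyphen_damaging = percent(165, 489)   # PolyPhen "Probably or Possibly Damaging"

print(sift_intolerant, polyphen_damaging)
```

Both values agree with the rounded figures reported in the abstract (44% and 34%).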
Evolution of Siglec-11 and Siglec-16 Genes in Hominins
Wang, Xiaoxia; Mitra, Nivedita; Cruz, Pedro; Deng, Liwen; Varki, Nissi; Angata, Takashi; Green, Eric D.; Mullikin, Jim; Hayakawa, Toshiyuki; Varki, Ajit
2012-01-01
We previously reported a human-specific gene conversion of SIGLEC11 by an adjacent paralogous pseudogene (SIGLEC16P), generating a uniquely human form of the Siglec-11 protein, which is expressed in the human brain. Here, we show that Siglec-11 is expressed exclusively in microglia in all human brains studied—a finding of potential relevance to brain evolution, as microglia modulate neuronal survival, and Siglec-11 recruits SHP-1, a tyrosine phosphatase that modulates microglial biology. Following the recent finding of a functional SIGLEC16 allele in human populations, further analysis of the human SIGLEC11 and SIGLEC16/P sequences revealed an unusual series of gene conversion events between two loci. Two tandem and likely simultaneous gene conversions occurred from SIGLEC16P to SIGLEC11 with a potentially deleterious intervening short segment happening to be excluded. One of the conversion events also changed the 5′ untranslated sequence, altering predicted transcription factor binding sites. Both of the gene conversions have been dated to ∼1–1.2 Ma, after the emergence of the genus Homo, but prior to the emergence of the common ancestor of Denisovans and modern humans about 800,000 years ago, thus suggesting involvement in later stages of hominin brain evolution. In keeping with this, recombinant soluble Siglec-11 binds ligands in the human brain. We also address a second-round more recent gene conversion from SIGLEC11 to SIGLEC16, with the latter showing an allele frequency of ∼0.1–0.3 in a worldwide population study. Initial pseudogenization of SIGLEC16 was estimated to occur at least 3 Ma, which thus preceded the gene conversion of SIGLEC11 by SIGLEC16P. As gene conversion usually disrupts the converted gene, the fact that ORFs of hSIGLEC11 and hSIGLEC16 have been maintained after an unusual series of very complex gene conversion events suggests that these events may have been subject to hominin-specific selection forces. PMID:22383531
Sexy gene conversions: locating gene conversions on the X-chromosome.
Lawson, Mark J; Zhang, Liqing
2009-08-01
Gene conversion can have a profound impact on both the short- and long-term evolution of genes and genomes. Here, we examined the gene families that are located on the X-chromosomes of human (Homo sapiens), chimpanzee (Pan troglodytes), mouse (Mus musculus) and rat (Rattus norvegicus) for evidence of gene conversion. We identified seven gene families (WD repeat protein family, Ferritin Heavy Chain family, RAS-related Protein RAB-40 family, Diphosphoinositol polyphosphate phosphohydrolase family, Transcription Elongation Factor A family, LDOC1-related family, Zinc Finger Protein ZIC, and GLI family) that show evidence of gene conversion. Through phylogenetic analyses and synteny evidence, we show that gene conversion has played an important role in the evolution of these gene families and that gene conversion has occurred independently in both primates and rodents. Comparing the results with those of two gene conversion prediction programs (GENECONV and Partimatrix), we found that both GENECONV and Partimatrix have very high false negative rates (i.e. failed to predict gene conversions), which leads to many undetected gene conversions. The combination of phylogenetic analyses with physical synteny evidence exhibits high resolution in the detection of gene conversions.
Multivariate Cholesky models of human female fertility patterns in the NLSY.
Rodgers, Joseph Lee; Bard, David E; Miller, Warren B
2007-03-01
Substantial evidence now exists that variables measuring or correlated with human fertility outcomes have a heritable component. In this study, we define a series of age-sequenced fertility variables, and fit multivariate models to account for underlying shared genetic and environmental sources of variance. We make predictions based on a theory developed by Udry [(1996) Biosocial models of low-fertility societies. In: Casterline, JB, Lee RD, Foote KA (eds) Fertility in the United States: new patterns, new theories. The Population Council, New York] suggesting that biological/genetic motivations can be more easily realized and measured in settings in which fertility choices are available. Udry's theory, along with principles from molecular genetics and certain tenets of life history theory, allow us to make specific predictions about biometrical patterns across age. Consistent with predictions, our results suggest that there are different sources of genetic influence on fertility variance at early compared to later ages, but that there is only one source of shared environmental influence that occurs at early ages. These patterns are suggestive of the types of gene-gene and gene-environment interactions for which we must account to better understand individual differences in fertility outcomes.
Gene Function Analysis in the Ubiquitous Human Commensal and Pathogen Malassezia Genus
Ianiri, Giuseppe; Averette, Anna F.; Kingsbury, Joanne M.; Heitman, Joseph
2016-01-01
ABSTRACT The genus Malassezia includes 14 species that are found on the skin of humans and animals and are associated with a number of diseases. Recent genome sequencing projects have defined the gene content of all 14 species; however, to date, genetic manipulation has not been possible for any species within this genus. Here, we develop and then optimize molecular tools for the transformation of Malassezia furfur and Malassezia sympodialis using Agrobacterium tumefaciens delivery of transfer DNA (T-DNA) molecules. These T-DNAs can insert randomly into the genome. In the case of M. furfur, targeted gene replacements were also achieved via homologous recombination, enabling deletion of the ADE2 gene for purine biosynthesis and of the LAC2 gene predicted to be involved in melanin biosynthesis. Hence, the introduction of exogenous DNA and direct gene manipulation are feasible in Malassezia species. PMID:27899504
Grabowska, Dorota; Jablonska-Skwiecinska, Ewa; Plochocka, Danuta; Chelstowska, Anna; Lewandowska, Irmina; Witos, Iwona; Majewska, Zofia; Rokicka-Milewska, Roma; Burzynska, Beata
2004-01-01
Glucose-6-phosphate dehydrogenase (G6PD) deficiency is the most common human enzymopathy. Human G6PD gene is highly polymorphic, with over 130 mutations identified, many of which cause hemolytic anemia. We studied a novel point mutation in the G6PD gene 1226 C-->G, predicting the proline 409 to arginine substitution (G6PD Suwalki). We expressed the human wild-type and mutated G6PD gene in yeast Saccharomyces cerevisiae which allowed the characterization of the Suwalki variant. We showed that human wild-type, as well as the mutated (1226 C-->G) G6PD gene, functionally complemented the phenotype displayed by the yeast strain with disruption of the ZWF1 gene (homologue of the human G6PD gene). Comparison of wild-type (wt) human G6PD purified from yeast and from blood shows no significant differences in the Km values for G6P and in the utilization rate for the substrate analogue, 2-deoxyG6P. The P409R substitution leads to drastic changes in G6PD kinetics. The specific activity as well as stability of mutated G6PD is also significantly reduced. Besides this, the effect of this mutation was analyzed using a model of the tertiary structure of the human enzyme. The localization of the P409R mutation suggests that it may influence the stability of the whole protein by changing tetramer interactions and disturbing the binding of structural NADP+.
Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K
2010-01-01
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383
Genetic models of homosexuality: generating testable predictions
Gavrilets, Sergey; Rice, William R
2006-01-01
Homosexuality is a common occurrence in humans and other species, yet its genetic and evolutionary basis is poorly understood. Here, we formulate and study a series of simple mathematical models for the purpose of predicting empirical patterns that can be used to determine the form of selection that leads to polymorphism of genes influencing homosexuality. Specifically, we develop theory to make contrasting predictions about the genetic characteristics of genes influencing homosexuality including: (i) chromosomal location, (ii) dominance among segregating alleles and (iii) effect sizes that distinguish between the two major models for their polymorphism: the overdominance and sexual antagonism models. We conclude that the measurement of the genetic characteristics of quantitative trait loci (QTLs) found in genomic screens for genes influencing homosexuality can be highly informative in resolving the form of natural selection maintaining their polymorphism. PMID:17015344
WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning
Sutphin, George L.; Mahoney, J. Matthew; Sheppard, Keith; Walton, David O.; Korstanje, Ron
2016-01-01
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species—humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/. PMID:27812085
WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning.
Sutphin, George L; Mahoney, J Matthew; Sheppard, Keith; Walton, David O; Korstanje, Ron
2016-11-01
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species-humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.
Liu, Yunxian; Hilakivi-Clarke, Leena; Zhang, Yukun; Wang, Xiao; Pan, Yuan-Xiang; Xuan, Jianhua; Fleck, Stefanie C; Doerge, Daniel R; Helferich, William G
2015-08-01
Soy flour diet (MS) prevented isoflavones from stimulating MCF-7 tumor growth in athymic nude mice, indicating that other bioactive compounds in soy can negate the estrogenic properties of isoflavones. The underlying signal transduction pathways to explain the protective effects of soy flour consumption were studied here. Ovariectomized athymic nude mice inoculated with MCF-7 human breast cancer cells were fed either Soy flour diet (MS) or purified isoflavone mix diet (MI), both with equivalent amounts of genistein. Positive controls received estradiol pellets and negative controls received sham pellets. GeneChip Human Genome U133 Plus 2.0 Array platform was used to evaluate gene expressions, and results were analyzed using bioinformatics approaches. Tumors in MS-fed mice exhibited higher expression of tumor growth suppressing genes ATP2A3 and BLNK and lower expression of oncogene MYC. Tumors in MI-fed mice expressed a higher level of oncogene MYB and a lower level of MHC-I and MHC-II, allowing tumor cells to escape immunosurveillance. MS-induced gene expression alterations were predictive of prolonged survival among estrogen-receptor-positive breast cancer patients, whilst MI-induced gene changes were predictive of shortened survival. Our findings suggest that dietary soy flour affects gene expression differently than purified isoflavones, which may explain why soy foods prevent isoflavones-induced stimulation of MCF-7 tumor growth in athymic nude mice. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kashuk, Carl S.; Stone, Eric A.; Grice, Elizabeth A.; Portnoy, Matthew E.; Green, Eric D.; Sidow, Arend; Chakravarti, Aravinda; McCallion, Andrew S.
2005-01-01
The ability to discriminate between deleterious and neutral amino acid substitutions in the genes of patients remains a significant challenge in human genetics. The increasing availability of genomic sequence data from multiple vertebrate species allows inclusion of sequence conservation and physicochemical properties of residues to be used for functional prediction. In this study, the RET receptor tyrosine kinase serves as a model disease gene in which a broad spectrum (≥116) of disease-associated mutations has been identified among patients with Hirschsprung disease and multiple endocrine neoplasia type 2. We report the alignment of the human RET protein sequence with the orthologous sequences of 12 non-human vertebrates (eight mammalian, one avian, and three teleost species), their comparative analysis, the evolutionary topology of the RET protein, and predicted tolerance for all published missense mutations. We show that, although evolutionary conservation alone provides significant information to predict the effect of a RET mutation, a model that combines comparative sequence data with analysis of physiochemical properties in a quantitative framework provides far greater accuracy. Although the ability to discern the impact of a mutation is imperfect, our analyses permit substantial discrimination between predicted functional classes of RET mutations and disease severity even for a multigenic disease such as Hirschsprung disease. PMID:15956201
Pedra, Joao H F; Brandt, Amanda; Li, Hong-Mei; Westerman, Rick; Romero-Severson, Jeanne; Pollack, Richard J; Murdock, Larry L; Pittendrigh, Barry R
2003-11-01
Genomics information relating to human body lice is surprisingly scarce, and this has constrained studies of their physiology, immunology and vector biology. To identify novel body louse genes, we used engorged adult lice to generate a cDNA library. Initially, 1152 clones were screened for inserts, edited for removal of vector sequences and base pairs of poor quality, and viewed for splicing variations, gene families and polymorphism. Computational methods identified 506 inferred open reading frames including the first predicted louse defensin. The inferred defensin aligns well with other insect defensins and has highly conserved cysteine residues, as are known for other defensin sequences. Two cysteine and five serine proteinases were categorized according to their inferred catalytic sites. We also discovered seven putative ubiquitin-pathway genes and four iron metabolizing deduced enzymes. Finally, glutathione-S-transferases and cytochrome P450 genes were among the detoxification enzymes found. Results from this first systematic effort to discover human body louse genes should promote further studies in Phthiraptera and lice.
Cross-study projections of genomic biomarkers: an evaluation in cancer genomics.
Lucas, Joseph E; Carvalho, Carlos M; Chen, Julia Ling-Yu; Chi, Jen-Tsan; West, Mike
2009-01-01
Human disease studies using DNA microarrays in both clinical/observational and experimental/controlled studies are having increasing impact on our understanding of the complexity of human diseases. A fundamental concept is the use of gene expression as a "common currency" that links the results of in vitro controlled experiments to in vivo observational human studies. Many studies--in cancer and other diseases--have shown promise in using in vitro cell manipulations to improve understanding of in vivo biology, but experiments often simply fail to reflect the enormous phenotypic variation seen in human diseases. We address this with a framework and methods to dissect, enhance and extend the in vivo utility of in vitro derived gene expression signatures. From an experimentally defined gene expression signature we use statistical factor analysis to generate multiple quantitative factors in human cancer gene expression data. These factors retain their relationship to the original, one-dimensional in vitro signature but better describe the diversity of in vivo biology. In a breast cancer analysis, we show that factors can reflect fundamentally different biological processes linked to molecular and clinical features of human cancers, and that in combination they can improve prediction of clinical outcomes.
Selot, Ruchita; Arumugam, Sathyathithan; Mary, Bertin; Cheemadan, Sabna; Jayandharan, Giridhara R.
2017-01-01
Of the 12 common serotypes used for gene delivery applications, Adeno-associated virus (AAV)rh.10 serotype has shown sustained hepatic transduction and has the lowest seropositivity in humans. We have evaluated if further modifications to AAVrh.10 at its phosphodegron-like regions or predicted immunogenic epitopes could improve its hepatic gene transfer and immune evasion potential. Mutant AAVrh.10 vectors were generated by site-directed mutagenesis of the predicted targets. These mutant vectors were first tested for their transduction efficiency in HeLa and HEK293T cells. The optimal vector was further evaluated for its cellular uptake, entry, and intracellular trafficking by quantitative PCR and time-lapse confocal microscopy. To evaluate the vectors' potential during hepatic gene therapy, C57BL/6 mice were administered wild-type or optimal mutant AAVrh.10 and the luciferase transgene expression was documented by serial bioluminescence imaging at 14, 30, 45, and 72 days post-gene transfer. Their hepatic transduction was further verified by a quantitative PCR analysis of AAV copy number in the liver tissue. The optimal AAVrh.10 vector was further evaluated for its immune escape potential in animals pre-immunized with human intravenous immunoglobulin. Our results demonstrate that a modified AAVrh.10 S671A vector had enhanced cellular entry (3.6 fold) and migrated rapidly to the perinuclear region (1 vs. >2 h for wild type vectors) in vitro, which further translated to a modest increase in hepatic gene transfer efficiency in vivo. More importantly, the mutant AAVrh.10 vector was able to partially evade neutralizing antibodies (~27–64 fold) in pre-immunized animals. The development of an AAV vector system that can escape the circulating neutralizing antibodies in the host will substantially widen the scope of gene therapy applications in humans. PMID:28769791
El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A
2007-01-01
We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.
Cell-specific prediction and application of drug-induced gene expression profiles.
Hodos, Rachel; Zhang, Ping; Lee, Hao-Chih; Duan, Qiaonan; Wang, Zichen; Clark, Neil R; Ma'ayan, Avi; Wang, Fei; Kidd, Brian; Hu, Jianying; Sontag, David; Dudley, Joel
2018-01-01
Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions--i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.
Cell-specific prediction and application of drug-induced gene expression profiles
Hodos, Rachel; Zhang, Ping; Lee, Hao-Chih; Duan, Qiaonan; Wang, Zichen; Clark, Neil R.; Ma'ayan, Avi; Wang, Fei; Kidd, Brian; Hu, Jianying; Sontag, David
2017-01-01
Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions--i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes. PMID:29218867
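The "local (nearest-neighbors)" idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout (a dictionary keyed by drug and cell type, standing in for the drugs x genes x cell types tensor), the function names `pearson` and `predict_profile`, the similarity measure, and the toy values are all assumptions made for illustration. An unmeasured profile for a (drug, cell) pair is estimated by averaging the profiles, in that cell, of the k drugs that behave most similarly to the query drug in the other cells where both were measured.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def predict_profile(profiles, drug, cell, k=2):
    """Estimate the unmeasured profile of `drug` in `cell`.

    profiles maps (drug, cell) -> list of per-gene expression changes.
    Returns the mean profile of the k most similar drugs measured in `cell`,
    or None when no comparable drug exists.
    """
    scored = []
    for (other, c) in profiles:
        if c != cell or other == drug:
            continue
        # Similarity: mean correlation over cells where both drugs were measured.
        sims = [pearson(profiles[(drug, c2)], profiles[(other, c2)])
                for (d2, c2) in profiles
                if d2 == drug and (other, c2) in profiles]
        if sims:
            scored.append((sum(sims) / len(sims), other))
    neighbours = sorted(scored, reverse=True)[:k]
    if not neighbours:
        return None
    n_genes = len(profiles[(neighbours[0][1], cell)])
    return [sum(profiles[(o, cell)][g] for _, o in neighbours) / len(neighbours)
            for g in range(n_genes)]

# Toy data (invented): three genes, three drugs, two cell types.
profiles = {
    ("drugA", "cell1"): [1.0, -1.0, 0.5],
    ("drugB", "cell1"): [0.9, -1.1, 0.4],
    ("drugC", "cell1"): [-1.0, 1.0, -0.5],
    ("drugB", "cell2"): [2.0, 0.0, 1.0],
    ("drugC", "cell2"): [-2.0, 0.0, -1.0],
}
predicted = predict_profile(profiles, "drugA", "cell2", k=1)
print(predicted)
```

In the toy data, drugA tracks drugB in cell1, so with k=1 the sketch copies drugB's cell2 profile. The global (tensor completion) approach mentioned in the abstract is a different technique and is not shown here.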
Identification of cis-suppression of human disease mutations by comparative genomics
Jordan, Daniel M.; Frangakis, Stephan G.; Golzio, Christelle; Cassa, Christopher A.; Kurtzberg, Joanne; Davis, Erica E.; Sunyaev, Shamil R.; Katsanis, Nicholas
2015-01-01
Patterns of amino acid conservation have served as a tool for understanding protein evolution [1]. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients [2]. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes [3,4] revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity [5,6]. PMID:26123021
Satora, Leszek
2005-01-01
The application of an evolutionary perspective to human behaviour generates philosophical, political and scientific controversy. Modern human symbolic consciousness is not the culmination of the long trend that natural selection would predict. New archaeological data suggest that anatomical and behavioural innovation has been episodic and rare, separated by long periods of stagnation. The new behavioural mode and the new skeletal structure of modern humans arose as an incidental exaptation. Additionally, the genetic basis of dysfunction connected with suicidal behaviour, and the growing incidence of suicide among teenagers, contradict the theory that our behaviour is programmed in any detail by selfish genes. In that case, genetically determined suicidal behaviour should be rapidly eliminated by natural selection.
Arkusz, Joanna; Stępnik, Maciej; Sobala, Wojciech; Dastych, Jarosław
2010-11-10
The aim of this study was to find differentially regulated genes in THP-1 monocytic cells exposed to sensitizers and nonsensitizers and to investigate if such genes could be reliable markers for an in vitro predictive method for the identification of skin sensitizing chemicals. Changes in expression of 35 genes in the THP-1 cell line following treatment with chemicals of different sensitizing potential (from nonsensitizers to extreme sensitizers) were assessed using real-time PCR. Verification of 13 candidate genes by testing a large number of chemicals (an additional 22 sensitizers and 8 nonsensitizers) revealed that prediction of contact sensitization potential was possible based on evaluation of changes in three genes: IL8, HMOX1 and PMAIP1. In total, changes in expression of these genes allowed correct detection of sensitization potential of 21 out of 27 (78%) test sensitizers. The gene expression levels inside potency groups varied and did not allow estimation of sensitization potency of test chemicals. Results of this study indicate that evaluation of changes in expression of proposed biomarkers in THP-1 cells could be a valuable model for preliminary screening of chemicals to discriminate an appreciable majority of sensitizers from nonsensitizers. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Neighboring Genes Show Correlated Evolution in Gene Expression.
Ghanbarian, Avazeh T; Hurst, Laurence D
2015-07-01
When considering the evolution of a gene's expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Singh, Gulshan; Vajpayee, Poornima; Rani, Neetika; Amoah, Isaac Dennis; Stenström, Thor Axel; Shanker, Rishi
2016-08-15
The emergence of antimicrobial-resistant bacteria is an important public health and environmental contamination issue. Antimicrobials of the β-lactam group account for approximately two thirds, by weight, of all antimicrobials administered to humans, owing to their high clinical efficacy and low toxicity. This study explores the β-lactam resistance determinant gene (blaTEM) as an emerging contaminant in the Indo-Gangetic region using qPCR in molecular beacon format. A Quantitative Microbial Risk Assessment (QMRA) approach was adopted to predict the risk to human health associated with consumption of, or exposure to, surface water, potable water and street foods contaminated with bacteria carrying the blaTEM gene. Surface water and sediments of the rivers Ganga and Gomti showed high numbers of blaTEM gene copies, which varied significantly (p<0.05) among the sampling locations. Potable water collected from drinking water facility and clinical settings exhibited a significant number of blaTEM gene copies (13±0.44 to 10200±316 gene copies/100 mL). Among the aquatic flora encountered in both rivers, E. crassipes had a high load of blaTEM gene copies. The information on the prevalence of environmental reservoirs of blaTEM gene-containing bacteria in the Indo-Gangetic region, and on the associated risk, will be useful for formulating strategies to protect the public from the clinical risks linked with antimicrobial-resistant bacteria. Copyright © 2016 Elsevier B.V. All rights reserved.
Revealing Alzheimer's disease genes spectrum in the whole-genome by machine learning.
Huang, Xiaoyan; Liu, Hankui; Li, Xinming; Guan, Liping; Li, Jiankang; Tellier, Laurent Christian Asker M; Yang, Huanming; Wang, Jian; Zhang, Jianguo
2018-01-10
Alzheimer's disease (AD) is an important, progressive neurodegenerative disease with a complex genetic architecture. A key goal of biomedical research is to seek out disease risk genes and to elucidate the function of these risk genes in the development of disease. For this purpose, expanding the AD-associated gene set is necessary. In past research, prediction methods for AD-related genes have been limited in their exploration of the target genome regions. We here present a genome-wide method for predicting AD candidate genes. We present a machine learning approach (SVM), based upon integrating gene expression data with human brain-specific gene network data, to discover the full spectrum of AD genes across the whole genome. We classified AD candidate genes with an accuracy of 84.56% and an area under the receiver operating characteristic (ROC) curve of 94%. Our approach provides a supplement for the spectrum of AD-associated genes extracted from more than 20,000 genes on a genome-wide scale. In this study, we have elucidated the whole-genome spectrum of AD using a machine learning approach. Through this method, we expect the candidate gene catalogue to provide a more comprehensive annotation of AD for researchers.
Tan, Wei; Dean, Michael; Law, Amanda J.
2010-01-01
ErbB4 is a growth factor receptor tyrosine kinase essential for neurodevelopment. Genetic variation in ErbB4 is associated with schizophrenia and risk-associated polymorphisms predict overexpression of ErbB4 CYT-1 isoforms in the brain in the disorder. The molecular mechanism of association is unclear because the polymorphisms flank exon 3 of the gene and reside 700 kb distal to the CYT-1 defining exon. We hypothesized that the polymorphisms are indirectly associated with ErbB4 CYT-1 via splicing of exon 3 on the CYT-1 background. We report via cloning and sequencing of adult and fetal human brain cDNA libraries the identification of novel splice isoforms of ErbB4, whereby exon 3 is skipped (del.3). ErbB4 del.3 transcripts exist as CYT-2 isoforms and are predicted to produce truncated proteins. Furthermore, our data refine the structure of the human ErbB4 gene, clarify that juxtamembrane (JM) splice variants of ErbB4, JM-a and JM-b respectively, are characterized by the replacement of a 75 nucleotide (nt) sequence with a 45-nt insertion, and demonstrate that there are four alternative exons in the gene. Our analyses reveal that novel splice variants of ErbB4 exist in the developing and adult human brain and, given the failure to identify ErbB4 del.3 CYT-1 transcripts, suggest that the association of risk polymorphisms in the ErbB4 gene with CYT-1 transcript levels is not mediated via an exon 3 splicing event. PMID:20886074
Sarraf, Matthew Alexandar; Woodley Of Menie, Michael Anthony
2017-01-01
This commentary article offers new perspective on recent research investigating the behavioral and social ecological effects of a mutation related to autism spectrum disorders in mice. The authors explain the consistency of this research on mice with predictions advanced by a theory of the role of mutations in altering interorganismal gene-gene interactions (social epistasis) in social species including humans, known as the social epistasis amplification model. The potential significance of the mouse research for understanding contemporary human behavioral trends is explored.
GARAULET, MARTA; ORDOVÁS, JOSÉ M.; GÓMEZ-ABELLÁN, PURIFICACIÓN; MARTÍNEZ, JOSE A.; MADRID, JUAN A.
2015-01-01
Although it is well established that human adipose tissue (AT) shows circadian rhythmicity, published studies have been discussed as if tissues or systems showed only one or few circadian rhythms at a time. Our aim was to provide an overall view of the internal temporal order of circadian rhythms in human AT, including genes implicated in metabolic processes such as energy intake and expenditure, insulin resistance, adipocyte differentiation, dyslipidemia, and body fat distribution. Visceral and subcutaneous abdominal AT biopsies (n = 6) were obtained from morbidly obese women (BMI ≥ 40 kg/m²). To investigate rhythmic expression patterns, AT explants were cultured for 24 h and gene expression was analyzed at the following times: 08:00, 14:00, 20:00 and 02:00 h, using quantitative real-time PCR. Clock genes, glucocorticoid metabolism-related genes, leptin, adiponectin and their receptors were studied. Significant differences were found both in acrophases and relative amplitude among genes (P < 0.05). The amplitude of most gene rhythms was high (>30%). When interpreting the phase map of gene expression in both depots, data indicated that circadian rhythmicity of the genes studied followed a predictable physiological pattern, particularly for subcutaneous AT. Of interest are the relationships between the circadian profiles of adiponectin, leptin, and glucocorticoid metabolism-related genes. Their metabolic significance is discussed. Visceral AT behaved in a different way than subcutaneous AT for most of the genes studied. For every gene, protein mRNA levels fluctuated during the day in synchrony with its receptors. We have provided an overall view of the internal temporal order of circadian rhythms in human adipose tissue. PMID:21520059
Ion channel gene expression predicts survival in glioma patients
Wang, Rong; Gurguis, Christopher I.; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong
2015-01-01
Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from human genome in all the validation cohorts. More importantly, this signature performs better than the random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients. PMID:26235283
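The resampling tests mentioned in the abstract above compare a signature against random gene sets of the same size. A minimal sketch of that kind of test follows. The scoring function, gene names and weights are hypothetical placeholders; the study's actual prognostic score (presumably derived from survival analysis) is not given in the abstract.

```python
import random

def resampling_p_value(signature, gene_pool, score_fn, n_resamples=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same size
    scores at least as well as the signature (add-one smoothing)."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = score_fn(signature)
    k = len(signature)
    hits = 0
    for _ in range(n_resamples):
        random_set = rng.sample(gene_pool, k)
        if score_fn(random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Hypothetical toy setup: each gene carries a made-up prognostic weight and
# a gene set's score is the sum of its weights.
pool = [f"g{i}" for i in range(100)]
weight = {g: i for i, g in enumerate(pool)}

def toy_score(genes):
    return sum(weight[g] for g in genes)

strong_signature = pool[-5:]   # the five highest-weight genes
weak_signature = pool[:5]      # the five lowest-weight genes
p_strong = resampling_p_value(strong_signature, pool, toy_score)
p_weak = resampling_p_value(weak_signature, pool, toy_score)
```

Restricting `gene_pool` to disease-associated genes rather than the whole genome gives the stricter comparison described in the abstract, since random sets drawn from a relevant pool are harder to beat.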
An approach for reduction of false predictions in reverse engineering of gene regulatory networks.
Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar
2018-05-14
A gene regulatory network discloses the regulatory interactions amongst genes under a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools poses a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many false predictions along with the correct predictions, which is undesirable. Investigations in the domain focus on the identification of as many correct regulations as possible in the reverse engineering of gene regulatory networks, to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented the same using a dataset ensemble approach (i.e. combining multiple datasets). We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IRMA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging, as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated, the results improve further with respect to the prediction of true positives.
Copyright © 2018 Elsevier Ltd. All rights reserved.
Upadhyay, Atul Kumar; Sowdhamini, Ramanathan
2016-01-01
3D-domain swapping is one of the mechanisms of protein oligomerization, and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of proteins in the whole human genome that tend to domain swap will enable many aspects of disease control and management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain-swapped sequence, by using important physicochemical properties of amino acids.
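The abstract above reports accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). As a reminder of how these relate, here is a minimal sketch that derives all four from a binary confusion matrix. The counts in the example are invented for illustration and are not taken from the study.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}

# Invented counts: 100 positives (90 found) and 100 negatives (80 rejected).
m = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity.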
Megchelenbrink, Wout; Katzir, Rotem; Lu, Xiaowen; Ruppin, Eytan; Notebaart, Richard A
2015-09-29
Synthetic dosage lethality (SDL) denotes a genetic interaction between two genes whereby the underexpression of gene A combined with the overexpression of gene B is lethal. SDLs offer a promising way to kill cancer cells by inhibiting the activity of SDL partners of activated oncogenes in tumors, which are often difficult to target directly. As experimental genome-wide SDL screens are still scarce, here we introduce a network-level computational modeling framework that quantitatively predicts human SDLs in metabolism. For each enzyme pair (A, B) we systematically knock out the flux through A combined with a stepwise flux increase through B and search for pairs that reduce cellular growth more than when either enzyme is perturbed individually. The predictive signal of the emerging network of 12,000 SDLs is demonstrated in five different ways. (i) It can be successfully used to predict gene essentiality in shRNA cancer cell line screens. Moving to clinical tumors, we show that (ii) SDLs are significantly underrepresented in tumors. Furthermore, breast cancer tumors with SDLs active (iii) have smaller sizes and (iv) result in increased patient survival, indicating that activation of SDLs increases cancer vulnerability. Finally, (v) patient survival improves when multiple SDLs are present, pointing to a cumulative effect. This study lays the basis for quantitative identification of cancer SDLs in a model-based mechanistic manner. The approach presented can be used to identify SDLs in species and cell types in which "omics" data necessary for data-driven identification are missing.
A VEGF-dependent gene signature enriched in mesenchymal ovarian cancer predicts patient prognosis.
Yin, Xia; Wang, Xiaojie; Shen, Boqiang; Jing, Ying; Li, Qing; Cai, Mei-Chun; Gu, Zhuowei; Yang, Qi; Zhang, Zhenfeng; Liu, Jin; Li, Hongxia; Di, Wen; Zhuang, Guanglei
2016-08-08
We have previously reported surrogate biomarkers of VEGF pathway activities with the potential to provide predictive information for anti-VEGF therapies. The aim of this study was to systematically evaluate a new VEGF-dependent gene signature (VDGs) in relation to molecular subtypes of ovarian cancer and patient prognosis. Using microarray profiling and cross-species analysis, we identified 140-gene mouse VDGs and corresponding 139-gene human VDGs, which displayed enrichment of vasculature and basement membrane genes. In patients who received bevacizumab therapy and showed partial response, the expressions of VDGs (summarized to yield VDGs scores) were markedly decreased in post-treatment biopsies compared with pre-treatment baselines. In contrast, VDGs scores were not significantly altered following bevacizumab treatment in patients with stable or progressive disease. Analysis of VDGs in ovarian cancer showed that VDGs as a prognostic signature was able to predict patient outcome. Correlation estimation of VDGs scores and molecular features revealed that VDGs was overrepresented in mesenchymal subtype and BRCA mutation carriers. These findings highlighted the prognostic role of VEGF-mediated angiogenesis in ovarian cancer, and proposed a VEGF-dependent gene signature as a molecular basis for developing novel diagnostic strategies to aid patient selection for VEGF-targeted agents.
Adebali, Ogun; Reznik, Alexander O.; Ory, Daniel S.; ...
2016-02-18
Here, predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of the variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take into account orthologous/paralogous relationships. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. Methods: We identified major events in NPC1 evolution and revealed and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. We predicted whether an amino acid substitution affects protein function by reducing the organism's fitness. As a result, removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. In conclusion, the results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy in predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well.
Takayama, Kazuo; Morisaki, Yuta; Kuno, Shuichi; Nagamoto, Yasuhito; Harada, Kazuo; Furukawa, Norihisa; Ohtaka, Manami; Nishimura, Ken; Imagawa, Kazuo; Sakurai, Fuminori; Tachibana, Masashi; Sumazaki, Ryo; Noguchi, Emiko; Nakanishi, Mahito; Hirata, Kazumasa; Kawabata, Kenji; Mizuguchi, Hiroyuki
2014-11-25
Interindividual differences in hepatic metabolism, which are mainly due to genetic polymorphism in its gene, have a large influence on individual drug efficacy and adverse reaction. Hepatocyte-like cells (HLCs) differentiated from human induced pluripotent stem (iPS) cells have the potential to predict interindividual differences in drug metabolism capacity and drug response. However, it remains uncertain whether human iPSC-derived HLCs can reproduce the interindividual difference in hepatic metabolism and drug response. We found that cytochrome P450 (CYP) metabolism capacity and drug responsiveness of the primary human hepatocytes (PHH)-iPS-HLCs were highly correlated with those of PHHs, suggesting that the PHH-iPS-HLCs retained donor-specific CYP metabolism capacity and drug responsiveness. We also demonstrated that the interindividual differences, which are due to the diversity of individual SNPs in the CYP gene, could also be reproduced in PHH-iPS-HLCs. We succeeded in establishing, to our knowledge, the first PHH-iPS-HLC panel that reflects the interindividual differences of hepatic drug-metabolizing capacity and drug responsiveness.
L1000CDS2: LINCS L1000 characteristic direction signatures search engine.
Duan, Qiaonan; Reid, St Patrick; Clark, Neil R; Wang, Zichen; Fernandez, Nicolas F; Rouillard, Andrew D; Readhead, Ben; Tritsch, Sarah R; Hodos, Rachel; Hafner, Marc; Niepel, Mario; Sorger, Peter K; Dudley, Joel T; Bavari, Sina; Panchal, Rekha G; Ma'ayan, Avi
2016-01-01
The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD-processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2, we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines.
In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.
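The abstract above describes ranking signatures that mimic or reverse a query, and predicting targets by cosine similarity. The sketch below shows that ranking idea on toy vectors. It is not the L1000CDS2 implementation or API: the function names, the dict-based library and the example vectors are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_signatures(query, library, mode="reverse"):
    """Rank library signatures against a query signature.
    mode="mimic": most similar first; mode="reverse": most anti-correlated first."""
    if mode not in ("mimic", "reverse"):
        raise ValueError("mode must be 'mimic' or 'reverse'")
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=(mode == "mimic"))

# Hypothetical three-gene signatures (positive = up-regulated).
query = [1.0, -1.0, 0.0]
library = {
    "compound_A": [1.0, -1.0, 0.0],   # same direction as the query
    "compound_B": [-1.0, 1.0, 0.0],   # opposite direction
    "compound_C": [0.0, 0.0, 1.0],    # unrelated
}
mimics = rank_signatures(query, library, mode="mimic")
reversers = rank_signatures(query, library, mode="reverse")
```

A compound whose signature points opposite to a disease signature (cosine near -1) is a candidate for reversing it, which is the rationale behind the "reverse" mode.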
Gazestani, Vahid H; Salavati, Reza
2015-01-01
Trypanosoma brucei is a vector-borne parasite with an intricate life cycle that can cause serious diseases in humans and animals. This pathogen relies on fine regulation of gene expression to respond and adapt to variable environments, with implications in transmission and infectivity. However, the regulatory elements involved and their mechanisms of action are largely unknown. Here, benefiting from a new graph-based approach for finding functional regulatory elements in RNA (GRAFFER), we have predicted 88 new RNA regulatory elements that are potentially involved in the gene regulatory network of T. brucei. We show that many of these newly predicted elements are responsive to both transcriptomic and proteomic changes during the life cycle of the parasite. Moreover, we found that 11 of the predicted elements strikingly resemble previously identified regulatory elements for the parasite. Additionally, comparison with previously predicted motifs on T. brucei suggested the superior performance of our approach based on the current limited knowledge of regulatory elements in T. brucei.
Characterization of cDNAs and genomic DNAs for human threonyl- and cysteinyl-tRNA synthetases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cruzen, M.E.
1993-01-01
Techniques of molecular biology were used to clone, sequence and map two human aminoacyl-tRNA synthetase (aaRS) cDNAs: threonyl-tRNA synthetase (ThrRS), a class II enzyme, and cysteinyl-tRNA synthetase (CysRS), a class I enzyme. The predicted protein sequence of human ThrRS is highly homologous to that of lower eukaryotic and prokaryotic ThrRSs, particularly in the regions containing the three structural motifs common to all class II synthetases. Signature regions 1 and 2, which characterize the class IIa subgroup (SerRS, ThrRS and HisRS), are highly conserved from bacteria to human. Structural predictions for human ThrRS based on the known structure of the closely related SerRS from E. coli implicate strongly conserved residues in the signature sequences as important in substrate binding. The amino-terminal 100 residues of the deduced amino acid sequence of ThrRS share structural similarity with SerRS, consistent with forming an antiparallel helix implicated in tRNA binding. The 5' untranslated sequence of the human ThrRS gene shares short stretches of common sequence with the gene for hamster HisRS, including a binding site for the promoter-specific transcription factor sp-1. The deduced amino acid sequence of human CysRS has a high degree of sequence identity to E. coli CysRS. Human CysRS possesses the classic characteristics of a class I synthetase and is most closely related to the MetRS subgroup. The amino-terminal half of human CysRS can be modeled as a nucleotide binding fold and shares significant sequence and structural similarity with the other enzymes in this subgroup. The CysRS structural gene (CARS) was mapped to human chromosome 11p15.5 by fluorescent in situ hybridization. CARS is the first aaRS gene to be mapped to chromosome 11. The steady-state levels of both CysRS and ThrRS mRNA were quantitated in several human tissues. Message levels for these enzymes appear to be subject to differential regulation in different cell types.
An extended set of yeast-based functional assays accurately identifies human disease mutations
Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.
2016-01-01
We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778
The structure of the human interferon alpha/beta receptor gene.
Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G
1992-02-05
Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues.
Wheeler, Heather E; Shah, Kaanan P; Brenner, Jonathon; Garcia, Tzintzuni; Aquino-Michaels, Keston; Cox, Nancy J; Nicolae, Dan L; Im, Hae Kyung
2016-11-01
Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h2 can be relatively well characterized with 59% of expressed genes showing significant h2 (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h2. Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimal performing gene expression predictors via elastic net modeling. To further explore the tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R2 for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).
The Genetic Cost of Neanderthal Introgression
Harris, Kelley; Nielsen, Rasmus
2016-01-01
Approximately 2–4% of genetic material in human populations outside Africa is derived from Neanderthals who interbred with anatomically modern humans. Recent studies have shown that this Neanderthal DNA is depleted around functional genomic regions; this has been suggested to be a consequence of harmful epistatic interactions between human and Neanderthal alleles. However, using published estimates of Neanderthal inbreeding and the distribution of mutational fitness effects, we infer that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis. We also predict a residual Neanderthal mutational load in non-Africans, leading to a fitness reduction of at least 0.5%. This effect of Neanderthal admixture has been left out of previous debate on mutation load differences between Africans and non-Africans. We also show that if many deleterious mutations are recessive, the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population. This might partially explain why so many organisms retain gene flow from other species and appear to derive adaptive benefits from introgression. PMID:27038113
Sjögren, Rasmus J. O.; Egan, Brendan; Katayama, Mutsumi; Zierath, Juleen R.
2014-01-01
microRNAs (miRNAs) are short noncoding RNAs that regulate gene expression through posttranscriptional repression of target genes. miRNAs exert a fundamental level of control over many developmental processes, but their role in the differentiation and development of skeletal muscle from myogenic progenitor cells in humans remains incompletely understood. Using primary cultures established from human skeletal muscle satellite cells, we performed microarray profiling of miRNA expression during differentiation of myoblasts (day 0) into myotubes at 48 h intervals (day 2, 4, 6, 8, and 10). Based on a time-course analysis, we identified 44 miRNAs with altered expression [false discovery rate (FDR) < 5%, fold change > ±1.2] during differentiation, including the marked upregulation of the canonical myogenic miRNAs miR-1, miR-133a, miR-133b, and miR-206. Microarray profiling of mRNA expression at day 0, 4, and 10 identified 842 and 949 genes differentially expressed (FDR < 10%) at day 4 and 10, respectively. At day 10, 42% of altered transcripts demonstrated reciprocal expression patterns in relation to the directional change of their in silico predicted regulatory miRNAs based on analysis using Ingenuity Pathway Analysis microRNA Target Filter. Bioinformatic analysis predicted networks of regulation during differentiation including myomiRs miR-1/206 and miR-133a/b, miRNAs previously established in differentiation including miR-26 and miR-30, and novel miRNAs regulated during differentiation of human skeletal muscle cells such as miR-138-5p and miR-20a. These reciprocal expression patterns may represent new regulatory nodes in human skeletal muscle cell differentiation. This analysis serves as a reference point for future studies of human skeletal muscle differentiation and development in healthy and disease states. PMID:25547110
Recombination-Mediated Host Adaptation by Avian Staphylococcus aureus
Murray, Susan; Pascoe, Ben; Méric, Guillaume; Mageiros, Leonardos; Yahara, Koji; Hitchings, Matthew D.; Friedmann, Yasmin; Wilkinson, Thomas S.; Gormley, Fraser J.; Mack, Dietrich; Bray, James E.; Lamble, Sarah; Bowden, Rory; Jolley, Keith A.; Maiden, Martin C.J.; Wendlandt, Sarah; Schwarz, Stefan; Corander, Jukka; Fitzgerald, J. Ross
2017-01-01
Staphylococcus aureus are globally disseminated among farmed chickens causing skeletal muscle infections, dermatitis, and septicaemia. The emergence of poultry-associated lineages has involved zoonotic transmission from humans to chickens but questions remain about the specific adaptations that promote proliferation of chicken pathogens. We characterized genetic variation in a population of genome-sequenced S. aureus isolates of poultry and human origin. Genealogical analysis identified a dominant poultry-associated sequence cluster within the CC5 clonal complex. Poultry and human CC5 isolates were significantly distinct from each other and more recombination events were detected in the poultry isolates. We identified 44 recombination events in 33 genes along the branch extending to the poultry-specific CC5 cluster, and 47 genes were found more often in CC5 poultry isolates compared with those from humans. Many of these gene sequences were common in chicken isolates from other clonal complexes suggesting horizontal gene transfer among poultry associated lineages. Consistent with functional predictions for putative poultry-associated genes, poultry isolates showed enhanced growth at 42 °C and greater erythrocyte lysis on chicken blood agar in comparison with human isolates. By combining phenotype information with evolutionary analyses of staphylococcal genomes, we provide evidence of adaptation, following a human-to-poultry host transition. This has important implications for the emergence and dissemination of new pathogenic clones associated with modern agriculture. PMID:28338786
Wang, QuanQiu; Xu, Rong
2017-07-01
Human metabolomics has great potential in disease mechanism understanding, early diagnosis, and therapy. Existing metabolomics studies are often based on profiling patient biofluids and tissue samples and are difficult owing to the challenges of sample collection and data processing. Here, we report an alternative approach: we developed a computation-based prediction system, MetabolitePredict, for disease metabolomics biomarker prediction. We applied MetabolitePredict to identify metabolite biomarkers and metabolite-targeting therapies for rheumatoid arthritis (RA), a long-lasting complex disease with multiple genetic and environmental factors involved. MetabolitePredict is a de novo prediction system. It first constructs a disease-specific genetic profile using genes and pathways data associated with an input disease. It then constructs genetic profiles for a total of 259,170 chemicals/metabolites using known chemical genetics and human metabolomic data. MetabolitePredict prioritizes metabolites for a given disease based on the genetic profile similarities between disease and metabolites. We evaluated MetabolitePredict using 63 known RA-associated metabolites. MetabolitePredict found 24 of the 63 metabolites (recall: 0.38) and ranked them highly (mean ranking: top 4.13%, median ranking: top 1.10%, P-value: 5.08E-19). MetabolitePredict performed better than an existing metabolite prediction system, PROFANCY, in predicting RA-associated metabolites (PROFANCY: recall: 0.31, mean ranking: 20.91%, median ranking: 16.47%, P-value: 3.78E-7). Short-chain fatty acids (SCFAs), the abundant metabolites of gut microbiota in the fermentation of fiber, ranked highly (butyrate, 0.03%; acetate, 0.05%; propionate, 0.38%). Finally, we established MetabolitePredict's potential in novel metabolite targeting for disease treatment: MetabolitePredict ranked highly three known metabolite inhibitors for RA treatments (methotrexate: 0.25%; leflunomide: 0.56%; sulfasalazine: 0.92%). MetabolitePredict is a generalizable disease metabolite prediction system. The only required input to the system is a disease name or a set of disease-associated genes. The web-based MetabolitePredict is available at: http://xulab. edu/MetabolitePredict. Copyright © 2017 Elsevier Inc. All rights reserved.
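The recall figure quoted in the abstract above can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `recall` and the variable name are illustrative and are not part of MetabolitePredict, and only the counts 24 and 63 come from the abstract.

```python
def recall(found: int, known: int) -> float:
    """Fraction of known positives that were recovered."""
    if known <= 0:
        raise ValueError("known must be a positive count")
    return found / known


# Counts reported in the abstract: 24 of 63 known RA-associated metabolites.
metabolitepredict_recall = recall(24, 63)
print(round(metabolitepredict_recall, 2))
```

The abstract gives no underlying counts for PROFANCY, so its recall of 0.31 cannot be recomputed the same way.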
Coordinated action of histone modification and microRNA regulations in human genome.
Wang, Xuan; Zheng, Guantao; Dong, Dong
2015-10-10
Both histone modifications and microRNAs (miRNAs) play pivotal roles in gene expression regulation. Although numerous studies have been devoted to exploring gene regulation by miRNAs and by epigenetic mechanisms, their coordinated actions have not been comprehensively examined. In this work, we systematically investigated the combinatorial relationship between miRNA and epigenetic regulation by taking advantage of recently published genome-wide histone modification data and high-quality miRNA targeting data. The results showed that miRNA targets have distinct histone modification patterns compared with non-targets in their promoter regions. Based on this finding, we proposed a machine learning approach that fits predictive models to discern whether a gene is targeted by a specific miRNA. We found a considerable advantage in both sensitivity and specificity in diverse human cell lines. Finally, we found that our predicted miRNA targets are consistently annotated with Gene Ontology terms. Our work is the first genome-wide investigation of the coordinated action of miRNA and histone modification regulation, and it provides a guide for understanding the complexity of transcriptional regulation more deeply. Copyright © 2015 Elsevier B.V. All rights reserved.
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man
Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.
2000-01-01
The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
A network approach to predict pathogenic genes for Fusarium graminearum.
Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan
2010-10-04
Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), a destructive disease of wheat and barley that causes huge economic losses and health problems for humans by contaminating food. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen with time-consuming and expensive molecular biology experiments in the laboratory. Computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated process that involves complex regulation and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion of the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as the seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, the G protein-coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which demonstrates the effectiveness of the proposed method. The results presented in this paper not only provide guidelines for future experimental verification but also shed light on the pathogenesis of the destructive fungus F. graminearum.
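The seed-plus-expression idea in the abstract above can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes that candidates are the direct interaction partners of seed genes that are also differentially expressed, and the function name `candidate_subnetwork` and all gene identifiers are hypothetical.

```python
def candidate_subnetwork(ppi_edges, seed_genes, de_genes):
    """Return non-seed genes that interact directly with a seed gene
    and are differentially expressed (simplifying assumption)."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in ppi_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    candidates = set()
    for seed in seed_genes:
        for gene in neighbours.get(seed, ()):
            if gene not in seed_genes and gene in de_genes:
                candidates.add(gene)
    return candidates


# Hypothetical toy data: S1 and S2 are known pathogenic (seed) genes.
toy_edges = [("S1", "G1"), ("S1", "G2"), ("G2", "G3"), ("S2", "G4")]
toy_seeds = {"S1", "S2"}
toy_de = {"G1", "G3", "G4"}  # differentially expressed after invasion
toy_candidates = candidate_subnetwork(toy_edges, toy_seeds, toy_de)
print(sorted(toy_candidates))
```

In this toy network G2 is dropped because it is not differentially expressed, and G3 is dropped because it does not touch a seed directly. A real analysis would likely search beyond direct neighbours.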
Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca
2006-01-01
Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651
Zhelyabovskaya, Olga B.; Berlin, Yuri A.; Birikh, Klara R.
2004-01-01
In bacterial expression systems, translation initiation is usually the rate-limiting and least predictable stage of protein synthesis. The efficiency of a translation initiation site can vary dramatically depending on the sequence context. This is why many standard expression vectors provide very poor expression levels of some genes. This notion persuaded us to develop an artificial genetic selection protocol that allows one to find, for a given target gene, an individual efficient ribosome binding site from a random pool. To create the Darwinian pressure necessary for genetic selection, we designed a system based on translational coupling, in which survival of the microorganism in the presence of an antibiotic depends on expression of the target gene, while placing no special requirements on this gene. Using this system, we obtained superproducing constructs for the human protein RACK1 (receptor for activated C kinase). PMID:15034151
The transcriptional landscape of age in human peripheral blood
Peters, Marjolein J.; Joehanes, Roby; Pilling, Luke C.; Schurmann, Claudia; Conneely, Karen N.; Powell, Joseph; Reinmaa, Eva; Sutphin, George L.; Zhernakova, Alexandra; Schramm, Katharina; Wilson, Yana A.; Kobes, Sayuko; Tukiainen, Taru; Nalls, Michael A.; Hernandez, Dena G.; Cookson, Mark R.; Gibbs, Raphael J.; Hardy, John; Ramasamy, Adaikalavan; Zonderman, Alan B.; Dillman, Allissa; Traynor, Bryan; Smith, Colin; Longo, Dan L.; Trabzuni, Daniah; Troncoso, Juan; van der Brug, Marcel; Weale, Michael E.; O'Brien, Richard; Johnson, Robert; Walker, Robert; Zielke, Ronald H.; Arepalli, Sampath; Ryten, Mina; Singleton, Andrew B.; Ramos, Yolande F.; Göring, Harald H. H.; Fornage, Myriam; Liu, Yongmei; Gharib, Sina A.; Stranger, Barbara E.; De Jager, Philip L.; Aviv, Abraham; Levy, Daniel; Murabito, Joanne M.; Munson, Peter J.; Huan, Tianxiao; Hofman, Albert; Uitterlinden, André G.; Rivadeneira, Fernando; van Rooij, Jeroen; Stolk, Lisette; Broer, Linda; Verbiest, Michael M. P. J.; Jhamai, Mila; Arp, Pascal; Metspalu, Andres; Tserel, Liina; Milani, Lili; Samani, Nilesh J.; Peterson, Pärt; Kasela, Silva; Codd, Veryan; Peters, Annette; Ward-Caviness, Cavin K.; Herder, Christian; Waldenberger, Melanie; Roden, Michael; Singmann, Paula; Zeilinger, Sonja; Illig, Thomas; Homuth, Georg; Grabe, Hans-Jörgen; Völzke, Henry; Steil, Leif; Kocher, Thomas; Murray, Anna; Melzer, David; Yaghootkar, Hanieh; Bandinelli, Stefania; Moses, Eric K.; Kent, Jack W.; Curran, Joanne E.; Johnson, Matthew P.; Williams-Blangero, Sarah; Westra, Harm-Jan; McRae, Allan F.; Smith, Jennifer A.; Kardia, Sharon L. R.; Hovatta, Iiris; Perola, Markus; Ripatti, Samuli; Salomaa, Veikko; Henders, Anjali K.; Martin, Nicholas G.; Smith, Alicia K.; Mehta, Divya; Binder, Elisabeth B.; Nylocks, K Maria; Kennedy, Elizabeth M.; Klengel, Torsten; Ding, Jingzhong; Suchy-Dicey, Astrid M.; Enquobahrie, Daniel A.; Brody, Jennifer; Rotter, Jerome I.; Chen, Yii-Der I.; Houwing-Duistermaat, Jeanine; Kloppenburg, Margreet; Slagboom, P. Eline; Helmer, Quinta; den Hollander, Wouter; Bean, Shannon; Raj, Towfique; Bakhshi, Noman; Wang, Qiao Ping; Oyston, Lisa J.; Psaty, Bruce M.; Tracy, Russell P.; Montgomery, Grant W.; Turner, Stephen T.; Blangero, John; Meulenbelt, Ingrid; Ressler, Kerry J.; Yang, Jian; Franke, Lude; Kettunen, Johannes; Visscher, Peter M.; Neely, G. Gregory; Korstanje, Ron; Hanson, Robert L.; Prokisch, Holger; Ferrucci, Luigi; Esko, Tonu; Teumer, Alexander; van Meurs, Joyce B. J.; Johnson, Andrew D.
2015-01-01
Disease incidences increase with age, but the molecular characteristics of ageing that lead to increased disease susceptibility remain inadequately understood. Here we perform a whole-blood gene expression meta-analysis in 14,983 individuals of European ancestry (including replication) and identify 1,497 genes that are differentially expressed with chronological age. The age-associated genes do not harbor more age-associated CpG-methylation sites than other genes, but are instead enriched for the presence of potentially functional CpG-methylation sites in enhancer and insulator regions that associate with both chronological age and gene expression levels. We further used the gene expression profiles to calculate the ‘transcriptomic age’ of an individual, and show that differences between transcriptomic age and chronological age are associated with biological features linked to ageing, such as blood pressure, cholesterol levels, fasting glucose, and body mass index. The transcriptomic prediction model adds biological relevance and complements existing epigenetic prediction models, and can be used by others to calculate transcriptomic age in external cohorts. PMID:26490707
MicroRNA signature of the human developing pancreas.
Rosero, Samuel; Bravo-Egana, Valia; Jiang, Zhijie; Khuri, Sawsan; Tsinoremas, Nicholas; Klein, Dagmar; Sabates, Eduardo; Correa-Medina, Mayrin; Ricordi, Camillo; Domínguez-Bendala, Juan; Diez, Juan; Pastori, Ricardo L
2010-09-22
MicroRNAs are non-coding RNAs that regulate gene expression, including during differentiation and development, by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets by calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in a genome-wide RNA microarray study. The microRNA signature of human fetal pancreatic samples of 10-22 weeks of gestational age (wga) was obtained by PCR-based high-throughput screening with TaqMan Low Density Arrays. This method led to the identification of 212 microRNAs. The microRNAs were classified into 3 groups: Group I contains 4 microRNAs with an increasing profile; Group II, 35 microRNAs with a decreasing profile; and Group III, 173 microRNAs that remain unchanged. We calculated Pearson correlations between the expression profile of microRNAs and target mRNAs, predicted by TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are genes and transcription factors known to play an essential role in pancreatic development. We have determined specific groups of microRNAs in the human fetal pancreas whose expression changes over the course of development. A negative correlation analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other. This study provides information pointing to a potential two-way level of combinatorial control regulating gene expression, through microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs. This study may further the understanding of gene expression regulation in the developing human pancreas.
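The correlation step described in the abstract above can be sketched in plain Python. The expression values below are hypothetical and only illustrate the kind of negative correlation the study looks for between a rising microRNA and a falling predicted target; the function name `pearson` is illustrative and not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical profiles across four gestational time points.
mirna_profile = [1.0, 2.0, 3.0, 4.0]   # microRNA increasing over time
target_profile = [8.0, 6.0, 4.0, 2.0]  # predicted target mRNA decreasing
r = pearson(mirna_profile, target_profile)
print(r)
```

A coefficient close to -1 for a microRNA and its predicted target would be consistent with repression, although correlation alone does not establish a regulatory relationship.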
MicroRNA signature of the human developing pancreas
2010-01-01
Background: MicroRNAs are non-coding RNAs that regulate gene expression, including during differentiation and development, by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets by calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in a genome-wide RNA microarray study. Results: The microRNA signature of human fetal pancreatic samples of 10-22 weeks of gestational age (wga) was obtained by PCR-based high-throughput screening with TaqMan Low Density Arrays. This method led to the identification of 212 microRNAs. The microRNAs were classified into 3 groups: Group I contains 4 microRNAs with an increasing profile; Group II, 35 microRNAs with a decreasing profile; and Group III, 173 microRNAs that remain unchanged. We calculated Pearson correlations between the expression profile of microRNAs and target mRNAs, predicted by TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are genes and transcription factors known to play an essential role in pancreatic development. Conclusions: We have determined specific groups of microRNAs in the human fetal pancreas whose expression changes over the course of development. A negative correlation analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other. This study provides information pointing to a potential two-way level of combinatorial control regulating gene expression, through microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs. This study may further the understanding of gene expression regulation in the developing human pancreas. PMID:20860821
Pirrò, Stefano; Zanella, Letizia; Kenzo, Maurice; Montesano, Carla; Minutolo, Antonella; Potestà, Marina; Sobze, Martin Sanou; Canini, Antonella; Cirilli, Marco; Muleo, Rosario; Colizzi, Vittorio; Galgani, Andrea
2016-01-01
Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body, by regulating human gene expression. However, knowledge about miRNAs in Moringa is scarce. To test the hypothesis about the potential pharmacological properties of these miRNAs, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNAs showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNAs, which in turn allowed us to relate the miRNAs to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of the discovery of M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and it provides new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for the beneficial properties of this valuable species.
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA
Eden, E.; Brunak, S.
2004-01-01
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
2010-01-01
Background: Osteosarcoma (OSA) spontaneously arises in the appendicular skeleton of large breed dogs and shares many physiological and molecular biological characteristics with human OSA. The standard treatment for OSA in both species is amputation or limb-sparing surgery, followed by chemotherapy. Unfortunately, OSA is an aggressive cancer with a high metastatic rate. Characterization of OSA with regard to its metastatic potential and chemotherapeutic resistance will improve both prognostic capabilities and treatment modalities. Methods: We analyzed archived primary OSA tissue from dogs treated with limb amputation followed by doxorubicin or platinum-based drug chemotherapy. Samples were selected from two groups: dogs with disease free intervals (DFI) of less than 100 days (n = 8) and greater than 300 days (n = 7). Gene expression was assessed with Affymetrix Canine 2.0 microarrays and analyzed with a two-tailed t-test. A subset of genes was confirmed using qRT-PCR and used in classification analysis to predict prognosis. Systems-based gene ontology analysis was conducted on genes selected using a standard J5 metric. The genes identified using this approach were converted to their human homologues and assigned to functional pathways using the GeneGo MetaCore platform. Results: Potential biomarkers were identified using gene expression microarray analysis and 11 differentially expressed (p < 0.05) genes were validated with qRT-PCR (n = 10/group). Statistical classification models using the qRT-PCR profiles predicted patient outcomes with 100% accuracy in the training set and up to 90% accuracy upon stratified cross validation. Pathway analysis revealed alterations in pathways associated with oxidative phosphorylation, hedgehog and parathyroid hormone signaling, cAMP/Protein Kinase A (PKA) signaling, immune responses, cytoskeletal remodeling and focal adhesion. Conclusions: This profiling study has identified potential new biomarkers to predict patient outcome in OSA and new pathways that may be targeted for therapeutic intervention. PMID:20860831
Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi
2013-01-01
Identification of gene-phenotype relationships is a fundamental challenge in human health and clinical practice. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. In this paper, a new diffusion-based method is proposed to prioritize candidate disease genes. The diffusion profile of a disease is defined as the stationary distribution of candidate genes given a random walk with restart in which similarities between phenotypes are incorporated. Candidate disease genes are then prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. A comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causal genes for 16 multifactorial diseases including prostate cancer and Alzheimer's disease, and the top predictions were in good agreement with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of a global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.
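The random walk with restart mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it only computes the stationary distribution of a plain random walk with restart on a small undirected network and omits the phenotype-similarity weighting and the profile comparison step. The network, the node names, the seed set and the restart probability of 0.3 are all hypothetical, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Stationary visiting probabilities of a random walk with restart.

    adj   : dict mapping each node to the set of its neighbours (undirected,
            no isolated nodes; an assumption of this sketch)
    seeds : set of nodes the walker jumps back to with probability `restart`
    """
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on its probability split evenly over its edges.
            inflow = sum(prob[m] / len(adj[m]) for m in adj[n])
            updated[n] = (1.0 - restart) * inflow + restart * start[n]
        change = sum(abs(updated[n] - prob[n]) for n in nodes)
        prob = updated
        if change < tol:
            break
    return prob


# Hypothetical toy interaction network; "A" plays the role of a known disease gene.
toy_network = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
profile = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted(profile, key=profile.get, reverse=True)
print(ranking)
```

Nodes close to the seed receive more probability mass than distant ones, which is the property that diffusion-based prioritization relies on.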
O'Brien, Carol; Wallin, Jeffrey J; Sampath, Deepak; GuhaThakurta, Debraj; Savage, Heidi; Punnoose, Elizabeth A; Guan, Jane; Berry, Leanne; Prior, Wei Wei; Amler, Lukas C; Belvin, Marcia; Friedman, Lori S; Lackner, Mark R
2010-07-15
The class I phosphatidylinositol 3' kinase (PI3K) plays a major role in proliferation and survival in a wide variety of human cancers. A key factor in successful development of drugs targeting this pathway is likely to be the identification of responsive patient populations with predictive diagnostic biomarkers. This study sought to identify candidate biomarkers of response to the selective PI3K inhibitor GDC-0941. We used a large panel of breast cancer cell lines and in vivo xenograft models to identify candidate predictive biomarkers for a selective inhibitor of class I PI3K that is currently in clinical development. The approach involved pharmacogenomic profiling as well as analysis of gene expression data sets from cells profiled at baseline or after GDC-0941 treatment. We found that models harboring mutations in PIK3CA, amplification of human epidermal growth factor receptor 2, or dual alterations in two pathway components were exquisitely sensitive to the antitumor effects of GDC-0941. We found that several models that do not harbor these alterations also showed sensitivity, suggesting a need for additional diagnostic markers. Gene expression studies identified a collection of genes whose expression was associated with in vitro sensitivity to GDC-0941, and expression of a subset of these genes was found to be intimately linked to signaling through the pathway. Pathway focused biomarkers and the gene expression signature described in this study may have utility in the identification of patients likely to benefit from therapy with a selective PI3K inhibitor. Copyright 2010 AACR.
Massive NGS Data Analysis Reveals Hundreds Of Potential Novel Gene Fusions in Human Cell Lines.
Gioiosa, Silvia; Bolis, Marco; Flati, Tiziano; Massini, Annalisa; Garattini, Enrico; Chillemi, Giovanni; Fratelli, Maddalena; Castrignanò, Tiziana
2018-06-01
Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming at identifying novel putative cell-line specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by running a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the prediction tools yields a robust set of ∼ 1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well known repositories. Taking advantage of the intuitive query forms, the users can easily access, navigate, filter and select the putative gene fusions for further validations and studies. They can also find suitable experimental models for a given fusion of interest.
We believe that the LiGeA resource can represent not only the first compendium of both known and putative novel gene fusion events in the catalog of all of the human malignant cell lines, but it can also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.
A fast and high performance multiple data integration algorithm for identifying human disease genes
2015-01-01
Background Integrating multiple data sources is indispensable in improving disease gene identification. This is not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need to be improved. Results In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene being associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. Conclusions The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 by using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. This is better than many existing algorithms. PMID:26399620
Haplotypes in SLC24A5 Gene as Ancestry Informative Markers in Different Populations
Giardina, Emiliano; Pietrangeli, Ilenia; Martínez-Labarga, Cristina; Martone, Claudia; de Angelis, Flavio; Spinella, Aldo; De Stefano, Gianfranco; Rickards, Olga; Novelli, Giuseppe
2008-01-01
Ancestry informative markers (AIMs) are human polymorphisms that exhibit substantial allele frequency differences among populations. These markers can provide information about the ancestry of samples, which may be useful in predicting a perpetrator’s ethnic origin to aid criminal investigations. Variations in human pigmentation are the most obvious phenotypes to distinguish individuals. It has been recently shown that the variation of a G in an A allele of the coding single-nucleotide polymorphism (SNP) rs1426654 within the SLC24A5 gene varies in frequency among several population samples according to skin pigmentation. Because of these observations, the SLC24A5 locus has been evaluated as an Ancestry Informative Region (AIR) by typing rs1426654 together with two additional intragenic markers (rs2555364 and rs16960620) in 471 unrelated individuals originating from three different continents (Africa, Asia and Europe). This study further supports the role of the human SLC24A5 gene in skin pigmentation, suggesting that variations in SLC24A5 haplotypes can correlate with human migration and ancestry. Furthermore, our data reveal the utility of haplotype and combined unphased genotype analysis of SLC24A5 in predicting ancestry and provide a good example of the usefulness of genetic characterization of larger regions, in addition to single polymorphisms, as candidates for population-specific sweeps in the ancestral population. PMID:19440451
Microarray Analysis of Differential Gene Expression Profile Between Human Fetal and Adult Heart.
Geng, Zhimin; Wang, Jue; Pan, Lulu; Li, Ming; Zhang, Jitai; Cai, Xueli; Chu, Maoping
2017-04-01
Although many changes have been discovered during heart maturation, the genetic mechanisms involved in the changes between immature and mature myocardium have only been partially elucidated. Here, the changes in gene expression profile between the human fetal and adult heart were characterized. A human microarray was applied to define the gene expression signatures of the fetal (13-17 weeks of gestation, n = 4) and adult hearts (30-40 years old, n = 4). Gene ontology analyses, pathway analyses, gene set enrichment analyses, and signal transduction network analyses were performed to predict the function of the differentially expressed genes. Ten mRNAs were confirmed by quantitative real-time polymerase chain reaction. 5547 mRNAs were found to be significantly differentially expressed. "Cell cycle" was the most enriched pathway in the down-regulated genes. EGFR, IGF1R, and ITGB1 play a central role in the regulation of heart development. EGFR, IGF1R, and FGFR2 were the core genes regulating cardiac cell proliferation. The quantitative real-time polymerase chain reaction results were concordant with the microarray data. Our data identified the transcriptional regulation of heart development in the second trimester and the potential regulators that play a prominent role in the regulation of heart development and cardiac cell proliferation.
Digital Quantification of Human Eye Color Highlights Genetic Association of Three New Loci
Liu, Fan; Wollstein, Andreas; Hysi, Pirro G.; Ankra-Badu, Georgina A.; Spector, Timothy D.; Park, Daniel; Zhu, Gu; Larsson, Mats; Duffy, David L.; Montgomery, Grant W.; Mackey, David A.; Walsh, Susan; Lao, Oscar; Hofman, Albert; Rivadeneira, Fernando; Vingerling, Johannes R.; Uitterlinden, André G.; Martin, Nicholas G.; Hammond, Christopher J.; Kayser, Manfred
2010-01-01
Previous studies have successfully identified genetic variants in several genes associated with human iris (eye) color; however, they all used simplified categorical trait information. Here, we quantified continuous eye color variation into hue and saturation values using high-resolution digital full-eye photographs and conducted a genome-wide association study on 5,951 Dutch Europeans from the Rotterdam Study. Three new regions, 1q42.3, 17q25.3, and 21q22.13, were highlighted as meeting the criterion for genome-wide statistically significant association. The latter two loci were replicated in 2,261 individuals from the UK and in 1,282 from Australia. The LYST gene at 1q42.3 and the DSCR9 gene at 21q22.13 serve as promising functional candidates. A model for predicting quantitative eye colors explained over 50% of trait variance in the Rotterdam Study. Overall, our data exemplify that fine phenotyping is a useful strategy for finding genes involved in human complex traits. PMID:20463881
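As an illustration of what quantifying eye color into hue and saturation values can look like, the following Python sketch converts the mean RGB color of a set of pixels into hue and saturation with the standard-library `colorsys` module. It is an assumption-laden toy, not the study's image-processing pipeline: the function name, the pixel values and the choice to average RGB before conversion are all hypothetical.

```python
import colorsys


def hue_saturation(pixels):
    """Return (hue in degrees, saturation in 0..1) of the mean colour.

    pixels: iterable of (R, G, B) tuples with channel values in 0..255.
    Averaging in RGB space is a simplification chosen for this sketch.
    """
    pixels = list(pixels)
    count = len(pixels)
    mean_rgb = [sum(px[i] for px in pixels) / count / 255.0 for i in range(3)]
    hue, saturation, _value = colorsys.rgb_to_hsv(*mean_rgb)
    return hue * 360.0, saturation


# Hypothetical iris pixel samples (not real data).
blue_hue, blue_sat = hue_saturation([(0, 0, 255)])
brown_hue, brown_sat = hue_saturation([(139, 69, 19), (139, 69, 19)])
print(round(blue_hue), round(brown_hue))
```

Treating color as two continuous numbers rather than a category such as "blue" or "brown" is what allows it to be used as a quantitative trait.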
GIANT API: an application programming interface for functional genomics
Roberts, Andrew M.; Wong, Aaron K.; Fisk, Ian; Troyanskaya, Olga G.
2016-01-01
GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. PMID:27098035
2010-01-01
Background Hyperactivation of the Ras signaling pathway is a driver of many cancers, and RAS pathway activation can predict response to targeted therapies. Therefore, optimal methods for measuring Ras pathway activation are critical. The main focus of our work was to develop a gene expression signature that is predictive of RAS pathway dependence. Methods We used the coherent expression of RAS pathway-related genes across multiple datasets to derive a RAS pathway gene expression signature and generate RAS pathway activation scores in pre-clinical cancer models and human tumors. We then related this signature to KRAS mutation status and drug response data in pre-clinical and clinical datasets. Results The RAS signature score is predictive of KRAS mutation status in lung tumors and cell lines with high (> 90%) sensitivity but relatively low (50%) specificity due to samples that have apparent RAS pathway activation in the absence of a KRAS mutation. In lung and breast cancer cell line panels, the RAS pathway signature score correlates with pMEK and pERK expression, and predicts resistance to AKT inhibition and sensitivity to MEK inhibition within both KRAS mutant and KRAS wild-type groups. The RAS pathway signature is upregulated in breast cancer cell lines that have acquired resistance to AKT inhibition, and is downregulated by inhibition of MEK. In lung cancer cell lines knockdown of KRAS using siRNA demonstrates that the RAS pathway signature is a better measure of dependence on RAS compared to KRAS mutation status. In human tumors, the RAS pathway signature is elevated in ER negative breast tumors and lung adenocarcinomas, and predicts resistance to cetuximab in metastatic colorectal cancer. 
Conclusions These data demonstrate that the RAS pathway signature is superior to KRAS mutation status for the prediction of dependence on RAS signaling, can predict response to PI3K and RAS pathway inhibitors, and is likely to have the most clinical utility in lung and breast tumors. PMID:20591134
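The sensitivity and specificity figures quoted in this abstract can be made concrete with a short Python sketch. The counts below are invented purely to reproduce figures of the same order (sensitivity above 90%, specificity of 50%); they are not data from the study, and the function and variable names are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity of boolean predictions against boolean truth."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Invented example: 20 KRAS-mutant and 20 KRAS wild-type samples.
kras_mutant = [True] * 20 + [False] * 20
# The signature is "high" in 19 of 20 mutants and in 10 of 20 wild-type samples.
signature_high = [True] * 19 + [False] + [True] * 10 + [False] * 10

sensitivity, specificity = sensitivity_specificity(signature_high, kras_mutant)
print(sensitivity, specificity)
```

In this toy setting the low specificity comes entirely from wild-type samples with a high signature score, which mirrors the abstract's explanation that some tumors show apparent RAS pathway activation without a KRAS mutation.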
Scott, Milcah C.; Sarver, Aaron L.; Gavin, Katherine J.; Thayanithy, Venugopal; Getzy, David M.; Newman, Robert A.; Cutter, Gary R.; Lindblad-Toh, Kerstin; Kisseberth, William C.; Hunter, Lawrence E.; Subramanian, Subbaya; Breen, Matthew; Modiano, Jaime F.
2011-01-01
The heterogeneous and chaotic nature of osteosarcoma has confounded accurate molecular classification, prognosis, and prediction for this tumor. The occurrence of spontaneous osteosarcoma is largely confined to humans and dogs. While the clinical features are remarkably similar in both species, the organization of dogs into defined breeds provides a more homogeneous genetic background that may increase the likelihood to uncover molecular subtypes for this complex disease. We thus hypothesized that molecular profiles derived from canine osteosarcoma would aid in molecular subclassification of this disease when applied to humans. To test the hypothesis, we performed genome wide gene expression profiling in a cohort of dogs with osteosarcoma, primarily from high-risk breeds. To further reduce inter-sample heterogeneity, we assessed tumor-intrinsic properties through use of an extensive panel of osteosarcoma-derived cell lines. We observed strong differential gene expression that segregated samples into two groups with differential survival probabilities. Groupings were characterized by the inversely correlated expression of genes associated with G2/M transition and DNA damage checkpoint and microenvironment-interaction categories. This signature was preserved in data from whole tumor samples of three independent dog osteosarcoma cohorts, with stratification into the two expected groups. Significantly, this restricted signature partially overlapped a previously defined, predictive signature for soft tissue sarcomas, and it unmasked orthologous molecular subtypes and their corresponding natural histories in five independent data sets from human patients with osteosarcoma. Our results indicate that the narrower genetic diversity of dogs can be utilized to group complex human osteosarcoma into biologically and clinically relevant molecular subtypes. This in turn may enhance prognosis and prediction, and identify relevant therapeutic targets. PMID:21621658
Predicting disease-related proteins based on clique backbone in protein-protein interaction network.
Yang, Lei; Zhao, Xudong; Tang, Xianglong
2014-01-01
Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique is possibly associated with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and false negative interactions in protein networks, clique extension and scoring of predicted disease proteins with gene ontology terms are introduced into the clique-based method. The precision of predicted disease proteins is verified by disease phenotypes and remains consistently above 95%. The predicted disease proteins associated with cliques can partly complement the mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
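To make the notion of a clique as a maximal complete subgraph concrete, here is a minimal Python sketch that enumerates maximal cliques with the Bron-Kerbosch algorithm with pivoting. The abstract does not say which enumeration algorithm the authors used, so this choice is an assumption, and the toy interaction network and protein names are hypothetical. The clique extension and gene ontology scoring steps of the paper are not covered.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a frozenset.

    adj: dict mapping each node to the set of its neighbours (no self-loops).
    Uses Bron-Kerbosch with pivoting.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical toy network: two triangles joined by the edge P3-P4.
toy_ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3", "P5", "P6"},
    "P5": {"P4", "P6"},
    "P6": {"P4", "P5"},
}
cliques = set(maximal_cliques(toy_ppi))
for clique in sorted(cliques, key=sorted):
    print(sorted(clique))
```

In a clique-based prediction setting, a clique that already contains known disease proteins would flag its remaining members as candidates.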
USDA-ARS's Scientific Manuscript database
Clostridium perfringens is a Gram-positive, spore-forming anaerobic bacterium that plays a significant role in human food-borne disease as well as non-food-borne human, animal and poultry diseases. There has been a resurgent interest in the use of bacteriophages or their gene products to control ba...
Genome-wide analysis of alternative splicing during human heart development
NASA Astrophysics Data System (ADS)
Wang, He; Chen, Yanmei; Li, Xinzhong; Chen, Guojun; Zhong, Lintao; Chen, Gangbing; Liao, Yulin; Liao, Wangjun; Bin, Jianping
2016-10-01
Alternative splicing (AS) drives determinative changes during mouse heart development. Recent high-throughput technological advancements have facilitated genome-wide AS analysis, but such an analysis of the human foetal heart's transition to the adult stage has not been reported. Here, we present a high-resolution global analysis of AS transitions between human foetal and adult hearts. RNA-sequencing data showed that extensive AS transitions occurred between human foetal and adult hearts, and that AS events occurred more frequently in protein-coding genes than in long non-coding RNA (lncRNA). A significant difference in AS patterns was found between foetal and adult hearts. The predicted difference in AS events was further confirmed using quantitative reverse transcription-polymerase chain reaction analysis of human heart samples. Functional analysis of foetal-specific AS events showed enrichment associated with cell proliferation-related pathways including cell cycle, whereas adult-specific AS events were associated with protein synthesis. Furthermore, 42.6% of foetal-specific AS events showed significant changes in gene expression levels between foetal and adult hearts. Genes exhibiting both foetal-specific AS and differential expression were highly enriched in cell cycle-associated functions. In conclusion, we provided a genome-wide profiling of AS transitions between foetal and adult hearts and proposed that AS transitions and differential gene expression may play determinative roles in human heart development.
Decoding the complex genetic causes of heart diseases using systems biology.
Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K
2015-03-01
The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Labib, Sarah, E-mail: Sarah.Labib@hc-sc.gc.ca; Guo, Charles H., E-mail: Charles.Guo@hc-sc.gc.ca; Williams, Andrew, E-mail: Andrew.Williams@hc-sc.gc.ca
2013-12-01
Forestomach tumors are observed in mice exposed to environmental carcinogens. However, the relevance of this data to humans is controversial because humans lack a forestomach. We hypothesize that an understanding of early molecular changes after exposure to a carcinogen in the forestomach will provide mode-of-action information to evaluate the applicability of forestomach cancers to human cancer risk assessment. In the present study we exposed mice to benzo(a)pyrene (BaP), an environmental carcinogen commonly associated with tumors of the rodent forestomach. Toxicogenomic tools were used to profile gene expression response in the forestomach. Adult Muta™Mouse males were orally exposed to 25, 50, and 75 mg BaP/kg-body-weight/day for 28 consecutive days. The forestomach was collected three days post-exposure. DNA microarrays, real-time RT-qPCR arrays, and protein analyses were employed to characterize responses in the forestomach. Microarray results showed altered expression of 414 genes across all treatment groups (± 1.5 fold; false discovery rate adjusted P ≤ 0.05). Significant downregulation of genes associated with phase II xenobiotic metabolism and increased expression of genes implicated in antigen processing and presentation, immune response, chemotaxis, and keratinocyte differentiation were observed in treated groups in a dose-dependent manner. A systematic comparison of the differentially expressed genes in the forestomach from the present study to differentially expressed genes identified in human diseases including human gastrointestinal tract cancers using the NextBio Human Disease Atlas showed significant commonalities between the two models. Our results provide molecular evidence supporting the use of the mouse forestomach model to evaluate chemically-induced gastrointestinal carcinogenesis in humans. - Highlights: • Benzo(a)pyrene-mediated transcriptomic response in the forestomach was examined.
• The immunoproteosome subunits and MHC class I pathway were the most affected. • Keratinocyte differentiation associated gene expression changes were dose-dependent. • Molecular similarities exist between cancers of the forestomach and human stomach.
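The selection rule quoted in this abstract (± 1.5 fold; false discovery rate adjusted P ≤ 0.05) can be sketched in Python as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the gene names, fold changes and p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    total = len(pvalues)
    order = sorted(range(total), key=lambda i: pvalues[i])
    adjusted = [0.0] * total
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(total, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * total / rank)
        adjusted[index] = running_min
    return adjusted


# Invented example: gene -> (signed fold change, raw p-value).
genes = {
    "geneA": (2.1, 0.005),
    "geneB": (-1.8, 0.01),
    "geneC": (1.2, 0.03),
    "geneD": (1.6, 0.2),
}
names = list(genes)
adjusted_p = benjamini_hochberg([genes[name][1] for name in names])

# Keep genes changed at least 1.5-fold in either direction with adjusted P <= 0.05.
selected = [
    name
    for name, adj_p in zip(names, adjusted_p)
    if abs(genes[name][0]) >= 1.5 and adj_p <= 0.05
]
print(selected)
```

Both criteria have to hold at once: in this toy data one gene passes the adjusted p-value cut-off but changes too little, and another changes enough but is not significant.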
Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP.
Deng, Li; Wang, Guohua; Chen, Bo
2015-01-01
To address the evaluation and decision-making problem in human-machine interface layout design for cabins, an operating comfort prediction model based on GEP (Gene Expression Programming) is proposed, using operating comfort to evaluate layout schemes. Joint angles are used to describe the operating posture of the upper limb and are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to reduce the variable dimension; the model's input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is the operating comfort score. A Chinese virtual human body model is built with CATIA software and used to simulate and evaluate the operators' operating comfort. With 22 groups of evaluation data as training and validation samples, the GEP algorithm is used to obtain the best-fitting function between the joint angles and the operating comfort, so that operating comfort can be predicted quantitatively. The operating comfort prediction results for the human-machine interface layout of a driller control room show that the GEP-based prediction model is fast and efficient, has good predictive performance, and can improve design efficiency. PMID:26448740
Abrisqueta Zarrabe, J A
1999-01-01
The Human Genome Project (HGP) is the greatest scientific adventure in modern human biology, and the genetic map that is going to be revealed through this Project is going to be an important basis of the medicine of the future. Human beings do not however depend solely on their genes. In order to comprehend human pathology, it is essential to focus on the genetic factors and on the environmental factors. Genetic diagnoses, being fostered by the HGP, make it possible to know genetic predisposition and the risks of the onset of a given disorder. Predictive medicine offers great hopes, but is giving rise to major concerns and is causing ethics-related dilemmas. Confidentiality, the moral imperative of medicine, is necessary to prevent discriminatory deviations. As is stated in the UNESCO's Universal Declaration on the Human Genome and Human Rights, no one shall be subjected to discrimination based on genetic characteristics.
Manzardo, A M; Gunewardena, S; Butler, M G
2013-09-10
We examined miRNA expression from RNA isolated from the frontal cortex (Brodmann area 9) of 9 alcoholics (6 males, 3 females, mean age 48 years) and 9 matched controls using both the Affymetrix GeneChip miRNA 2.0 and Human Exon 1.0 ST Arrays to further characterize genetic influences in alcoholism and the effects of alcohol consumption on predicted target mRNA expression. A total of 12 human miRNAs were significantly up-regulated in alcohol dependent subjects (fold change≥1.5, false discovery rate (FDR)≤0.3; p<0.05) compared with controls including a cluster of 4 miRNAs (e.g., miR-377, miR-379) from the maternally expressed 14q32 chromosome region. The status of the up-regulated miRNAs was supported using the high-throughput method of exon microarrays showing decreased predicted mRNA gene target expression as anticipated from the same RNA aliquot. Predicted mRNA targets were involved in cellular adhesion (e.g., THBS2), tissue differentiation (e.g., CHN2), neuronal migration (e.g., NDE1), myelination (e.g., UGT8, CNP) and oligodendrocyte proliferation (e.g., ENPP2, SEMA4D1). Our data support an association of alcoholism with up-regulation of a cluster of miRNAs located in the genomic imprinted domain on chromosome 14q32 with their predicted gene targets involved with oligodendrocyte growth, differentiation and signaling. Copyright © 2013 Elsevier B.V. All rights reserved.
Satellite DNA-based artificial chromosomes for use in gene therapy.
Hadlaczky, G
2001-04-01
Satellite DNA-based artificial chromosomes (SATACs) can be made by induced de novo chromosome formation in cells of different mammalian species. These artificially generated accessory chromosomes are composed of predictable DNA sequences and they contain defined genetic information. Prototype human SATACs have been successfully constructed in different cell types from 'neutral' endogenous DNA sequences from the short arm of the human chromosome 15. SATACs have already passed a number of hurdles crucial to their further development as gene therapy vectors, including: large-scale purification; transfer of purified artificial chromosomes into different cells and embryos; generation of transgenic animals and germline transmission with purified SATACs; and the tissue-specific expression of a therapeutic gene from an artificial chromosome in the milk of transgenic animals.
Characterization of ROS1 cDNA from a human glioblastoma cell line
DOE Office of Scientific and Technical Information (OSTI.GOV)
Birchmeier, C.; O'Neill, K.; Riggs, M.
1990-06-01
The authors have isolated and characterized a human ROS1 cDNA from the glioblastoma cell line SW-1088. The cDNA, 8.3 kilobases long, has the potential to encode a transmembrane tyrosine-specific protein kinase with a predicted molecular mass of 259 kDa. The putative extracellular domain of ROS1 is homologous to the extracellular domain of the sevenless gene product from Drosophila. No comparable similarities in the extracellular domains were found between ROS1 and other receptor-type tyrosine kinases. Together, ROS1 and sevenless gene products define a distinct subclass of transmembrane tyrosine kinases.
Chapman, Robert W; Reading, Benjamin J; Sullivan, Craig V
2014-01-01
Inherited gene transcripts deposited in oocytes direct early embryonic development in all vertebrates, but transcript profiles indicative of embryo developmental competence have not previously been identified. We employed artificial intelligence to model profiles of maternal ovary gene expression and their relationship to egg quality, evaluated as production of viable mid-blastula stage embryos, in the striped bass (Morone saxatilis), a farmed species with serious egg quality problems. In models developed using artificial neural networks (ANNs) and supervised machine learning, collective changes in the expression of a limited suite of genes (233) representing <2% of the queried ovary transcriptome explained >90% of the eventual variance in embryo survival. Egg quality related to minor changes in gene expression (<0.2-fold), with most individual transcripts making a small contribution (<1%) to the overall prediction of egg quality. These findings indicate that the predictive power of the transcriptome as regards egg quality resides not in levels of individual genes, but rather in the collective, coordinated expression of a suite of transcripts constituting a transcriptomic “fingerprint”. Correlation analyses of the corresponding candidate genes indicated that dysfunction of the ubiquitin-26S proteasome, COP9 signalosome, and subsequent control of the cell cycle engenders embryonic developmental incompetence. The affected gene networks are centrally involved in regulation of early development in all vertebrates, including humans. By assessing collective levels of the relevant ovarian transcripts via ANNs we were able, for the first time in any vertebrate, to accurately predict the subsequent embryo developmental potential of eggs from individual females. 
Our results show that the transcriptomic fingerprint evidencing developmental dysfunction is highly predictive of, and therefore likely to regulate, egg quality, a biologically complex trait crucial to reproductive fitness. PMID:24820964
Goulielmos, George N; Zervou, Maria I; Myrthianou, Effie; Burska, Agata; Niewold, Timothy B; Ponchel, Frederique
2016-06-01
Rapid advances in genotyping technology, analytical methods, and the establishment of large cohorts for population genetic studies have resulted in a large new body of information about the genetic basis of human rheumatoid arthritis (RA). Improved understanding of the root pathogenesis of the disease holds the promise of improved diagnostic and prognostic tools based upon this information. In this review, we summarize the nature of new genetic findings in human RA, including susceptibility loci and gene-gene and gene-environment interactions, as well as genetic loci associated with sub-groups of patients and those associated with response to therapy. Possible uses of these data are discussed, such as prediction of disease risk as well as personalized therapy and prediction of therapeutic response and risk of adverse events. While these applications are largely not refined to the point of clinical utility in RA, it seems likely that multi-parameter datasets including genetic, clinical, and biomarker data will be employed in the future care of RA patients. Copyright © 2016 Elsevier B.V. All rights reserved.
Identification and characterization of the autophagy-related genes Atg12 and Atg5 in hydra.
Dixit, Nishikant S; Shravage, Bhupendra V; Ghaskadbi, Surendra
2017-01-01
Autophagy is an evolutionarily conserved process in eukaryotic cells that is involved in the degradation of cytoplasmic contents including organelles via the lysosome. Hydra is an early metazoan which exhibits simple tissue grade organization, a primitive nervous system, and is one of the classical non-bilaterian models extensively used in evo-devo research. Here, we describe the characterization of two core autophagy genes, Atg12 and Atg5, from hydra. In silico analyses including sequence similarity, domain analysis, and phylogenetic analysis demonstrate the conservation of these genes across eukaryotes. The predicted 3D structure of hydra Atg12 showed very little variance when compared to human Atg12 and yeast Atg12, whereas the hydra Atg5 predicted 3D structure was found to be variable, when compared with its human and yeast homologs. Strikingly, whole mount in situ hybridization showed high expression of Atg12 transcripts specifically in nematoblasts, whereas Atg5 transcripts were found to be expressed strongly in budding region and growing buds. This study may provide a framework to understand the evolution of autophagy networks in higher eukaryotes.
Human Intellectual Disability Genes Form Conserved Functional Modules in Drosophila
Oortveld, Merel A. W.; Keerthikumar, Shivakumar; Oti, Martin; Nijhof, Bonnie; Fernandes, Ana Clara; Kochinke, Korinna; Castells-Nobau, Anna; van Engelen, Eva; Ellenkamp, Thijs; Eshuis, Lilian; Galy, Anne; van Bokhoven, Hans; Habermann, Bianca; Brunner, Han G.; Zweier, Christiane; Verstreken, Patrik; Huynen, Martijn A.; Schenck, Annette
2013-01-01
Intellectual Disability (ID) disorders, defined by an IQ below 70, are genetically and phenotypically highly heterogeneous. Identification of common molecular pathways underlying these disorders is crucial for understanding the molecular basis of cognition and for the development of therapeutic intervention strategies. To systematically establish their functional connectivity, we used transgenic RNAi to target 270 ID gene orthologs in the Drosophila eye. Assessment of neuronal function in behavioral and electrophysiological assays and multiparametric morphological analysis identified phenotypes associated with knockdown of 180 ID gene orthologs. Most of these genotype-phenotype associations were novel. For example, we uncovered 16 genes that are required for basal neurotransmission and have not previously been implicated in this process in any system or organism. ID gene orthologs with morphological eye phenotypes, in contrast to genes without phenotypes, are relatively highly expressed in the human nervous system and are enriched for neuronal functions, suggesting that eye phenotyping can distinguish different classes of ID genes. Indeed, grouping genes by Drosophila phenotype uncovered 26 connected functional modules. Novel links between ID genes successfully predicted that MYCN, PIGV and UPF3B regulate synapse development. Drosophila phenotype groups show, in addition to ID, significant phenotypic similarity also in humans, indicating that functional modules are conserved. The combined data indicate that ID disorders, despite their extreme genetic diversity, are caused by disruption of a limited number of highly connected functional modules. PMID:24204314
Human intellectual disability genes form conserved functional modules in Drosophila.
Oortveld, Merel A W; Keerthikumar, Shivakumar; Oti, Martin; Nijhof, Bonnie; Fernandes, Ana Clara; Kochinke, Korinna; Castells-Nobau, Anna; van Engelen, Eva; Ellenkamp, Thijs; Eshuis, Lilian; Galy, Anne; van Bokhoven, Hans; Habermann, Bianca; Brunner, Han G; Zweier, Christiane; Verstreken, Patrik; Huynen, Martijn A; Schenck, Annette
2013-10-01
Intellectual Disability (ID) disorders, defined by an IQ below 70, are genetically and phenotypically highly heterogeneous. Identification of common molecular pathways underlying these disorders is crucial for understanding the molecular basis of cognition and for the development of therapeutic intervention strategies. To systematically establish their functional connectivity, we used transgenic RNAi to target 270 ID gene orthologs in the Drosophila eye. Assessment of neuronal function in behavioral and electrophysiological assays and multiparametric morphological analysis identified phenotypes associated with knockdown of 180 ID gene orthologs. Most of these genotype-phenotype associations were novel. For example, we uncovered 16 genes that are required for basal neurotransmission and have not previously been implicated in this process in any system or organism. ID gene orthologs with morphological eye phenotypes, in contrast to genes without phenotypes, are relatively highly expressed in the human nervous system and are enriched for neuronal functions, suggesting that eye phenotyping can distinguish different classes of ID genes. Indeed, grouping genes by Drosophila phenotype uncovered 26 connected functional modules. Novel links between ID genes successfully predicted that MYCN, PIGV and UPF3B regulate synapse development. Drosophila phenotype groups show, in addition to ID, significant phenotypic similarity also in humans, indicating that functional modules are conserved. The combined data indicate that ID disorders, despite their extreme genetic diversity, are caused by disruption of a limited number of highly connected functional modules.
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.
Fu, Wenqing; O'Connor, Timothy D; Jun, Goo; Kang, Hyun Min; Abecasis, Goncalo; Leal, Suzanne M; Gabriel, Stacey; Rieder, Mark J; Altshuler, David; Shendure, Jay; Nickerson, Deborah A; Bamshad, Michael J; Akey, Joshua M
2013-01-10
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tu, Q.; Deng, Ye; Lin, Lu
Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts, and such an array is called human and mouse microbiota array, HMM-Chip. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains with 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Different degrees of specific probes, including gene-specific, inclusive and exclusive group-specific probes were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) HMMer confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. This developed HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.
Molecular cloning and characterization of chitinase genes from Candida albicans.
McCreath, K J; Specht, C A; Robbins, P W
1995-03-28
Chitinase (EC 3.2.1.14) is an important enzyme for the remodeling of chitin in the cell wall of fungi. We have cloned three chitinase genes (CHT1, CHT2, and CHT3) from the dimorphic human pathogen Candida albicans. CHT2 and CHT3 have been sequenced in full and their primary structures have been analyzed: CHT2 encodes a protein of 583 aa with a predicted size of 60.8 kDa; CHT3 encodes a protein of 567 aa with a predicted size of 60 kDa. All three genes show striking similarity to other chitinase genes in the literature, especially in the proposed catalytic domain. Transcription of CHT2 and CHT3 was greater when C. albicans was grown in a yeast phase as compared to a mycelial phase. A transcript of CHT1 could not be detected in either growth condition.
Smoking-related microRNAs and mRNAs in human peripheral blood mononuclear cells
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Ming-Wei
Teenager smoking is of great importance in public health. Functional roles of microRNAs have been documented in smoke-induced gene expression changes, but comprehensive mechanisms of microRNA-mRNA regulation and benefits remained poorly understood. We conducted the Teenager Smoking Reduction Trial (TSRT) to investigate the causal association between active smoking reduction and whole-genome microRNA and mRNA expression changes in human peripheral blood mononuclear cells (PBMC). A total of 12 teenagers with a substantial reduction in smoke quantity and a decrease in urine cotinine/creatinine ratio were enrolled in genomic analyses. In Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA), differentially expressed genes altered by smoke reduction were mainly associated with glucocorticoid receptor signaling pathway. The integrative analysis of microRNA and mRNA found eleven differentially expressed microRNAs negatively correlated with predicted target genes. CD83 molecule regulated by miR-4498 in human PBMC, was critical for the canonical pathway of communication between innate and adaptive immune cells. Our data demonstrated that microRNAs could regulate immune responses in human PBMC after habitual smokers quit smoking and support the potential translational value of microRNAs in regulating disease-relevant gene expression caused by tobacco smoke. Highlights: • We conducted a smoke reduction trial program and investigated the causal relationship between smoke and gene regulation. • MicroRNA and mRNA expression changes were examined in human PBMC. • MicroRNAs are important in regulating disease-causal genes after tobacco smoke reduction.
Weng, Kai; Hu, Haiyang; Xu, Augix Guohua; Khaitovich, Philipp; Somel, Mehmet
2012-01-01
Background: Humans have a widely different diet from other primate species, and are dependent on its high nutritional content. The molecular mechanisms responsible for adaptation to the human diet are currently unknown. Here, we addressed this question by investigating whether the gene expression response observed in mice fed human and chimpanzee diets involves the same regulatory mechanisms as expression differences between humans and chimpanzees. Results: Using mouse and primate transcriptomic data, we identified the transcription factor EGR1 (early growth response 1) as a putative regulator of diet-related differential gene expression between human and chimpanzee livers. Specifically, we predict that EGR1 regulates the response to the high caloric content of human diets. However, we also show that close to 90% of the dietary response to the primate diet found in mice is not observed in primates. This might be explained by changes in tissue-specific gene expression between taxa. Conclusion: Our results suggest that the gene expression response to the nutritionally rich human diet is partially mediated by the transcription factor EGR1. While this EGR1-driven response is conserved between mice and primates, the bulk of the mouse response to human and chimpanzee dietary differences is not observed in primates. This result highlights the rapid evolution of diet-related expression regulation and underscores potential limitations of mouse models in dietary studies. PMID:22937124
Martínez, Noelia; Luque, Roberto; Milani, Christian; Ventura, Marco; Bañuelos, Oscar; Margolles, Abelardo
2018-05-15
Bifidobacteria are mutualistic intestinal bacteria, and their presence in the human gut has been associated with health-promoting activities. The presence of antibiotic resistance genes in this genus is controversial, since, although bifidobacteria are nonpathogenic microorganisms, they could serve as reservoirs of resistance determinants for intestinal pathogens. However, until now, few antibiotic resistance determinants have been functionally characterized in this genus. In this work, we show that Bifidobacterium breve CECT7263 displays atypical resistance to erythromycin and clindamycin. In order to delimit the genomic region responsible for the observed resistance phenotype, a library of genomic DNA was constructed and a fragment of 5.8 kb containing a gene homologous to rRNA methylase genes was able to confer erythromycin resistance in Escherichia coli. This genomic region seems to be very uncommon, and homologs of the gene have been detected in only one strain of Bifidobacterium longum and two other strains of B. breve. In this context, analysis of shotgun metagenomics data sets revealed that the gene is also uncommon in the microbiomes of adults and infants. The structural gene and its upstream region were cloned into a B. breve-sensitive strain, which became resistant after acquiring the genetic material. In vitro conjugation experiments did not allow us to detect gene transfer to other recipients. Nevertheless, prediction of genes potentially acquired through horizontal gene transfer events revealed that the gene is located in a putative genomic island. IMPORTANCE: Bifidobacterium breve is a very common human intestinal bacterium. Often described as a pioneer microorganism in the establishment of early-life intestinal microbiota, its presence has been associated with several beneficial effects for the host, including immune stimulation and protection against infections. Therefore, some strains of this species are considered probiotics. 
In relation to this, because probiotic bacteria are used for human and animal consumption, one of the safety concerns over these bacteria is the presence of antibiotic resistance genes, since the human gut is a densely populated habitat that could favor the transfer of genetic material to potential pathogens. In this study, we analyzed the genetic basis responsible for the erythromycin and clindamycin resistance phenotype of B. breve CECT7263. We were able to identify and characterize a novel gene homologous to rRNA methylase genes which confers erythromycin and clindamycin resistance. This gene seems to be very uncommon in other bifidobacteria and in the gut microbiomes of both adults and infants. Even though conjugation experiments showed the absence of transferability under in vitro conditions, it has been predicted to be located in a putative genomic island recently acquired by specific bifidobacterial strains. Copyright © 2018 American Society for Microbiology.
Weidner, Christopher; Steinfath, Matthias; Wistorf, Elisa; Oelgeschläger, Michael; Schneider, Marlon R; Schönfelder, Gilbert
2017-08-16
Recent studies that compared transcriptomic datasets of human diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. A major reason for the discrepancies between different gene expression analyses is the arbitrary filtering of differentially expressed genes. Furthermore, the comparison of single genes between different species and platforms often is limited by technical variance, leading to misinterpretation of the con/discordance between data from human and animal models. Thus, standardized approaches for systematic data analysis are needed. To overcome subjective gene filtering and ineffective gene-to-gene comparisons, we recently demonstrated that gene set enrichment analysis (GSEA) has the potential to avoid these problems. Therefore, we developed a standardized protocol for the use of GSEA to distinguish between appropriate and inappropriate animal models for translational research. This protocol is not suitable to predict how to design new model systems a-priori, as it requires existing experimental omics data. However, the protocol describes how to interpret existing data in a standardized manner in order to select the most suitable animal model, thus avoiding unnecessary animal experiments and misleading translational studies.
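The core statistic behind gene set enrichment analysis can be sketched briefly. This is a simplified, unweighted running-sum enrichment score, not the authors' protocol: the weighted statistic, permutation-based significance, and the human-versus-mouse comparison step are all omitted, and the function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers ordered from most up- to most
    down-regulated. Returns the running-sum value with the largest
    absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on set members, step down on all other genes.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up ranking; a set concentrated at the top scores near +1,
# one concentrated at the bottom scores near -1.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranking, {"g1", "g2"})
bottom_score = enrichment_score(ranking, {"g5", "g6"})
```

Because the score depends only on where the set's members fall in the ranking, it avoids the arbitrary differential-expression cutoff that the abstract identifies as a source of contradictory conclusions.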
Schadt, Eric E; Edwards, Stephen W; GuhaThakurta, Debraj; Holder, Dan; Ying, Lisa; Svetnik, Vladimir; Leonardson, Amy; Hart, Kyle W; Russell, Archie; Li, Guoya; Cavet, Guy; Castle, John; McDonagh, Paul; Kan, Zhengyan; Chen, Ronghua; Kasarskis, Andrew; Margarint, Mihai; Caceres, Ramon M; Johnson, Jason M; Armour, Christopher D; Garrett-Engele, Philip W; Tsinoremas, Nicholas F; Shoemaker, Daniel D
2004-01-01
Background: Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. Results: The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale. Conclusions: These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized. PMID:15461792
Equine performance genes and the future of doping in horseracing.
Wilkin, Tessa; Baoutina, Anna; Hamilton, Natasha
2017-09-01
A horse's success on the racetrack is determined by genetics, training and nutrition, and their translation into physical traits such as speed, endurance and muscle strength. Advances in genetic technologies are slowly explaining the roles of specific genes in equine performance, and offering new insights into the development of novel therapies for diseases and musculoskeletal injuries that cause early retirement of many racehorses. Gene therapy approaches may also soon provide new means to artificially enhance the physical performance of racehorses. Gene doping, the misuse of gene therapies for performance enhancement, is predicted to be the next phase of doping faced by horseracing. The risk of gene doping to human sports has been recognised for almost 15 years, and the introduction of the first gene doping detection tests for doping control in human athletes is imminent. Gene doping is also a threat to horseracing, but there are currently no methods to detect it. Efficient and accurate detection methods need to be developed to deter those looking to use gene doping in horses and to maintain the integrity of the sport. Methods developed for human athletes could offer an avenue for detection in racehorses. Development of an equine equivalent test will first require identification of equine genes that will likely be targeted by gene doping attempts. This review focuses on genes that have been linked to athletic performance in horses and, therefore, could be targeted for genetic manipulation. The risks associated with gene doping and approaches to detect gene doping are also discussed. Copyright © 2017 John Wiley & Sons, Ltd.
Josset, Laurence; Menachery, Vineet D; Gralinski, Lisa E; Agnihothram, Sudhakar; Sova, Pavel; Carter, Victoria S; Yount, Boyd L; Graham, Rachel L; Baric, Ralph S; Katze, Michael G
2013-04-30
A novel human coronavirus (HCoV-EMC) was recently identified in the Middle East as the causative agent of a severe acute respiratory syndrome (SARS) resembling the illness caused by SARS coronavirus (SARS-CoV). Although derived from the CoV family, the two viruses are genetically distinct and do not use the same receptor. Here, we investigated whether HCoV-EMC and SARS-CoV induce similar or distinct host responses after infection of a human lung epithelial cell line. HCoV-EMC was able to replicate as efficiently as SARS-CoV in Calu-3 cells and similarly induced minimal transcriptomic changes before 12 h postinfection. Later in infection, HCoV-EMC induced a massive dysregulation of the host transcriptome, to a much greater extent than SARS-CoV. Both viruses induced a similar activation of pattern recognition receptors and the interleukin 17 (IL-17) pathway, but HCoV-EMC specifically down-regulated the expression of several genes within the antigen presentation pathway, including both type I and II major histocompatibility complex (MHC) genes. This could have an important impact on the ability of the host to mount an adaptive host response. A unique set of 207 genes was dysregulated early and permanently throughout infection with HCoV-EMC, and was used in a computational screen to predict potential antiviral compounds, including kinase inhibitors and glucocorticoids. Overall, HCoV-EMC and SARS-CoV elicit distinct host gene expression responses, which might impact in vivo pathogenesis and could orient therapeutic strategies against that emergent virus. Identification of a novel coronavirus causing fatal respiratory infection in humans raises concerns about a possible widespread outbreak of severe respiratory infection similar to the one caused by SARS-CoV. Using a human lung epithelial cell line and global transcriptomic profiling, we identified differences in the host response between HCoV-EMC and SARS-CoV. 
This enables rapid assessment of viral properties and the ability to anticipate possible differences in human clinical responses to HCoV-EMC and SARS-CoV. We used this information to predict potential effective drugs against HCoV-EMC, a method that could be more generally used to identify candidate therapeutics in future disease outbreaks. These data will help to generate hypotheses and make rapid advancements in characterizing this new virus.
Whole-Genome Thermodynamic Analysis Reduces siRNA Off-Target Effects
Chen, Xi; Liu, Peng; Chou, Hui-Hsien
2013-01-01
Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, thus their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design. PMID:23484018
WormQTLHD—a web database for linking human disease to natural variation data in C. elegans
van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high-throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of the DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness of the two networks predicted by the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach significantly reduces the computational cost of inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study; however, it is still not trivial to accurately distinguish lncRNA transcripts (LNCTs) from protein-coding ones (PCTs). As previous studies have preserved a variety of information and data about lncRNAs, it is appealing to develop novel methods to identify lncRNAs more accurately. Our method, lncRScan-SVM, aims at classifying PCTs and LNCTs using a support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse, respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, as evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting lncRNAs, and it is quite useful for current lncRNA study.
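The evaluation criteria named in this abstract have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions only, not the lncRScan-SVM code; the function name and the example counts are hypothetical and chosen purely for illustration, with LNCTs treated as the positive class.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (here assumed to be LNCTs) predicted correctly / missed.
    tn/fp: negatives (here assumed to be PCTs) predicted correctly / wrongly.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included here because it depends on the ranking of classifier scores rather than on a single set of counts.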
Genetic risk prediction and neurobiological understanding of alcoholism.
Levey, D F; Le-Niculescu, H; Frank, J; Ayalew, M; Jain, N; Kirlin, B; Learman, R; Winiger, E; Rodd, Z; Shekhar, A; Schork, N; Kiefer, F; Wodarz, N; Müller-Myhsok, B; Dahmen, N; Nöthen, M; Sherva, R; Farrer, L; Smith, A H; Kranzler, H R; Rietschel, M; Gelernter, J; Niculescu, A B
2014-05-20
We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene-level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A panel of all the nominally significant P-value SNPs in the top candidate genes discovered by CFG (n=135 genes, 713 SNPs) was used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P=0.053) in separating alcohol-dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. In order to validate and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n=11 genes, 66 SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model, showed better predictive ability in the independent German test cohort (P=0.041). 
The top CFG scoring gene for alcoholism from the initial discovery step, synuclein alpha (SNCA) remained the top gene after the stress-reactive animal model cross-validation. We also tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence (P=0.00012) and one with alcohol abuse (a less severe form of alcoholism; P=0.0094). SNCA by itself was able to separate alcoholics from controls in the alcohol-dependent cohort (P=0.000013) and the alcohol abuse cohort (P=0.023). So did eight other genes from the panel of 11 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRM3 and MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse (including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, anxiety from previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed the significant genetic overlap with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the broader context of modulating the mental landscape.
HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.
Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-02-01
MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, a combined pattern recognition neural network (PRNN) and principal component analysis (PCA) architecture is proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the miRBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.
Constructing an integrated gene similarity network for the identification of disease genes.
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
2017-09-20
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
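The random walk with restart step mentioned in this abstract can be illustrated on a toy graph. The sketch below is not the RWRB implementation: it runs the generic algorithm on a single small undirected network rather than on the phenotype-gene bilayer network, and the function name, the restart probability of 0.7 and the four-node adjacency matrix are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-12, max_iter=10000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol.

    adj   : square adjacency matrix as a list of lists (non-negative weights).
    seeds : indices of the seed nodes (e.g. known disease genes).
    W is adj with each column normalised to sum to 1.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Restart vector: probability mass spread evenly over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        p_next = []
        for i in range(n):
            walk = sum(
                adj[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0
            )
            p_next.append((1.0 - restart) * walk + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Candidates are ranked by their steady-state score, so nodes closer to the seed in the network rank higher; in a prioritization setting the seed itself would normally be excluded from the candidate list.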
Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.
2017-01-01
Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E
2017-05-15
The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. 
MicroRNA regulation of human protease genes essential for influenza virus replication.
Meliopoulos, Victoria A; Andersen, Lauren E; Brooks, Paula; Yan, Xiuzhen; Bakre, Abhijeet; Coleman, J Keegan; Tompkins, S Mark; Tripp, Ralph A
2012-01-01
Influenza A virus causes seasonal epidemics and periodic pandemics threatening the health of millions of people each year. Vaccination is an effective strategy for reducing morbidity and mortality, and in the absence of drug resistance, the efficacy of chemoprophylaxis is comparable to that of vaccines. However, the rapid emergence of drug resistance has emphasized the need for new drug targets. Knowledge of the host cell components required for influenza replication has been an area targeted for disease intervention. In this study, the human protease genes required for influenza virus replication were determined and validated using RNA interference approaches. The genes validated as critical for influenza virus replication were ADAMTS7, CPE, DPP3, MST1, and PRSS12, and pathway analysis showed these genes were in global host cell pathways governing inflammation (NF-κB), cAMP/calcium signaling (CRE/CREB), and apoptosis. Analyses of host microRNAs predicted to govern expression of these genes showed that eight miRNAs regulated gene expression during virus replication. These findings identify unique host genes and microRNAs important for influenza replication providing potential new targets for disease intervention strategies.
Voss, Joachim G.; Dobra, Adrian; Morse, Caryn; Kovacs, Joseph A.; Danner, Robert L.; Munson, Peter J.; Logan, Carolea; Rangel, Zoila; Adelsberger, Joseph W.; McLaughlin, Mary; Adams, Larry D.; Raju, Raghavan; Dalakas, Marinos C.
2016-01-01
Purpose: Human immunodeficiency virus (HIV)–related fatigue (HRF) is multicausal and potentially related to mitochondrial dysfunction caused by antiretroviral therapy with nucleoside reverse transcriptase inhibitors (NRTIs). Methodology: The authors compared gene expression profiles of CD14+ cells of low versus high fatigued, NRTI-treated HIV patients to healthy controls (n = 5/group). The authors identified 32 genes predictive of low versus high fatigue and 33 genes predictive of healthy versus HIV infection. The authors constructed genetic networks to further elucidate the possible biological pathways in which these genes are involved. Relevance for nursing practice: Genes including the actin cytoskeletal regulatory proteins Prokineticin 2 and Cofilin 2 along with mitochondrial inner membrane proteins are involved in multiple pathways and were predictors of fatigue status. Previously identified inflammatory and signaling genes were predictive of HIV status, clearly confirming our results and suggesting a possible further connection between mitochondrial function and HIV. Isolated CD14+ cells are easily accessible cells that could be used for further study of the connection between fatigue and mitochondrial function of HIV patients. Implication for Practice: The findings from this pilot study take us one step closer to identifying biomarker targets for fatigue status and mitochondrial dysfunction. Specific biomarkers will be pertinent to the development of methodologies to diagnose, monitor, and treat fatigue and mitochondrial dysfunction. PMID:23324479
Schroeder, Kari B; McElreath, Richard; Nettle, Daniel
2013-03-05
Punishment of free-riding has been implicated in the evolution of cooperation in humans, and yet mechanisms for punishment avoidance remain largely uninvestigated. Individual variation in these mechanisms may stem from variation in the serotonergic system, which modulates processing of aversive stimuli. Functional serotonin gene variants have been associated with variation in the processing of aversive stimuli and widely studied as risk factors for psychiatric disorders. We show that variants at the serotonin transporter gene (SLC6A4) and serotonin 2A receptor gene (HTR2A) predict contributions to the public good in economic games, dependent upon whether contribution behavior can be punished. Participants with a variant at the serotonin transporter gene contribute more, leading to group-level differences in cooperation, but this effect dissipates in the presence of punishment. When contribution behavior can be punished, those with a variant at the serotonin 2A receptor gene contribute more than those without it. This variant also predicts a more stressful experience of the games. The diversity of institutions (including norms) that govern cooperation and punishment may create selective pressures for punishment avoidance that change rapidly across time and space. Variant-specific epigenetic regulation of these genes, as well as population-level variation in the frequencies of these variants, may facilitate adaptation to local norms of cooperation and punishment.
Schroeder, Kari B.; McElreath, Richard; Nettle, Daniel
2013-01-01
Punishment of free-riding has been implicated in the evolution of cooperation in humans, and yet mechanisms for punishment avoidance remain largely uninvestigated. Individual variation in these mechanisms may stem from variation in the serotonergic system, which modulates processing of aversive stimuli. Functional serotonin gene variants have been associated with variation in the processing of aversive stimuli and widely studied as risk factors for psychiatric disorders. We show that variants at the serotonin transporter gene (SLC6A4) and serotonin 2A receptor gene (HTR2A) predict contributions to the public good in economic games, dependent upon whether contribution behavior can be punished. Participants with a variant at the serotonin transporter gene contribute more, leading to group-level differences in cooperation, but this effect dissipates in the presence of punishment. When contribution behavior can be punished, those with a variant at the serotonin 2A receptor gene contribute more than those without it. This variant also predicts a more stressful experience of the games. The diversity of institutions (including norms) that govern cooperation and punishment may create selective pressures for punishment avoidance that change rapidly across time and space. Variant-specific epigenetic regulation of these genes, as well as population-level variation in the frequencies of these variants, may facilitate adaptation to local norms of cooperation and punishment. PMID:23431136
Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources
Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel
2013-01-01
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing considerably higher expression in those tissues compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice of method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim of eliminating method/study-specific biases and improving the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes with the highest scores against the public databases PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap with PaGenBase, 71% with TiGER and only 28% with HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases at identifying drug targets and biomarkers with known tissue specificity. PMID:23950964
Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja
2013-02-14
Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with a systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching the genetics of cryptorchidism. 
The collected data will further facilitate the development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
2014-09-24
Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; Carmona e Ferreira, Renata; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-12-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3'-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-01-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904
Yang, Xiaofei; Gao, Lin; Guo, Xingli; Shi, Xinghua; Wu, Hao; Song, Fei; Wang, Bingbo
2014-01-01
Increasing evidence has indicated that long non-coding RNAs (lncRNAs) are implicated in and associated with many complex human diseases. Despite the accumulation of lncRNA-disease associations, only a few studies have examined the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated, and those of the lncRNAs associated with complex diseases. Given that protein coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies. PMID:24498199
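The abstract above does not say which propagation algorithm was used, so the following is only a minimal sketch of one common choice, a random walk with restart, on a toy gene-disease bipartite network. The node names (D1, G1, L1, ...), the restart probability, and the iteration count are all hypothetical and are not taken from the paper.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart: spread probability mass from the seed nodes.

    Assumes every node has at least one neighbor, so total mass stays at 1.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(iterations):
        # A fraction `restart` of the mass jumps back to the seeds each step.
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            # The remaining mass is split evenly among the node's neighbors.
            share = (1.0 - restart) * current[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        current = nxt
    return current


# Hypothetical toy network: diseases D1-D3, coding gene G1, lncRNAs L1 and L2.
edges = [("D1", "G1"), ("D1", "L1"), ("D2", "G1"), ("D3", "L2")]
adjacency = build_adjacency(edges)

# Rank lncRNAs as candidates for disease D2, which has no known lncRNA link.
scores = propagate(adjacency, seeds={"D2"})
candidates = sorted(["L1", "L2"], key=lambda n: scores[n], reverse=True)
print(candidates)
```

In this toy network L1 ranks ahead of L2 because it is reachable from D2 through the shared gene G1 and disease D1, whereas L2 sits in a disconnected component and receives no score.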
Lentiviral vector-based insertional mutagenesis identifies genes associated with liver cancer
Ranzani, Marco; Cesana, Daniela; Bartholomae, Cynthia C.; Sanvito, Francesca; Pala, Mauro; Benedicenti, Fabrizio; Gallina, Pierangela; Sergi, Lucia Sergi; Merella, Stefania; Bulfone, Alessandro; Doglioni, Claudio; von Kalle, Christof; Kim, Yoon Jun; Schmidt, Manfred; Tonon, Giovanni; Naldini, Luigi; Montini, Eugenio
2013-01-01
Transposons and γ-retroviruses have been efficiently used as insertional mutagens in different tissues to identify molecular culprits of cancer. However, these systems are characterized by recurring integrations that accumulate in tumor cells, hampering the identification of early cancer-driving events amongst bystander and progression-related events. We developed an insertional mutagenesis platform based on lentiviral vectors (LVV) by which we could efficiently induce hepatocellular carcinoma (HCC) in 3 different mouse models. By virtue of LVV’s replication-deficient nature and broad genome-wide integration pattern, LVV-based insertional mutagenesis allowed identification of 4 new liver cancer genes from a limited number of integrations. We validated the oncogenic potential of all the identified genes in vivo, with different levels of penetrance. Our newly identified cancer genes are likely to play a role in human disease, since they are upregulated and/or amplified/deleted in human HCCs and can predict clinical outcome of patients. PMID:23314173
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.
2017-01-01
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623
Shotgun metaproteomics of the human distal gut microbiota
DOE Office of Scientific and Technical Information (OSTI.GOV)
VerBerkmoes, N.C.; Russell, A.L.; Shah, M.
2008-10-15
The human gut contains a dense, complex and diverse microbial community, comprising the gut microbiome. Metagenomics has recently revealed the composition of genes in the gut microbiome, but provides no direct information about which genes are expressed or functioning. Therefore, our goal was to develop a novel approach to directly identify microbial proteins in fecal samples to gain information about the genes expressed and about key microbial functions in the human gut. We used a non-targeted, shotgun mass spectrometry-based whole community proteomics, or metaproteomics, approach for the first deep proteome measurements of thousands of proteins in human fecal samples, thus demonstrating this approach on the most complex sample type to date. The resulting metaproteomes had a skewed distribution relative to the metagenome, with more proteins for translation, energy production and carbohydrate metabolism when compared to what was earlier predicted from metagenomics. Human proteins, including antimicrobial peptides, were also identified, providing a non-targeted glimpse of the host response to the microbiota. Several unknown proteins represented previously undescribed microbial pathways or host immune responses, revealing a novel complex interplay between the human host and its associated microbes.
Creighton, Chad J; Hernandez-Herrera, Anadulce; Jacobsen, Anders; Levine, Douglas A; Mankoo, Parminder; Schultz, Nikolaus; Du, Ying; Zhang, Yiqun; Larsson, Erik; Sheridan, Robert; Xiao, Weimin; Spellman, Paul T; Getz, Gad; Wheeler, David A; Perou, Charles M; Gibbs, Richard A; Sander, Chris; Hayes, D Neil; Gunaratne, Preethi H
2012-01-01
The Cancer Genome Atlas (TCGA) Network recently comprehensively catalogued the molecular aberrations in 487 high-grade serous ovarian cancers, with much remaining to be elucidated regarding the microRNAs (miRNAs). Here, using TCGA ovarian data, we surveyed the miRNAs, in the context of their predicted gene targets. Integration of miRNA and gene patterns yielded evidence that proximal pairs of miRNAs are processed from polycistronic primary transcripts, and that intronic miRNAs and their host gene mRNAs derive from common transcripts. Patterns of miRNA expression revealed multiple tumor subtypes and a set of 34 miRNAs predictive of overall patient survival. In a global analysis, miRNA:mRNA pairs anti-correlated in expression across tumors showed a higher frequency of in silico predicted target sites in the mRNA 3'-untranslated region (with less frequency observed for coding sequence and 5'-untranslated regions). The miR-29 family and predicted target genes were among the most strongly anti-correlated miRNA:mRNA pairs; over-expression of miR-29a in vitro repressed several anti-correlated genes (including DNMT3A and DNMT3B) and substantially decreased ovarian cancer cell viability. This study establishes miRNAs as having a widespread impact on gene expression programs in ovarian cancer, further strengthening our understanding of miRNA biology as it applies to human cancer. As with gene transcripts, miRNAs exhibit high diversity reflecting the genomic heterogeneity within a clinically homogeneous disease population. Putative miRNA:mRNA interactions, as identified using integrative analysis, can be validated. TCGA data are a valuable resource for the identification of novel tumor suppressive miRNAs in ovarian as well as other cancers.
Stevens, Stewart G.; Brown, Chris M
2013-01-01
Recently large scale transcriptome and proteome datasets for human cells have become available. A striking finding from these studies is that the level of an mRNA typically predicts no more than 40% of the abundance of protein. This correlation represents the overall figure for all genes. We present here a bioinformatic analysis of translation efficiency – the rate at which mRNA is translated into protein. We have analysed those human datasets that include genome wide mRNA and protein levels determined in the same study. The analysis comprises five distinct human cell lines that together provide comparable data for 8,170 genes. For each gene we have used levels of mRNA and protein combined with protein stability data from the HeLa cell line to estimate translation efficiency. This was possible for 3,990 genes in one or more cell lines and 1,807 genes in all five cell lines. Interestingly, our analysis and modelling shows that for many genes this estimated translation efficiency has considerable consistency between cell lines. Some deviations from this consistency likely result from the regulation of protein degradation. Others are likely due to known translational control mechanisms. These findings suggest it will be possible to build improved models for the interpretation of mRNA expression data. The results we present here provide a view of translation efficiency for many genes. We provide an online resource allowing the exploration of translation efficiency in genes of interest within different cell lines (http://bioanalysis.otago.ac.nz/TranslationEfficiency). PMID:23460887
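The abstract above says translation efficiency was estimated from mRNA level, protein level, and protein stability, but does not give the model. As a rough illustration only, the sketch below assumes a simple steady-state balance in which synthesis equals degradation, k_tl * mRNA = k_deg * protein, with first-order decay k_deg = ln(2) / half-life. The function name, units, and example values are hypothetical.

```python
import math


def translation_efficiency(mrna, protein, half_life_h):
    """Estimate proteins made per mRNA per hour under a steady-state assumption.

    Assumes k_tl * mrna = k_deg * protein, with k_deg = ln(2) / half_life_h.
    """
    if mrna <= 0 or half_life_h <= 0:
        raise ValueError("mrna and half_life_h must be positive")
    k_deg = math.log(2) / half_life_h
    return protein * k_deg / mrna


# Hypothetical gene: 10 mRNA copies, 1000 protein copies, 20 h protein half-life.
estimate = translation_efficiency(mrna=10, protein=1000, half_life_h=20.0)
print(round(estimate, 3))
```

Under this assumption, two genes with the same mRNA and protein levels get different estimates if their proteins differ in stability, which is why the stability data matter for the calculation.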
Zhang, J D; Berntenis, N; Roth, A; Ebeling, M
2014-06-01
Gene signatures of drug-induced toxicity are of broad interest, but they are often identified from small-scale, single-time point experiments, and are therefore of limited applicability. To address this issue, we performed multivariate analysis of gene expression, cell-based assays, and histopathological data in the TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system) database. Data mining highlights four genes (EGR1, ATF3, GDF15 and FGF21) that are induced 2 h after drug administration in human and rat primary hepatocytes poised to eventually undergo cytotoxicity-induced cell death. Modelling and simulation reveals that these early stress-response genes form a functional network with evolutionarily conserved structure and intrinsic dynamics. This is underlined by the fact that early induction of this network in vivo predicts drug-induced liver and kidney pathology with high accuracy. Our findings demonstrate the value of early gene-expression signatures in predicting and understanding compound-induced toxicity. The identified network can empower first-line tests that reduce animal use and costs of safety evaluation.
Open source machine-learning algorithms for the prediction of optimal cancer drug therapies.
Huang, Cai; Mezencev, Roman; McDonald, John F; Vannberg, Fredrik
2017-01-01
Precision medicine is a rapidly growing area of modern medical science and open source machine-learning codes promise to be a critical component for the successful development of standardized and automated analysis of patient data. One important goal of precision cancer medicine is the accurate prediction of optimal drug therapies from the genomic profiles of individual patient tumors. We introduce here an open source software platform that employs a highly versatile support vector machine (SVM) algorithm combined with a standard recursive feature elimination (RFE) approach to predict personalized drug responses from gene expression profiles. Drug specific models were built using gene expression and drug response data from the National Cancer Institute panel of 60 human cancer cell lines (NCI-60). The models are highly accurate in predicting the drug responsiveness of a variety of cancer cell lines including those comprising the recent NCI-DREAM Challenge. We demonstrate that predictive accuracy is optimized when the learning dataset utilizes all probe-set expression values from a diversity of cancer cell types without pre-filtering for genes generally considered to be "drivers" of cancer onset/progression. Application of our models to publicly available ovarian cancer (OC) patient gene expression datasets generated predictions consistent with observed responses previously reported in the literature. By making our algorithm "open source", we hope to facilitate its testing in a variety of cancer types and contexts leading to community-driven improvements and refinements in subsequent applications.
Makeyev, A V; Liebhaber, S A
2000-08-01
We have identified two novel human genes encoding proteins with a high level of sequence identity to two previously characterized RNA-binding proteins, alphaCP-1 and alphaCP-2. Both of these novel genes, alphaCP-3 and alphaCP-4, are predicted to encode proteins with triplicated KH domains. The number and organization of the KH domains, their sequences, and the sequences of the contiguous regions are conserved among all four alphaCP proteins. The common evolutionary origin of these proteins is substantiated by conservation of exon-intron organization in the corresponding genes. The map positions of alphaCP-1 and alphaCP-2 (previously reported) and those of alphaCP-3 and alphaCP-4 (present report) reveal that the four alphaCP loci are dispersed in the human genome; alphaCP-3 and alphaCP-4 mapped to 21q22.3 and 3p21, and the respective mouse orthologues mapped to syntenic regions of the mouse genome, 10B5 and 9F1-F2, respectively. Two additional loci in the human genome were identified as alphaCP-2 processed pseudogenes (PCBP2P1, 21q22.3, and PCBP2P2, 8q21-q22). Although the overall levels of alphaCP-3 and alphaCP-4 mRNAs are substantially lower than those of alphaCP-1 and alphaCP-2, transcripts of alphaCP-3 and alphaCP-4 were found in all mouse tissues tested. These data establish a new subfamily of genes predicted to encode closely related KH-containing RNA-binding proteins with potential functions in posttranscriptional controls. Copyright 2000 Academic Press.
Gufford, Brandon T; Robarge, Jason D; Eadon, Michael T; Gao, Hongyu; Lin, Hai; Liu, Yunlong; Desta, Zeruesenay; Skaar, Todd C
2018-04-01
Rifampin is a pleiotropic inducer of multiple drug metabolizing enzymes and transporters. This work utilized a global approach to evaluate rifampin effects on conjugating enzyme gene expression with relevance to human xeno- and endo-biotic metabolism. Primary human hepatocytes from 7 subjects were treated with rifampin (10 μmol/L, 24 hours). Standard methods for RNA-seq library construction, EZBead preparation, and NextGen sequencing were used to measure UDP-glucuronosyl transferase (UGT), sulfonyltransferase (SULT), N-acetyltransferase (NAT), and glutathione-S-transferase (GST) mRNA expression compared to vehicle control (0.01% MeOH). Rifampin induced (>1.25-fold) mRNA expression of 13 clinically important phase II drug metabolizing genes and repressed (>1.25-fold) the expression of 3 genes (P < .05). Rifampin-induced miRNA expression changes correlated with mRNA changes and miRNAs were identified that may modulate conjugating enzyme expression. NAT2 gene expression was most strongly repressed (1.3-fold) by rifampin while UGT1A4 and UGT1A1 genes were most strongly induced (7.9- and 4.8-fold, respectively). Physiologically based pharmacokinetic modeling (PBPK) was used to simulate the clinical consequences of rifampin induction of CYP3A4- and UGT1A4-mediated midazolam metabolism. Simulations evaluating isolated UGT1A4 induction predicted increased midazolam N-glucuronide exposure (~4-fold) with minimal reductions in parent midazolam exposure (~10%). Simulations accounting for simultaneous induction of both CYP3A4 and UGT1A4 predicted a ~10-fold decrease in parent midazolam exposure with only a ~2-fold decrease in midazolam N-glucuronide metabolite exposure. These data reveal differential effects of rifampin on the human conjugating enzyme transcriptome and potential associations with miRNAs that form the basis for future mechanistic studies to elucidate the interplay of conjugating enzyme regulatory elements.
Pathway connectivity and signaling coordination in the yeast stress-activated signaling network
Chasman, Deborah; Ho, Yi-Hsuan; Berry, David B; Nemec, Corey M; MacGilvray, Matthew E; Hose, James; Merrill, Anna E; Lee, M Violet; Will, Jessica L; Coon, Joshua J; Ansari, Aseem Z; Craven, Mark; Gasch, Audrey P
2014-01-01
Stressed cells coordinate a multi-faceted response spanning many levels of physiology. Yet knowledge of the complete stress-activated regulatory network as well as design principles for signal integration remains incomplete. We developed an experimental and computational approach to integrate available protein interaction data with gene fitness contributions, mutant transcriptome profiles, and phospho-proteome changes in cells responding to salt stress, to infer the salt-responsive signaling network in yeast. The inferred subnetwork presented many novel predictions by implicating new regulators, uncovering unrecognized crosstalk between known pathways, and pointing to previously unknown ‘hubs’ of signal integration. We exploited these predictions to show that Cdc14 phosphatase is a central hub in the network and that modification of RNA polymerase II coordinates induction of stress-defense genes with reduction of growth-related transcripts. We find that the orthologous human network is enriched for cancer-causing genes, underscoring the importance of the subnetwork's predictions in understanding stress biology. PMID:25411400
Kolata, Stefan; Light, Kenneth; Wass, Christopher D.; Colas-Zelin, Danielle; Roy, Debasri; Matzel, Louis D.
2010-01-01
Background Genetically heterogeneous mice express a trait that is qualitatively and psychometrically analogous to general intelligence in humans, and as in humans, this trait co-varies with the processing efficacy of working memory (including its dependence on selective attention). Dopamine signaling in the prefrontal cortex (PFC) has been established to play a critical role in animals' performance in both working memory and selective attention tasks. Owing to this role of the PFC in the regulation of working memory, here we compared PFC gene expression profiles of 60 genetically diverse CD-1 mice that exhibited a wide range of general learning abilities (i.e., aggregate performance across five diverse learning tasks). Methodology/Principal Findings Animals' general cognitive abilities were first determined based on their aggregate performance across a battery of five diverse learning tasks. With a procedure designed to minimize false positive identifications, analysis of gene expression microarrays (comprising ≈25,000 genes) identified a small number (<20) of genes that were differentially expressed across animals that exhibited fast and slow aggregate learning abilities. Of these genes, one functional cluster was identified, and this cluster (Darpp-32, Drd1a, and Rgs9) is an established modulator of dopamine signaling. Subsequent quantitative PCR found that expression of these dopaminergic genes plus one vascular gene (Nudt6) was significantly correlated with individual animals' general cognitive performance. Conclusions/Significance These results indicate that D1-mediated dopamine signaling in the PFC, possibly through its modulation of working memory, is predictive of general cognitive abilities. Furthermore, these results provide the first direct evidence of specific molecular pathways that might potentially regulate general intelligence. PMID:21103339
Molecular mechanisms of system responses to novel stimuli are predictable from public data
Danziger, Samuel A.; Ratushny, Alexander V.; Smith, Jennifer J.; Saleem, Ramsey A.; Wan, Yakun; Arens, Christina E.; Armstrong, Abraham M.; Sitko, Katherine; Chen, Wei-Ming; Chiang, Jung-Hsien; Reiss, David J.; Baliga, Nitin S.; Aitchison, John D.
2014-01-01
Systems scale models provide the foundation for an effective iterative cycle between hypothesis generation, experiment and model refinement. Such models also enable predictions facilitating the understanding of biological complexity and the control of biological systems. Here, we demonstrate the reconstruction of a globally predictive gene regulatory model from public data: a model that can drive rational experiment design and reveal new regulatory mechanisms underlying responses to novel environments. Specifically, using ∼1500 publicly available genome-wide transcriptome data sets from Saccharomyces cerevisiae, we have reconstructed an environment and gene regulatory influence network that accurately predicts regulatory mechanisms and gene expression changes on exposure of cells to completely novel environments. Focusing on transcriptional networks that induce peroxisome biogenesis, the model-guided experiments allow us to expand a core regulatory network to include novel transcriptional influences and linkage across signaling and transcription. Thus, the approach and model provide a multi-scalar picture of gene dynamics and are powerful resources for exploiting extant data to rationally guide experimentation. The techniques outlined here are generally applicable to any biological system, which is especially important when experimental systems are challenging and samples are difficult and expensive to obtain—a common problem in laboratory animal and human studies. PMID:24185701
L1000CDS2: LINCS L1000 characteristic direction signatures search engine
Duan, Qiaonan; Reid, St Patrick; Clark, Neil R; Wang, Zichen; Fernandez, Nicolas F; Rouillard, Andrew D; Readhead, Ben; Tritsch, Sarah R; Hodos, Rachel; Hafner, Marc; Niepel, Mario; Sorger, Peter K; Dudley, Joel T; Bavari, Sina; Panchal, Rekha G; Ma’ayan, Avi
2016-01-01
The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2 we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines. In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource. PMID:28413689
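The target-prediction step described above rests on cosine similarity between expression signatures. The sketch below shows only that calculation on made-up vectors; the gene labels, signature values, and ranking helper are hypothetical and do not reproduce the L1000CDS2 implementation or its data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero signature as uninformative.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, references):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical small-molecule signature over four genes.
drug_signature = [1.0, -0.5, 0.0, 2.0]

# Hypothetical single-gene perturbation signatures over the same four genes.
perturbations = {
    "GENE_A": [2.0, -1.0, 0.0, 4.0],    # same direction as the drug
    "GENE_B": [-1.0, 0.5, 0.0, -2.0],   # opposite direction
    "GENE_C": [0.0, 0.0, 1.0, 0.0],     # unrelated
}

ranking = rank_by_similarity(drug_signature, perturbations)
for name, score in ranking:
    print(name, round(score, 3))
```

Because cosine similarity ignores vector magnitude, GENE_A scores 1 despite being twice as large as the drug signature, GENE_B scores -1, and GENE_C scores 0.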
Park, Christopher Y.; Krishnan, Arjun; Zhu, Qian; Wong, Aaron K.; Lee, Young-Suk; Troyanskaya, Olga G.
2015-01-01
Motivation: Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell-lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy accounting for the tissue heterogeneity in the functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses. Results: We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach specifically predicts both the presence of a functional association and also the most likely interaction type among human genes or their protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions, and further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry). Availability and implementation: An interactive website hosting all of our interaction predictions is publicly available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25431329
Efficient ablation of genes in human hematopoietic stem and effector cells using CRISPR/Cas9
Mandal, Pankaj K.; Ferreira, Leonardo M. R.; Collins, Ryan; Meissner, Torsten B.; Boutwell, Christian L.; Friesen, Max; Vrbanac, Vladimir; Garrison, Brian S.; Stortchevoi, Alexei; Bryder, David; Musunuru, Kiran; Brand, Harrison; Tager, Andrew M.; Allen, Todd M.; Talkowski, Michael E.; Rossi, Derrick J.; Cowan, Chad A.
2014-01-01
SUMMARY Genome editing via CRISPR/Cas9 has rapidly become the tool of choice by virtue of its efficacy and ease of use. However, CRISPR/Cas9 mediated genome editing in clinically relevant human somatic cells remains untested. Here, we report CRISPR/Cas9 targeting of two clinically relevant genes, B2M and CCR5, in primary human CD4+ T cells and CD34+ hematopoietic stem and progenitor cells (HSPCs). Use of single RNA guides led to highly efficient mutagenesis in HSPCs but not in T cells. A dual guide approach improved gene deletion efficacy in both cell types. HSPCs that had undergone genome editing with CRISPR/Cas9 retained multi-lineage potential. We examined predicted on- and off-target mutations via target capture sequencing in HSPCs and observed low levels of off-target mutagenesis at only one site. These results demonstrate that CRISPR/Cas9 can efficiently ablate genes in HSPCs with minimal off-target mutagenesis, which could have broad applicability for hematopoietic cell-based therapy. PMID:25517468
A draft annotation and overview of the human genome
Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo
2001-01-01
Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
GIANT API: an application programming interface for functional genomics.
Roberts, Andrew M; Wong, Aaron K; Fisk, Ian; Troyanskaya, Olga G
2016-07-08
GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Advanced systems biology methods in drug discovery and translational biomedicine.
Zou, Jun; Zheng, Ming-Wu; Li, Gen; Su, Zhi-Guang
2013-01-01
Systems biology has developed exponentially in recent years and has been widely utilized in biomedicine to better understand the molecular basis of human disease and the mechanism of drug action. Here, we discuss the fundamental concept of systems biology and its two computational methods that have been commonly used, that is, network analysis and dynamical modeling. The applications of systems biology in elucidating human disease are highlighted, consisting of human disease networks, treatment response prediction, investigation of disease mechanisms, and disease-associated gene prediction. In addition, important advances in drug discovery, to which systems biology makes significant contributions, are discussed, including drug-target networks, prediction of drug-target interactions, investigation of drug adverse effects, drug repositioning, and drug combination prediction. The systems biology methods and applications covered in this review provide a framework for addressing disease mechanism and approaching drug discovery, which will facilitate the translation of research findings into clinical benefits such as novel biomarkers and promising therapies.
IFNL4 affects clearance of hepatitis C virus
Scientists have discovered a new human interferon gene, Interferon Lambda 4 (IFNL4), that affects clearance of the hepatitis C virus. They also identified an inherited genetic variant within IFNL4 that predicts how people respond to treatment for hepatit
Characterization of the apolipoprotein AI and CIII genes in the domestic pig
DOE Office of Scientific and Technical Information (OSTI.GOV)
Birchbauer, A.; Knipping, G.; Juritsch, B.
1993-03-01
The apolipoproteins (apo) AI and CIII are important constituents of triglyceride-rich lipoproteins and high-density lipoproteins. In humans, apo AI is believed to play an important protective role in the pathogenesis of arteriosclerosis, whereas apo CIII might be involved in the development of hypertriglyceridemia. Both human genes are located within a gene cluster on chromosome 11. Although the domestic pig has been widely used as an animal model in arteriosclerosis and lipid research, the porcine apolipoproteins genes are poorly characterized. In this report, the complete nucleotide sequences of the porcine apo AI and CIII genes are presented and the authors demonstrate, for the first time, apo CIII expression in the pig. Both genes are composed of four exons and three introns and resemble closely their human counterparts with regard to the transcriptional start sites, exon sizes, intron sizes, exon-intron borders, and the size of the intergenic region. The predicted pig apo AI is a protein of 241 amino acids, which is 2 amino acids shorter than human apo AI. The protein sequence was found to be very homologous to apo AI sequences in other mammalian species. Apo AI expression was detected on the mRNA level in porcine liver and intestine. The apo CIII gene encodes a protein with 73 amino acids, which is 6 amino acids shorter than human apo CIII. In contrast to the three isoforms of apo CIII found in humans, only one major isoform was detected in the pig. Presumably this isoform is unglycosylated. In addition to apo CIII expression in the liver and the intestine, a truncated form of apo CIII mRNA was also found in porcine kidney. The studies demonstrate the presence of an apo CIII gene, an apo CIII mRNA, and an apo CIII protein in the pig and, therefore, exclude a hypothesized apo CIII deficiency in these animals. 53 refs., 5 figs.
Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning.
Zhao, Jonathan Z L; Mucaki, Eliseos J; Rogan, Peter K
2018-01-01
Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
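The forward sequential feature selection with k-fold validation described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a nearest-centroid classifier stands in for the SVM so the example needs only the standard library, the mRMR ranking step is omitted, and the toy expression matrix, the function names, and the stopping rule (stop when accuracy no longer improves) are all hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x, feats):
    """Classify x by the closest class centroid over the chosen features.

    Assumption: this simple classifier replaces the SVM used in the study.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(r[f] for r in rows) for f in feats]
        dist = sum((x[f] - c) ** 2 for f, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def kfold_accuracy(X, y, feats, k=5):
    """Accuracy pooled over k folds; sample i is held out in fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            nearest_centroid_predict(train_X, train_y, X[i], feats) == y[i]
            for i in held_out
        )
    return correct / len(X)


def forward_selection(X, y, max_feats=3, k=5):
    """Greedily add the feature that most improves k-fold accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        scores = [(kfold_accuracy(X, y, selected + [f], k), f) for f in remaining]
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, feat = max(scores, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no improvement: stop (hypothetical stopping rule)
        selected.append(feat)
        remaining.remove(feat)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
toy_y = [i % 2 for i in range(10)]
toy_X = [[10.0 * toy_y[i] + 0.1 * i, float(i % 3)] for i in range(10)]
signature, accuracy = forward_selection(toy_X, toy_y)
print(signature, accuracy)
```

Backward selection follows the same pattern in reverse: start from all candidate features and repeatedly drop the one whose removal hurts the cross-validated accuracy least.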
A novel genomic signature with translational significance for human idiopathic pulmonary fibrosis.
Bauer, Yasmina; Tedrow, John; de Bernard, Simon; Birker-Robaczewska, Magdalena; Gibson, Kevin F; Guardela, Brenda Juan; Hess, Patrick; Klenk, Axel; Lindell, Kathleen O; Poirey, Sylvie; Renault, Bérengère; Rey, Markus; Weber, Edgar; Nayler, Oliver; Kaminski, Naftali
2015-02-01
The bleomycin-induced rodent lung fibrosis model is commonly used to study mechanisms of lung fibrosis and to test potential therapeutic interventions, despite the well recognized dissimilarities to human idiopathic pulmonary fibrosis (IPF). Therefore, in this study, we sought to identify genomic commonalities between the gene expression profiles from 100 IPF lungs and 108 control lungs that were obtained from the Lung Tissue Research Consortium, and rat lungs harvested at Days 3, 7, 14, 21, 28, 42, and 56 after bleomycin instillation. Surprisingly, the highest gene expression similarity between bleomycin-treated rat and IPF lungs was observed at Day 7. At this point of maximal rat-human commonality, we identified a novel set of 12 disease-relevant translational gene markers (C6, CTHRC1, CTSE, FHL2, GAL, GREM1, LCN2, MMP7, NELL1, PCSK1, PLA2G2A, and SLC2A5) that was able to separate almost all patients with IPF from control subjects in our cohort and in two additional IPF/control cohorts (GSE10667 and GSE24206). Furthermore, in combination with diffusing capacity of carbon monoxide measurements, four members of the translational gene marker set contributed to stratify patients with IPF according to disease severity. Significantly, pirfenidone attenuated the expression change of one (CTHRC1) translational gene marker in the bleomycin-induced lung fibrosis model, in transforming growth factor-β1-treated primary human lung fibroblasts and transforming growth factor-β1-treated human epithelial A549 cells. Our results suggest that a strategy focused on rodent model-human disease commonalities may identify genes that could be used to predict the pharmacological impact of therapeutic interventions, and thus facilitate the development of novel treatments for this devastating lung disease.
2012-01-01
Background Francisella is a genus of gram-negative bacteria highly virulent in fish and humans; F. tularensis causes the serious disease tularaemia in humans. Recently, Francisella species have been reported to cause mortality in aquaculture species such as Atlantic cod and tilapia. We have completed the sequencing and draft assembly of the Francisella noatunensis subsp. orientalis Toba04 strain isolated from farmed Tilapia. Compared to other available Francisella genomes, it is most similar to the genome of Francisella philomiragia subsp. philomiragia, a free-living bacterium not virulent to humans. Results The genome is rearranged compared to the available Francisella genomes even though we found no IS-elements in the genome. Nearly 16% of the predicted ORFs are pseudogenes. Computational pathway analysis indicates that a number of the metabolic pathways are disrupted due to pseudogenes. Comparing the novel genome with other available Francisella genomes, we found around 2.5% of unique genes present in Francisella noatunensis subsp. orientalis Toba04 and a list of genes uniquely present in the human-pathogenic Francisella subspecies. Most of these genes might have been transferred from bacterial species through horizontal gene transfer. Comparative analysis between the human and fish pathogens also provides insights into genes responsible for pathogenicity. Our analysis of pseudogenes indicates that pseudogene evolution in the Francisella subspecies from Tilapia is old, with a large number of pseudogenes having more than one inactivating mutation. Conclusions The fish pathogen lost non-essential genes some time ago. Evolutionary analysis of the Francisella genomes strongly suggests that human and fish pathogenic Francisella species have evolved independently from free-living, metabolically competent Francisella species. These findings will contribute to understanding the evolution of Francisella species and pathogenesis. PMID:23131096
A Novel Genomic Signature with Translational Significance for Human Idiopathic Pulmonary Fibrosis
Tedrow, John; de Bernard, Simon; Birker-Robaczewska, Magdalena; Gibson, Kevin F.; Guardela, Brenda Juan; Hess, Patrick; Klenk, Axel; Lindell, Kathleen O.; Poirey, Sylvie; Renault, Bérengère; Rey, Markus; Weber, Edgar; Nayler, Oliver; Kaminski, Naftali
2015-01-01
The bleomycin-induced rodent lung fibrosis model is commonly used to study mechanisms of lung fibrosis and to test potential therapeutic interventions, despite the well recognized dissimilarities to human idiopathic pulmonary fibrosis (IPF). Therefore, in this study, we sought to identify genomic commonalities between the gene expression profiles from 100 IPF lungs and 108 control lungs that were obtained from the Lung Tissue Research Consortium, and rat lungs harvested at Days 3, 7, 14, 21, 28, 42, and 56 after bleomycin instillation. Surprisingly, the highest gene expression similarity between bleomycin-treated rat and IPF lungs was observed at Day 7. At this point of maximal rat–human commonality, we identified a novel set of 12 disease-relevant translational gene markers (C6, CTHRC1, CTSE, FHL2, GAL, GREM1, LCN2, MMP7, NELL1, PCSK1, PLA2G2A, and SLC2A5) that was able to separate almost all patients with IPF from control subjects in our cohort and in two additional IPF/control cohorts (GSE10667 and GSE24206). Furthermore, in combination with diffusing capacity of carbon monoxide measurements, four members of the translational gene marker set contributed to stratify patients with IPF according to disease severity. Significantly, pirfenidone attenuated the expression change of one (CTHRC1) translational gene marker in the bleomycin-induced lung fibrosis model, in transforming growth factor-β1–treated primary human lung fibroblasts and transforming growth factor-β1–treated human epithelial A549 cells. Our results suggest that a strategy focused on rodent model–human disease commonalities may identify genes that could be used to predict the pharmacological impact of therapeutic interventions, and thus facilitate the development of novel treatments for this devastating lung disease. PMID:25029475
Discovering functions of unannotated genes from a transcriptome survey of wild fungal isolates.
Ellison, Christopher E; Kowbel, David; Glass, N Louise; Taylor, John W; Brem, Rachel B
2014-04-01
Most fungal genomes are poorly annotated, and many fungal traits of industrial and biomedical relevance are not well suited to classical genetic screens. Assigning genes to phenotypes on a genomic scale thus remains an urgent need in the field. We developed an approach to infer gene function from expression profiles of wild fungal isolates, and we applied our strategy to the filamentous fungus Neurospora crassa. Using transcriptome measurements in 70 strains from two well-defined clades of this microbe, we first identified 2,247 cases in which the expression of an unannotated gene rose and fell across N. crassa strains in parallel with the expression of well-characterized genes. We then used image analysis of hyphal morphologies, quantitative growth assays, and expression profiling to test the functions of four genes predicted from our population analyses. The results revealed two factors that influenced regulation of metabolism of nonpreferred carbon and nitrogen sources, a gene that governed hyphal architecture, and a gene that mediated amino acid starvation resistance. These findings validate the power of our population-transcriptomic approach for inference of novel gene function, and we suggest that this strategy will be of broad utility for genome-scale annotation in many fungal systems. IMPORTANCE Some fungal species cause deadly infections in humans or crop plants, and other fungi are workhorses of industrial chemistry, including the production of biofuels. Advances in medical and industrial mycology require an understanding of the genes that control fungal traits. We developed a method to infer functions of uncharacterized genes by observing correlated expression of their mRNAs with those of known genes across wild fungal isolates. We applied this strategy to a filamentous fungus and predicted functions for thousands of unknown genes. 
In four cases, we experimentally validated the predictions from our method, discovering novel genes involved in the metabolism of nutrient sources relevant for biofuel production, as well as colony morphology and starvation resistance. Our strategy is straightforward, inexpensive, and applicable for predicting gene function in many fungal species.
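The core idea of the abstract above, inferring the function of an unannotated gene from annotated genes whose expression rises and falls with it across strains, can be sketched as a simple correlation screen. This is only an illustration under assumptions: the gene names, expression values, function labels, and the 0.9 Pearson cutoff are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def infer_functions(expr, annotations, min_r=0.9):
    """Assign each unannotated gene the function of its best-correlated
    annotated gene, if the correlation reaches min_r (assumed cutoff).

    expr: gene -> expression values, one per strain
    annotations: annotated gene -> function label
    """
    calls = {}
    for gene, profile in expr.items():
        if gene in annotations:
            continue
        best = max(annotations, key=lambda g: pearson(profile, expr[g]))
        r = pearson(profile, expr[best])
        if r >= min_r:
            calls[gene] = (annotations[best], best, round(r, 3))
    return calls


# Hypothetical expression profiles across five strains.
expr = {
    "known_carbon": [1.0, 2.0, 3.0, 4.0, 5.0],
    "known_stress": [5.0, 1.0, 4.0, 2.0, 3.0],
    "unknown_1": [2.0, 4.0, 6.0, 8.0, 10.5],   # tracks known_carbon
    "unknown_2": [3.0, 3.0, 3.0, 3.0, 3.0],    # flat: no call is made
}
annotations = {"known_carbon": "carbon metabolism", "known_stress": "stress response"}
print(infer_functions(expr, annotations))
```

A prediction made this way is only a hypothesis; as the abstract notes, the authors followed up selected predictions with growth assays, image analysis, and expression profiling.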
Sequence divergence of the red and green visual pigments in great apes and humans.
Deeb, S S; Jorgensen, A L; Battisti, L; Iwasaki, L; Motulsky, A G
1994-01-01
We have determined the coding sequences of red and green visual pigment genes of the chimpanzee, gorilla, and orangutan. The deduced amino acid sequences of these pigments are highly homologous to the equivalent human pigments. None of the amino acid differences occurred at sites that were previously shown to influence pigment absorption characteristics. Therefore, we predict the spectra of red and green pigments of the apes to have wavelengths of maximum absorption that differ by < 2 nm from the equivalent human pigments and that color vision in these nonhuman primates will be very similar, if not identical, to that in humans. A total of 14 within-species polymorphisms (6 involving silent substitutions) were observed in the coding sequences of the red and green pigment genes of the great apes. Remarkably, the polymorphisms at 6 of these sites had been observed in human populations, suggesting that they predated the evolution of higher primates. Alleles at polymorphic sites were often shared between the red and green pigment genes. The average synonymous rate of divergence of red from green sequences was approximately 1/10th that estimated for other proteins of higher primates, indicating the involvement of gene conversion in generating these polymorphisms. The high degree of homology and juxtaposition of these two genes on the X chromosome has promoted unequal recombination and/or gene conversion that led to sequence homogenization. However, natural selection operated to maintain the degree of separation in peak absorbance between the red and green pigments that resulted in optimal chromatic discrimination. This represents a unique case of molecular coevolution between two homologous genes that functionally interact at the behavioral level. PMID:8041777
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-02-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.
A comparative study of disease genes and drug targets in the human protein interactome
2015-01-01
Background Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. Results In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer. Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. Conclusions The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing. PMID:25861037
Smoot, L M; Smoot, J C; Graham, M R; Somerville, G A; Sturdevant, D E; Migliaccio, C A; Sylva, G L; Musser, J M
2001-08-28
Pathogens are exposed to different temperatures during an infection cycle and must regulate gene expression accordingly. However, the extent to which virulent bacteria alter gene expression in response to temperatures encountered in the host is unknown. Group A Streptococcus (GAS) is a human-specific pathogen that is responsible for illnesses ranging from superficial skin infections and pharyngitis to severe invasive infections such as necrotizing fasciitis and streptococcal toxic shock syndrome. GAS survives and multiplies at different temperatures during human infection. DNA microarray analysis was used to investigate the influence of temperature on global gene expression in a serotype M1 strain grown to exponential phase at 29 degrees C and 37 degrees C. Approximately 9% of genes were differentially expressed by at least 1.5-fold at 29 degrees C relative to 37 degrees C, including genes encoding transporter proteins, proteins involved in iron homeostasis, transcriptional regulators, phage-associated proteins, and proteins with no known homologue. Relatively few known virulence genes were differentially expressed at this threshold. However, transcription of 28 genes encoding proteins with predicted secretion signal sequences was altered, indicating that growth temperature substantially influences the extracellular proteome. TaqMan real-time reverse transcription-PCR assays confirmed the microarray data. We also discovered that transcription of genes encoding hemolysins, and proteins with inferred roles in iron regulation, transport, and homeostasis, was influenced by growth at 40 degrees C. Thus, GAS profoundly alters gene expression in response to temperature. The data delineate the spectrum of temperature-regulated gene expression in an important human pathogen and provide many unforeseen lines of pathogenesis investigation.
A comparative study of disease genes and drug targets in the human protein interactome.
Sun, Jingchun; Zhu, Kevin; Zheng, W; Xu, Hua
2015-01-01
Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer. Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing.
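The three network properties compared in this study (degree, betweenness, and clustering coefficient) can be computed on a small undirected graph as sketched below. The sketch assumes an unweighted interaction network stored as an adjacency dictionary; the node labels and edges are a hypothetical toy example, not interactome data, and betweenness is computed with Brandes' algorithm without normalization, which may differ from the normalization used in the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of interaction partners per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering(adj):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        ordered = sorted(nbrs)
        links = sum(
            1
            for i, a in enumerate(ordered)
            for b in ordered[i + 1:]
            if b in adj[a]
        )
        result[v] = 2.0 * links / (k * (k - 1))
    return result


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                              # nodes in BFS visiting order
        preds = {v: [] for v in adj}            # shortest-path predecessors
        sigma = {v: 0 for v in adj}             # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):               # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(degree(toy))
print(clustering(toy))
print(betweenness(toy))
```

In this toy graph, node C combines the highest degree and betweenness with a low clustering coefficient, the same qualitative pattern the abstract reports for drug targets in cancer.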
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
Phenotypes from ancient DNA: approaches, insights and prospects.
Fortes, Gloria G; Speller, Camilla F; Hofreiter, Michael; King, Turi E
2013-08-01
The great majority of phenotypic characteristics are complex traits, complicating the identification of the genes underlying their expression. However, both methodological and theoretical progress in genome-wide association studies have resulted in a much better understanding of the underlying genetics of many phenotypic traits, including externally visible characteristics (EVCs) such as eye and hair color. Consequently, it has become possible to predict EVCs from human samples lacking phenotypic information. Predicting EVCs from genetic evidence is clearly appealing for forensic applications involving the personal identification of human remains. Now, a recent paper has reported the genetic determination of eye and hair color in samples up to 800 years old. The ability to predict EVCs from ancient human remains opens up promising perspectives for ancient DNA research, as this could allow studies to directly address archaeological and evolutionary questions related to the temporal and geographical origins of the genetic variants underlying phenotypes. © 2013 WILEY Periodicals, Inc.
Šmajs, David; Zobaníková, Marie; Strouhal, Michal; Čejková, Darina; Dugan-Rocha, Shannon; Pospíšilová, Petra; Norris, Steven J.; Albert, Tom; Qin, Xiang; Hallsworth-Pepin, Kym; Buhay, Christian; Muzny, Donna M.; Chen, Lei; Gibbs, Richard A.; Weinstock, George M.
2011-01-01
Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies. PMID:21655244
Guedj, Faycal; Pennings, Jeroen LA; Massingham, Lauren J; Wick, Heather C; Siegel, Ashley E; Tantravahi, Umadevi; Bianchi, Diana W
2016-09-02
Anatomical and functional brain abnormalities begin during fetal life in Down syndrome (DS). We hypothesize that novel prenatal treatments can be identified by targeting signaling pathways that are consistently perturbed in cell types/tissues obtained from human fetuses with DS and mouse embryos. We analyzed transcriptome data from fetuses with trisomy 21, age and sex-matched euploid controls, and embryonic day 15.5 forebrains from Ts1Cje, Ts65Dn, and Dp16 mice. The new datasets were compared to other publicly available datasets from humans with DS. We used the human Connectivity Map (CMap) database and created a murine adaptation to identify FDA-approved drugs that can rescue affected pathways. USP16 and TTC3 were dysregulated in all affected human cells and two mouse models. DS-associated pathway abnormalities were either the result of gene dosage specific effects or the consequence of a global cell stress response with activation of compensatory mechanisms. CMap analyses identified 56 molecules with high predictive scores to rescue abnormal gene expression in both species. Our novel integrated human/murine systems biology approach identified commonly dysregulated genes and pathways. This can help to prioritize therapeutic molecules on which to further test safety and efficacy. Additional studies in human cells are ongoing prior to pre-clinical prenatal treatment in mice.
Feliu, Neus; Kohonen, Pekka; Ji, Jie; Zhang, Yuning; Karlsson, Hanna L; Palmberg, Lena; Nyström, Andreas; Fadeel, Bengt
2015-01-27
Gene expression profiling has developed rapidly in recent years with the advent of deep sequencing technologies such as RNA sequencing (RNA Seq) and could be harnessed to predict and define mechanisms of toxicity of chemicals and nanomaterials. However, the full potential of these technologies in (nano)toxicology is yet to be realized. Here, we show that systems biology approaches can uncover mechanisms underlying cellular responses to nanomaterials. Using RNA Seq and computational approaches, we found that cationic poly(amidoamine) dendrimers (PAMAM-NH2) are capable of triggering down-regulation of cell-cycle-related genes in primary human bronchial epithelial cells at doses that do not elicit acute cytotoxicity, as demonstrated using conventional cell viability assays, while gene transcription was not affected by neutral PAMAM-OH dendrimers. The PAMAMs were internalized in an active manner by lung cells and localized mainly in lysosomes; amine-terminated dendrimers were internalized more efficiently when compared to the hydroxyl-terminated dendrimers. Upstream regulator analysis implicated NF-κB as a putative transcriptional regulator, and subsequent cell-based assays confirmed that PAMAM-NH2 caused NF-κB-dependent cell cycle arrest. However, PAMAM-NH2 did not affect cell cycle progression in the human A549 adenocarcinoma cell line. These results demonstrate the feasibility of applying systems biology approaches to predict cellular responses to nanomaterials and highlight the importance of using relevant (primary) cell models.
Genetics and epigenetics of aging and longevity
Moskalev, Alexey A; Aliper, Alexander M; Smit-McBride, Zeljka; Buzdin, Anton; Zhavoronkov, Alex
2014-01-01
Evolutionary theories of aging predict the existence of certain genes that provide selective advantage early in life with adverse effect on lifespan later in life (antagonistic pleiotropy theory) or longevity insurance genes (disposable soma theory). Indeed, the study of human and animal genetics is gradually identifying new genes that increase lifespan when overexpressed or mutated: gerontogenes. Furthermore, genetic and epigenetic mechanisms are being identified that have a positive effect on longevity. The gerontogenes are classified as lifespan regulators, mediators, effectors, housekeeping genes, genes involved in mitochondrial function, and genes regulating cellular senescence and apoptosis. In this review we demonstrate that the majority of the genes as well as genetic and epigenetic mechanisms that are involved in regulation of longevity are highly interconnected and related to stress response. PMID:24603410
Handa, Koichi; Nakagome, Izumi; Yamaotsu, Noriyuki; Gouda, Hiroaki; Hirono, Shuichi
2015-01-01
The pregnane X receptor [PXR (NR1I2)] induces the expression of xenobiotic metabolic genes and transporter genes. In this study, we aimed to establish a computational method for quantifying the enzyme-inducing potencies of different compounds via their ability to activate PXR, for the application in drug discovery and development. To achieve this purpose, we developed a three-dimensional quantitative structure-activity relationship (3D-QSAR) model using comparative molecular field analysis (CoMFA) for predicting enzyme-inducing potencies, based on computer-ligand docking to multiple PXR protein structures sampled from the trajectory of a molecular dynamics simulation. Molecular mechanics-generalized born/surface area scores representing the ligand-protein-binding free energies were calculated for each ligand. As a result, the predicted enzyme-inducing potencies for compounds generated by the CoMFA model were in good agreement with the experimental values. Finally, we concluded that this 3D-QSAR model has the potential to predict the enzyme-inducing potencies of novel compounds with high precision and therefore has valuable applications in the early stages of the drug discovery process. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra
2012-01-01
Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. Computational tools are increasingly being used to predict viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, the classification formulation suffers from the lack of biologically validated negative interactions. It is therefore beneficial to use the existing database to predict new viral-host interactions without the need for negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, a literature survey has been used for validation, identifying some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed. PMID:22539940
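The idea of predicting interactions from association rules, without negative samples, can be illustrated with a minimal sketch. This is not the biclustering-based technique of the abstract above: it mines only simple one-to-one rules among viral proteins using plain support and confidence thresholds. The protein names (`V1`, `H1`, ...), the toy interaction data, and the threshold values are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: each human protein maps to the set of viral
# proteins it is recorded as interacting with (invented for illustration).
interactions = {
    "H1": {"V1", "V2"},
    "H2": {"V1", "V2", "V3"},
    "H3": {"V1", "V2"},
    "H4": {"V3"},
    "H5": {"V1", "V3"},
}
viral_proteins = sorted(set().union(*interactions.values()))


def count(itemset):
    """Number of human proteins that interact with every viral protein in itemset."""
    return sum(1 for partners in interactions.values() if itemset <= partners)


def mine_rules(min_support=0.4, min_confidence=0.7):
    """Return rules (a, b), read as 'interacts with a => interacts with b'."""
    n = len(interactions)
    rules = []
    for a, b in permutations(viral_proteins, 2):
        both = count({a, b})
        support = both / n               # fraction of human proteins with a and b
        confidence = both / count({a})   # fraction of a-partners that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b))
    return rules


def predict(rules):
    """Propose (human, viral) pairs implied by a rule but absent from the data."""
    predicted = []
    for a, b in rules:
        for human, partners in sorted(interactions.items()):
            if a in partners and b not in partners:
                predicted.append((human, b))
    return predicted


rules = mine_rules()
predictions = predict(rules)
print(rules)        # rules that pass both thresholds
print(predictions)  # candidate new interactions
```

Only positive (recorded) interactions are used, which is the property the abstract emphasizes; any candidate produced this way would still need the kind of biological validation described above.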
Culture-gene coevolution of individualism-collectivism and the serotonin transporter gene.
Chiao, Joan Y; Blizinsky, Katherine D
2010-02-22
Culture-gene coevolutionary theory posits that cultural values have evolved, are adaptive and influence the social and physical environments under which genetic selection operates. Here, we examined the association between cultural values of individualism-collectivism and allelic frequency of the serotonin transporter functional polymorphism (5-HTTLPR) as well as the role this culture-gene association may play in explaining global variability in prevalence of pathogens and affective disorders. We found evidence that collectivistic cultures were significantly more likely to comprise individuals carrying the short (S) allele of the 5-HTTLPR across 29 nations. Results further show that historical pathogen prevalence predicts cultural variability in individualism-collectivism owing to genetic selection of the S allele. Additionally, cultural values and frequency of S allele carriers negatively predict global prevalence of anxiety and mood disorder. Finally, mediation analyses further indicate that increased frequency of S allele carriers predicted decreased anxiety and mood disorder prevalence owing to increased collectivistic cultural values. Taken together, our findings suggest culture-gene coevolution between allelic frequency of 5-HTTLPR and cultural values of individualism-collectivism and support the notion that cultural values buffer genetically susceptible populations from increased prevalence of affective disorders. Implications of the current findings for understanding culture-gene coevolution of human brain and behaviour as well as how this coevolutionary process may contribute to global variation in pathogen prevalence and epidemiology of affective disorders, such as anxiety and depression, are discussed.
Harries, Lorna W; Fellows, Alexander D; Pilling, Luke C; Hernandez, Dena; Singleton, Andrew; Bandinelli, Stefania; Guralnik, Jack; Powell, Jonathan; Ferrucci, Luigi; Melzer, David
2012-08-01
Interventions which inhibit TOR activity (including rapamycin and caloric restriction) lead to downstream gene expression changes and increased lifespan in laboratory models. However, the role of mTOR signaling in human aging is unclear. We tested the expression of mTOR-related transcripts in two independent study cohorts: the InCHIANTI population study of aging and the San Antonio Family Heart Study (SAFHS). Expression of 27/56 (InCHIANTI) and 19/44 (SAFHS) genes was associated with age after correction for multiple testing. Eight genes were robustly associated with age in both cohorts. Genes involved in insulin signaling (PTEN, PI3K, PDK1), ribosomal biogenesis (S6K), lipid metabolism (SREBF1), cellular apoptosis (SGK1), angiogenesis (VEGFB), insulin production and sensitivity (FOXO), cellular stress response (HIF1A) and cytoskeletal remodeling (PKC) were inversely correlated with age, whereas genes relating to inhibition of ribosomal components (4EBP1) and inflammatory mediators (STAT3) were positively associated with age in one or both datasets. We conclude that the expression of mTOR-related transcripts is associated with advancing age in humans. The changes seen are broadly similar to those produced by mTOR inhibition interventions associated with increased lifespan in animals. Work is needed to establish whether these changes are predictive of human longevity and whether further mTOR inhibition would be beneficial in older people. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
Prediction of gestational age based on genome-wide differentially methylated regions.
Bohlin, J; Håberg, S E; Magnus, P; Reese, S E; Gjessing, H K; Magnus, M C; Parr, C L; Page, C M; London, S J; Nystad, W
2016-10-07
We explored the association between gestational age and cord blood DNA methylation at birth and, given the limitations of presently used methods, whether DNA methylation could be effective in predicting gestational age. We used data from the Norwegian Mother and Child Birth Cohort study (MoBa) with Illumina HumanMethylation450 data measured for 1753 newborns in two batches: MoBa 1, n = 1068; and MoBa 2, n = 685. Gestational age was computed using both ultrasound and the last menstrual period. We evaluated associations between DNA methylation and gestational age and developed a statistical model for predicting gestational age using MoBa 1 for training and MoBa 2 for predictions. The prediction model was additionally used to compare ultrasound and last menstrual period-based gestational age predictions. Furthermore, both CpGs and associated genes detected in the training models were compared to those detected in a published prediction model for chronological age. There were 5474 CpGs associated with ultrasound gestational age after adjustment for a set of covariates, including estimated cell type proportions, and Bonferroni correction for multiple testing. Our model predicted ultrasound gestational age more accurately than it predicted last menstrual period gestational age. DNA methylation at birth appears to be a good predictor of gestational age. Ultrasound gestational age is more strongly associated with methylation than last menstrual period gestational age. The CpGs linked with our gestational age prediction model, and their associated genes, differed substantially from the corresponding CpGs and genes associated with a chronological age prediction model.
Ren, Xuefeng; Graham, Jessica C; Jing, Lichen; Mikheev, Andrei M; Gao, Yuan; Lew, Jenny Pan; Xie, Hong; Kim, Andrea S; Shang, Xiuling; Friedman, Cynthia; Vail, Graham; Fang, Ming Zhu; Bromberg, Yana; Zarbl, Helmut
2013-01-01
Rat strains differ dramatically in their susceptibility to mammary carcinogenesis. On the assumption that susceptibility genes are conserved across mammalian species and hence inform human carcinogenesis, numerous investigators have used genetic linkage studies in rats to identify genes responsible for differential susceptibility to carcinogenesis. Using a genetic backcross between the resistant Copenhagen (Cop) and susceptible Fischer 344 (F344) strains, we mapped a novel mammary carcinoma susceptibility (Mcs30) locus to the centromeric region on chromosome 12 (LOD score of ∼8.6 at the D12Rat59 marker). The Mcs30 locus comprises approximately 12 Mbp on the long arm of rat RNO12 whose synteny is conserved on human chromosome 13q12 to 13q13. After analyzing numerous genes comprising this locus, we identified Fry, the rat ortholog of the furry gene of Drosophila melanogaster, as a candidate Mcs gene. We cloned and determined the complete nucleotide sequence of the 13 kbp Fry mRNA. Sequence analysis indicated that the Fry gene was highly conserved across evolution, with 90% similarity of the predicted amino acid sequence among eutherian mammals. Comparison of the Fry sequence in the Cop and F344 strains identified two non-synonymous single nucleotide polymorphisms (SNPs), one of which creates a putative, de novo phosphorylation site. Further analysis showed that the expression of the Fry gene is reduced in a majority of rat mammary tumors. Our results also suggested that FRY activity was reduced in human breast carcinoma cell lines as a result of reduced levels or mutation. This study is the first to identify the Fry gene as a candidate Mcs gene. Our data suggest that the SNPs within the Fry gene contribute to the genetic susceptibility of the F344 rat strain to mammary carcinogenesis. These results provide the foundation for analyzing the role of the human FRY gene in cancer susceptibility and progression.
Mutations in the human GlyT2 gene define a presynaptic component of human startle disease
Rees, Mark I.; Harvey, Kirsten; Pearce, Brian R.; Chung, Seo-Kyung; Duguid, Ian C.; Thomas, Philip; Beatty, Sarah; Graham, Gail E.; Armstrong, Linlea; Shiang, Rita; Abbott, Kim J.; Zuberi, Sameer M.; Stephenson, John B.P.; Owen, Michael J.; Tijssen, Marina A.J.; van den Maagdenberg, Arn M.J.M.; Smart, Trevor G.; Supplisson, Stéphane; Harvey, Robert J.
2011-01-01
Hyperekplexia is a human neurological disorder characterized by an excessive startle response and is typically caused by missense and nonsense mutations in the gene encoding the inhibitory glycine receptor (GlyR) α1 subunit (GLRA1) [1-3]. Genetic heterogeneity has been confirmed in isolated sporadic cases with mutations in other postsynaptic glycinergic proteins including the GlyR β subunit (GLRB) [4], gephyrin (GPHN) [5] and RhoGEF collybistin (ARHGEF9) [6]. However, many sporadic patients diagnosed with hyperekplexia do not carry mutations in these genes [2-7]. Here we reveal that missense, nonsense and frameshift mutations in the presynaptic glycine transporter 2 (GlyT2) gene (SLC6A5) [8] also cause hyperekplexia. Patients harbouring mutations in SLC6A5 presented with hypertonia, an exaggerated startle response to tactile or acoustic stimuli, and life-threatening neonatal apnoea episodes. GlyT2 mutations result in defective subcellular localisation and/or decreased glycine uptake, with selected mutations affecting predicted glycine and Na+ binding sites. Our results demonstrate that SLC6A5 is a major gene for hyperekplexia and define the first neurological disorder linked to mutations in a Na+/Cl−-dependent transporter for a classical fast neurotransmitter. By analogy, we suggest that in other human disorders where defects in postsynaptic receptors have been identified, similar symptoms could result from defects in the cognate presynaptic neurotransmitter transporter. PMID:16751771
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Identification of susceptibility genes and genetic modifiers of human diseases
NASA Astrophysics Data System (ADS)
Abel, Kenneth; Kammerer, Stefan; Hoyal, Carolyn; Reneland, Rikard; Marnellos, George; Nelson, Matthew R.; Braun, Andreas
2005-03-01
The completion of the human genome sequence enables the discovery of genes involved in common human disorders. The successful identification of these genes is dependent on the availability of informative sample sets, validated marker panels, a high-throughput scoring technology, and a strategy for combining these resources. We have developed a universal platform technology based on mass spectrometry (MassARRAY) for analyzing nucleic acids with high precision and accuracy. To fuel this technology, we generated more than 100,000 validated assays for single nucleotide polymorphisms (SNPs) covering virtually all known and predicted human genes. We also established a large DNA sample bank comprising more than 50,000 consented healthy and diseased individuals. This combination of reagents and technology allows the execution of large-scale genome-wide association studies. Taking advantage of MassARRAY's capability for quantitative analysis of nucleic acids, allele frequencies are estimated in sample pools containing large numbers of individual DNAs. Comparing pools as a first-pass "filtering" step offers a tremendous advantage in throughput and cost over individual genotyping. We employed this approach in numerous genome-wide, hypothesis-free searches to identify genes associated with common complex diseases, such as breast cancer, osteoporosis, and osteoarthritis, and genes involved in quantitative traits like high-density lipoprotein cholesterol (HDL-c) levels and central fat. Access to additional well-characterized patient samples through collaborations allows us to conduct replication studies that validate true disease genes. These discoveries will expand our understanding of genetic disease predisposition, and our ability for early diagnosis and determination of specific disease subtype or progression stage.
Anaerobic biosynthesis of the lower ligand of vitamin B12
Hazra, Amrita B.; Han, Andrew W.; Mehta, Angad P.; Mok, Kenny C.; Osadchiy, Vadim; Begley, Tadhg P.; Taga, Michiko E.
2015-01-01
Vitamin B12 (cobalamin) is required by humans and other organisms for diverse metabolic processes, although only a subset of prokaryotes is capable of synthesizing B12 and other cobamide cofactors. The complete aerobic and anaerobic pathways for the de novo biosynthesis of B12 are known, with the exception of the steps leading to the anaerobic biosynthesis of the lower ligand, 5,6-dimethylbenzimidazole (DMB). Here, we report the identification and characterization of the complete pathway for anaerobic DMB biosynthesis. This pathway, identified in the obligate anaerobic bacterium Eubacterium limosum, is composed of five previously uncharacterized genes, bzaABCDE, that together direct DMB production when expressed in anaerobically cultured Escherichia coli. Expression of different combinations of the bza genes revealed that 5-hydroxybenzimidazole, 5-methoxybenzimidazole, and 5-methoxy-6-methylbenzimidazole, all of which are lower ligands of cobamides produced by other organisms, are intermediates in the pathway. The bza gene content of several bacterial and archaeal genomes is consistent with experimentally determined structures of the benzimidazoles produced by these organisms, indicating that these genes can be used to predict cobamide structure. The identification of the bza genes thus represents the last remaining unknown component of the biosynthetic pathway for not only B12 itself, but also for three other cobamide lower ligands whose biosynthesis was previously unknown. Given the importance of cobamides in environmental, industrial, and human-associated microbial metabolism, the ability to predict cobamide structure may lead to an improved ability to understand and manipulate microbial metabolism. PMID:26246619
NASA Astrophysics Data System (ADS)
Tomasek, Abigail; Kozarek, Jessica L.; Hondzo, Miki; Lurndahl, Nicole; Sadowsky, Michael J.; Wang, Ping; Staley, Christopher
2017-08-01
Intensive agriculture in the Midwestern United States contributes to excess nitrogen in surface water and groundwater, negatively affecting human health and aquatic ecosystems. Complete denitrification removes reactive nitrogen from aquatic environments and releases inert dinitrogen gas. We examined denitrification rates and the abundances of denitrifying genes and total bacteria at three sites in an agricultural watershed and in an experimental stream in Minnesota. Sampling was conducted along transects with a gradient from always inundated (in-channel), to periodically inundated, to noninundated conditions to determine how denitrification rates and gene abundances varied from channels to riparian areas with different inundation histories. Results indicate a coupling between environmental parameters, gene abundances, and denitrification rates at the in-channel locations, and limited to no coupling at the periodically inundated and noninundated locations, respectively. Nutrient-amended potential denitrification rates for the in-channel locations were significantly correlated (α = 0.05) with five of six measured denitrifying gene abundances, whereas the periodically inundated and noninundated locations were each only significantly correlated with the abundance of one denitrifying gene. These results suggest that DNA-based analysis of denitrifying gene abundances alone cannot predict functional responses (denitrification potential), especially in studies with varying hydrologic regimes. A scaling analysis was performed to develop a predictive functional relationship relating environmental parameters to denitrification rates for in-channel locations. This method could be applied to other geographic and climatic regions to predict the occurrence of denitrification hot spots.
Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark
2010-01-01
The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI–carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown. PMID:20808891
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lambrechts, Nathalie; Verstraelen, Sandra; Lodewyckx, Hanne
2009-04-15
Early detection of the sensitizing potential of chemicals is an emerging issue for chemical, pharmaceutical and cosmetic industries. In our institute, an in vitro classification model for prediction of chemical-induced skin sensitization based on gene expression signatures in human CD34+ progenitor-derived dendritic cells (DC) has been developed. This primary cell model is able to closely mimic the induction phase of sensitization by Langerhans cells in the skin, but it has drawbacks, such as the availability of cord blood. The aim of this study was to investigate whether human in vitro cultured THP-1 monocytes or macrophages display a similar expression profile for 13 predictive gene markers previously identified in DC and whether they also possess a discriminating capacity towards skin sensitizers and non-sensitizers based on these marker genes. To this end, the cell models were exposed to 5 skin sensitizers (ammonium hexachloroplatinate IV, 1-chloro-2,4-dinitrobenzene, eugenol, para-phenylenediamine, and tetramethylthiuram disulfide) and 5 non-sensitizers (L-glutamic acid, methyl salicylate, sodium dodecyl sulfate, tributyltin chloride, and zinc sulfate) for 6, 10, and 24 h, and mRNA expression of the 13 genes was analyzed using real-time RT-PCR. The transcriptional response of 7 out of 13 genes in THP-1 monocytes was significantly correlated with DC, whereas this was the case for only 2 out of 13 genes in THP-1 macrophages. After cross-validation of a discriminant analysis of the gene expression profiles in the THP-1 monocytes, this cell model was also shown to have the capacity to distinguish skin sensitizers from non-sensitizers. However, the DC model was superior to the monocyte model for discrimination of (non-)sensitizing chemicals.
Olbermann, Patrick; Josenhans, Christine; Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark; Linz, Bodo
2010-08-19
The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI-carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
Ventura, Marco; Turroni, Francesca; Zomer, Aldert; Foroni, Elena; Giubellini, Vanessa; Bottacini, Francesca; Canchaya, Carlos; Claesson, Marcus J.; He, Fei; Mantzourani, Maria; Mulas, Laura; Ferrarini, Alberto; Gao, Beile; Delledonne, Massimo; Henrissat, Bernard; Coutinho, Pedro; Oggioni, Marco; Gupta, Radhey S.; Zhang, Ziding; Beighton, David; Fitzgerald, Gerald F.; O'Toole, Paul W.; van Sinderen, Douwe
2009-01-01
Bifidobacteria, one of the relatively dominant components of the human intestinal microbiota, are considered one of the key groups of beneficial intestinal bacteria (probiotic bacteria). However, in addition to health-promoting taxa, the genus Bifidobacterium also includes Bifidobacterium dentium, an opportunistic cariogenic pathogen. The genetic basis for the ability of B. dentium to survive in the oral cavity and contribute to caries development is not understood. The genome of B. dentium Bd1, a strain isolated from dental caries, was sequenced to completion to uncover a single circular 2,636,368 base pair chromosome with 2,143 predicted open reading frames. Annotation of the genome sequence revealed multiple ways in which B. dentium has adapted to the oral environment through specialized nutrient acquisition, defences against antimicrobials, and gene products that increase fitness and competitiveness within the oral niche. B. dentium Bd1 was shown to metabolize a wide variety of carbohydrates, consistent with genome-based predictions, while colonization and persistence factors implicated in tissue adhesion, acid tolerance, and the metabolism of human saliva-derived compounds were also identified. Global transcriptome analysis demonstrated that many of the genes encoding these predicted traits are highly expressed under relevant physiological conditions. This is the first report to identify, through various genomic approaches, specific genetic adaptations of a Bifidobacterium taxon, Bifidobacterium dentium Bd1, to a lifestyle as a cariogenic microorganism in the oral cavity. In silico analysis and comparative genomic hybridization experiments clearly reveal a high level of genome conservation among various B. dentium strains. The data indicate that the genome of this opportunistic cariogen has evolved through a very limited number of horizontal gene acquisition events, highlighting the narrow boundaries that separate commensals from opportunistic pathogens. 
PMID:20041198
Yiu, Glenn; Tieu, Eric; Nguyen, Anthony T; Wong, Brittany; Smit-McBride, Zeljka
2016-10-01
To employ type II clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 endonuclease to suppress ocular angiogenesis by genomic disruption of VEGF-A in human RPE cells. CRISPR sequences targeting exon 1 of human VEGF-A were computationally identified based on predicted Cas9 on- and off-target probabilities. Single guide RNA (gRNA) cassettes with these target sequences were cloned into lentiviral vectors encoding the Streptococcus pyogenes Cas9 endonuclease (SpCas9) gene. The lentiviral vectors were used to infect ARPE-19 cells, a human RPE cell line. Frequency of insertion or deletion (indel) mutations was assessed by T7 endonuclease 1 mismatch detection assay; mRNA levels were assessed with quantitative real-time PCR; and VEGF-A protein levels were determined by ELISA. In vitro angiogenesis was measured using an endothelial cell tube formation assay. Five gRNAs targeting VEGF-A were selected based on the highest predicted on-target probabilities, lowest off-target probabilities, or combined average of both scores. Lentiviral delivery of the top-scoring gRNAs with SpCas9 resulted in indel formation in the VEGF-A gene at frequencies up to 37.0% ± 4.0% with corresponding decreases in secreted VEGF-A protein up to 41.2% ± 7.4% (P < 0.001), and reduction of endothelial tube formation up to 39.4% ± 9.8% (P = 0.02). No significant indel formation in the top three putative off-target sites tested was detected. The CRISPR-Cas9 endonuclease system may reduce VEGF-A secretion from human RPE cells and suppress angiogenesis, supporting the possibility of employing gene editing for antiangiogenesis therapy in ocular diseases.
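The three gRNA selection criteria named in the abstract above (highest on-target score, lowest off-target score, best combined average) can be sketched as a simple ranking. The guide names and score values below are hypothetical, and the way the two scores are combined is an assumption: the abstract does not say how the combined average was computed, so this sketch averages the on-target score with one minus the off-target score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    name: str
    on_target: float   # predicted on-target probability (higher is better)
    off_target: float  # predicted off-target probability (lower is better)


# Hypothetical candidate guides; names and scores are invented for illustration.
guides = [
    Guide("g1", 0.90, 0.30),
    Guide("g2", 0.70, 0.05),
    Guide("g3", 0.85, 0.10),
    Guide("g4", 0.40, 0.20),
]


def combined_score(guide):
    """Assumed combination: average of on-target and (1 - off-target)."""
    return (guide.on_target + (1.0 - guide.off_target)) / 2


best_on_target = max(guides, key=lambda g: g.on_target)
best_off_target = min(guides, key=lambda g: g.off_target)
best_combined = max(guides, key=combined_score)

print(best_on_target.name, best_off_target.name, best_combined.name)
```

With these invented scores the three criteria pick three different guides, which shows why selecting by more than one criterion can yield a broader candidate panel than ranking on a single score.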
Genetic risk prediction and neurobiological understanding of alcoholism
Levey, D F; Le-Niculescu, H; Frank, J; Ayalew, M; Jain, N; Kirlin, B; Learman, R; Winiger, E; Rodd, Z; Shekhar, A; Schork, N; Kiefe, F; Wodarz, N; Müller-Myhsok, B; Dahmen, N; Nöthen, M; Sherva, R; Farrer, L; Smith, A H; Kranzler, H R; Rietschel, M; Gelernter, J; Niculescu, A B
2014-01-01
We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene-level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A panel of all the nominally significant P-value SNPs in the top candidate genes discovered by CFG (n=135 genes, 713 SNPs) was used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P=0.053) in separating alcohol dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. In order to validate and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n=11 genes, 66 SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model showed better predictive ability in the independent German test cohort (P=0.041). The top CFG scoring gene for alcoholism from the initial discovery step, synuclein alpha (SNCA) remained the top gene after the stress-reactive animal model cross-validation. We also tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence (P=0.00012) and one with alcohol abuse (a less severe form of alcoholism; P=0.0094). 
SNCA by itself was able to separate alcoholics from controls in the alcohol-dependent cohort (P=0.000013) and the alcohol abuse cohort (P=0.023). So did eight other genes from the panel of 11 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRM3 and MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse (including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, anxiety from previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed the significant genetic overlap with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the broader context of modulating the mental landscape. PMID:24844177
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
Ghadie, Mohamed Ali; Lambourne, Luke; Vidal, Marc; Xia, Yu
2017-08-01
Alternative splicing is known to remodel protein-protein interaction networks ("interactomes"), yet large-scale determination of isoform-specific interactions remains challenging. We present a domain-based method to predict the isoform interactome from the reference interactome. First, we construct the domain-resolved reference interactome by mapping known domain-domain interactions onto experimentally-determined interactions between reference proteins. Then, we construct the isoform interactome by predicting that an isoform loses an interaction if it loses the domain mediating the interaction. Our prediction framework is of high-quality when assessed by experimental data. The predicted human isoform interactome reveals extensive network remodeling by alternative splicing. Protein pairs interacting with different isoforms of the same gene tend to be more divergent in biological function, tissue expression, and disease phenotype than protein pairs interacting with the same isoforms. Our prediction method complements experimental efforts, and demonstrates that integrating structural domain information with interactomes provides insights into the functional impact of alternative splicing.
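The loss rule described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the data layout, the function name `predict_isoform_interactome`, and the gene, isoform, and domain labels in the usage note are all hypothetical. Where several domains could mediate the same interaction, the sketch assumes the interaction is lost only when the isoform retains none of them, which the abstract does not specify.

```python
from typing import Dict, Set, Tuple


def predict_isoform_interactome(
    domain_resolved: Dict[Tuple[str, str], Set[str]],
    isoform_domains: Dict[str, Dict[str, Set[str]]],
) -> Dict[Tuple[str, str], bool]:
    """Predict which reference interactions each isoform retains.

    domain_resolved maps (gene, partner) to the set of domains on the
    gene's reference protein that mediate the interaction (hypothetical
    layout). isoform_domains maps gene -> isoform id -> domains that
    the isoform still contains.
    """
    retained: Dict[Tuple[str, str], bool] = {}
    for (gene, partner), mediating in domain_resolved.items():
        for isoform, domains in isoform_domains.get(gene, {}).items():
            # Assumption: the interaction survives if at least one
            # mediating domain is still present in the isoform.
            retained[(isoform, partner)] = bool(mediating & domains)
    return retained
```

For example, with a reference interaction `("GENE_A", "GENE_B")` mediated by a hypothetical `"SH3"` domain, an isoform that keeps `"SH3"` is predicted to retain the interaction, and an isoform that lacks it is predicted to lose it.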
Lambourne, Luke; Vidal, Marc
2017-01-01
Alternative splicing is known to remodel protein-protein interaction networks (“interactomes”), yet large-scale determination of isoform-specific interactions remains challenging. We present a domain-based method to predict the isoform interactome from the reference interactome. First, we construct the domain-resolved reference interactome by mapping known domain-domain interactions onto experimentally-determined interactions between reference proteins. Then, we construct the isoform interactome by predicting that an isoform loses an interaction if it loses the domain mediating the interaction. Our prediction framework is of high-quality when assessed by experimental data. The predicted human isoform interactome reveals extensive network remodeling by alternative splicing. Protein pairs interacting with different isoforms of the same gene tend to be more divergent in biological function, tissue expression, and disease phenotype than protein pairs interacting with the same isoforms. Our prediction method complements experimental efforts, and demonstrates that integrating structural domain information with interactomes provides insights into the functional impact of alternative splicing. PMID:28846689
The Early Effects of Rapid Androgen Deprivation on Human Prostate Cancer.
Shaw, Greg L; Whitaker, Hayley; Corcoran, Marie; Dunning, Mark J; Luxton, Hayley; Kay, Jonathan; Massie, Charlie E; Miller, Jodi L; Lamb, Alastair D; Ross-Adams, Helen; Russell, Roslin; Nelson, Adam W; Eldridge, Matthew D; Lynch, Andrew G; Ramos-Montoya, Antonio; Mills, Ian G; Taylor, Angela E; Arlt, Wiebke; Shah, Nimish; Warren, Anne Y; Neal, David E
2016-08-01
The androgen receptor (AR) is the dominant growth factor in prostate cancer (PCa). Therefore, understanding how ARs regulate the human transcriptome is of paramount importance. The early effects of castration on human PCa have not previously been studied. Twenty-seven patients were medically castrated with degarelix 7 d before radical prostatectomy. We used mass spectrometry, immunohistochemistry, and gene expression array (validated by reverse transcription-polymerase chain reaction) to compare resected tumour with matched, controlled, untreated PCa tissue. All patients had castrate levels of serum androgen, with reduced levels of intraprostatic androgen at prostatectomy. We observed differential expression of known androgen-regulated genes (TMPRSS2, KLK3, CAMKK2, FKBP5). We identified 749 genes downregulated and 908 genes upregulated following castration. AR regulation of α-methylacyl-CoA racemase expression and three other genes (FAM129A, RAB27A, and KIAA0101) was confirmed. Upregulation of oestrogen receptor 1 (ESR1) expression was observed in malignant epithelia and was associated with differential expression of ESR1-regulated genes and correlated with proliferation (Ki-67 expression). This first-in-man study defines the rapid gene expression changes taking place in prostate cancer (PCa) following castration. Expression levels of the genes that the androgen receptor regulates are predictive of treatment outcome. Upregulation of oestrogen receptor 1 is a mechanism by which PCa cells may survive despite castration. Copyright © 2015 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Saccharomyces genome database informs human biology
Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Karra, Kalpana; Binkley, Gail; Simison, Matt; Miyasato, Stuart R
2018-01-01
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. PMID:29140510
Identification of the Consistently Altered Metabolic Targets in Human Hepatocellular Carcinoma.
Nwosu, Zeribe Chike; Megger, Dominik Andre; Hammad, Seddik; Sitek, Barbara; Roessler, Stephanie; Ebert, Matthias Philip; Meyer, Christoph; Dooley, Steven
2017-09-01
Cancer cells rely on metabolic alterations to enhance proliferation and survival. Metabolic gene alterations that repeatedly occur in liver cancer are largely unknown. We aimed to identify metabolic genes that are consistently deregulated, and are of potential clinical significance in human hepatocellular carcinoma (HCC). We studied the expression of 2,761 metabolic genes in 8 microarray datasets comprising 521 human HCC tissues. Genes exclusively up-regulated or down-regulated in 6 or more datasets were defined as consistently deregulated. The consistent genes that correlated with tumor progression markers (ECM2 and MMP9) (Pearson correlation P < .05) were used for Kaplan-Meier overall survival analysis in a patient cohort. We further compared proteomic expression of metabolic genes in 19 tumors vs adjacent normal liver tissues. We identified 634 consistent metabolic genes, ∼60% of which are not yet described in HCC. The down-regulated genes (n = 350) are mostly involved in physiologic hepatocyte metabolic functions (eg, xenobiotic, fatty acid, and amino acid metabolism). In contrast, among consistently up-regulated metabolic genes (n = 284) are those involved in glycolysis, pentose phosphate pathway, nucleotide biosynthesis, tricarboxylic acid cycle, oxidative phosphorylation, proton transport, membrane lipid, and glycan metabolism. Several metabolic genes (n = 434) correlated with progression markers, and of these, 201 predicted overall survival outcome in the patient cohort analyzed. Over 90% of the metabolic targets significantly altered at the protein level were similarly up- or down-regulated as in genomic profile. We provide the first exposition of the consistently altered metabolic genes in HCC and show that these genes are potentially relevant targets for onward studies in preclinical and clinical contexts.
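The "consistently deregulated" criterion in this abstract (exclusively up- or down-regulated in 6 or more of the 8 datasets) can be expressed as a small filter. The sketch below is an illustration only: the input layout, the function name `consistently_deregulated`, and the reading of "exclusively" as "never called in the opposite direction in any dataset" are assumptions, not details taken from the study.

```python
from typing import Dict, List, Optional


def consistently_deregulated(
    calls: Dict[str, List[Optional[str]]],
    min_datasets: int = 6,
) -> Dict[str, str]:
    """Return genes called in one direction in at least min_datasets datasets.

    calls maps a gene to its per-dataset call: "up", "down", or None
    when the gene is unchanged or not measured (hypothetical layout).
    """
    consistent: Dict[str, str] = {}
    for gene, per_dataset in calls.items():
        ups = per_dataset.count("up")
        downs = per_dataset.count("down")
        # Assumption: "exclusively" means no call in the other direction.
        if ups >= min_datasets and downs == 0:
            consistent[gene] = "up"
        elif downs >= min_datasets and ups == 0:
            consistent[gene] = "down"
    return consistent
```

Under this reading, a gene called "up" in six datasets and unchanged in two qualifies, whereas a gene called "up" in six datasets and "down" in one does not.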
Bokulich, Nicholas A; Bergsveinson, Jordyn; Ziola, Barry; Mills, David A
2015-01-01
Distinct microbial ecosystems have evolved to meet the challenges of indoor environments, shaping the microbial communities that interact most with modern human activities. Microbial transmission in food-processing facilities has an enormous impact on the qualities and healthfulness of foods, beneficially or detrimentally interacting with food products. To explore modes of microbial transmission and spoilage-gene frequency in a commercial food-production scenario, we profiled hop-resistance gene frequencies and bacterial and fungal communities in a brewery. We employed a Bayesian approach for predicting routes of contamination, revealing critical control points for microbial management. Physically mapping microbial populations over time illustrates patterns of dispersal and identifies potential contaminant reservoirs within this environment. Habitual exposure to beer is associated with increased abundance of spoilage genes, predicting greater contamination risk. Elucidating the genetic landscapes of indoor environments poses important practical implications for food-production systems and these concepts are translatable to other built environments. DOI: http://dx.doi.org/10.7554/eLife.04634.001 PMID:25756611
MusTRD can regulate postnatal fiber-specific expression.
Issa, Laura L; Palmer, Stephen J; Guven, Kim L; Santucci, Nicole; Hodgson, Vanessa R M; Popovic, Kata; Joya, Josephine E; Hardeman, Edna C
2006-05-01
Human MusTRD1alpha1 was isolated as a result of its ability to bind a critical element within the Troponin I slow upstream enhancer (TnIslow USE) and was predicted to be a regulator of slow fiber-specific genes. To test this hypothesis in vivo, we generated transgenic mice expressing hMusTRD1alpha1 in skeletal muscle. Adult transgenic mice show a complete loss of slow fibers and a concomitant replacement by fast IIA fibers, resulting in postural muscle weakness. However, developmental analysis demonstrates that transgene expression has no impact on embryonic patterning of slow fibers but causes a gradual postnatal slow to fast fiber conversion. This conversion was underpinned by a demonstrable repression of many slow fiber-specific genes, whereas fast fiber-specific gene expression was either unchanged or enhanced. These data are consistent with our initial predictions for hMusTRD1alpha1 and suggest that slow fiber genes contain a specific common regulatory element that can be targeted by MusTRD proteins.
Glöckner, Gernot; Scherer, Stephen; Schattevoy, Ruben; Boright, Andrew; Weber, Jacqueline; Tsui, Lap-Chee; Rosenthal, André
1998-01-01
We have sequenced and annotated two genomic regions located in the Giemsa negative band q22 of human chromosome 7. The first region defined by the erythropoietin (EPO) locus is 228 kb in length and contains 13 genes. Whereas 3 genes (GNB2, EPO, PCOLCE) were known previously on the mRNA level, we have been able to identify 10 novel genes using a newly developed automatic annotation tool RUMMAGE-DP, which comprises >26 different programs mainly for exon prediction, homology searches, and compositional and repeat analysis. For precise annotation we have also resequenced ESTs identified to the region and assembled them to build large cDNAs. In addition, we have investigated the differential splicing of genes. Using these tools we annotated 4 of the 10 genes as a zonadhesin, a transferrin homolog, a nucleoporin-like gene, and an actin gene. Two genes showed weak similarity to an insulin-like receptor and a neuronal protein with a leucine-rich amino-terminal domain. Four predicted genes (CDS1–CDS4) that have been confirmed on the mRNA level showed no similarity to known proteins and a potential function could not be assigned. The second region in 7q22 defined by the CUTL1 (CCAAT displacement protein and its splice variant) locus is 416 kb in length and contains three known genes, including PMSL12, APS, CUTL1, and a novel gene (CDS5). The CUTL1 locus, consisting of two splice variants (CDP and CASP), occupies >300 kb. Based on the G,C profile an isochore switch can be defined between the CUTL1 gene and the APS and PMSL12 genes. [Clones 37G3, 164c7, and 235f8 are deposited in GenBank under accession no. AF053356; clone 123e15, accession no. AF024533; 186d2, accession no. AF024534; 46f6, accession no. AF006752; 50h2, accession no. AF047825; and 76h2, accession no. AF030453] PMID:9799793
Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min
2011-01-01
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507
MicroRNA Regulation of Human Protease Genes Essential for Influenza Virus Replication
Meliopoulos, Victoria A.; Andersen, Lauren E.; Brooks, Paula; Yan, Xiuzhen; Bakre, Abhijeet; Coleman, J. Keegan; Tompkins, S. Mark; Tripp, Ralph A.
2012-01-01
Influenza A virus causes seasonal epidemics and periodic pandemics threatening the health of millions of people each year. Vaccination is an effective strategy for reducing morbidity and mortality, and in the absence of drug resistance, the efficacy of chemoprophylaxis is comparable to that of vaccines. However, the rapid emergence of drug resistance has emphasized the need for new drug targets. Knowledge of the host cell components required for influenza replication has been an area targeted for disease intervention. In this study, the human protease genes required for influenza virus replication were determined and validated using RNA interference approaches. The genes validated as critical for influenza virus replication were ADAMTS7, CPE, DPP3, MST1, and PRSS12, and pathway analysis showed these genes were in global host cell pathways governing inflammation (NF-κB), cAMP/calcium signaling (CRE/CREB), and apoptosis. Analyses of host microRNAs predicted to govern expression of these genes showed that eight miRNAs regulated gene expression during virus replication. These findings identify unique host genes and microRNAs important for influenza replication providing potential new targets for disease intervention strategies. PMID:22606348
Molecular cloning of an inducible serine esterase gene from human cytotoxic lymphocytes.
Trapani, J A; Klein, J L; White, P C; Dupont, B
1988-01-01
A cDNA clone encoding a human serine esterase gene was isolated from a library constructed from poly(A)+ RNA of allogeneically stimulated, interleukin 2-expanded peripheral blood mononuclear cells. The clone, designated HSE26.1, represents a full-length copy of a 0.9-kilobase mRNA present in human cytotoxic cells but absent from a wide variety of noncytotoxic cell lines. Clone HSE26.1 contains an 892-base-pair sequence, including a single 741-base-pair open reading frame encoding a putative 247-residue polypeptide. The first 20 amino acids of the polypeptide form a leader sequence. The mature protein is predicted to have an unglycosylated Mr of approximately 26,000 and contains a single potential site for N-linked glycosylation. The nucleotide and predicted amino acid sequences of clone HSE26.1 are homologous with all murine and human serine esterases cloned thus far but are most similar to mouse granzyme B (70% nucleotide and 68% amino acid identity). HSE26.1 protein is expressed weakly in unstimulated peripheral blood mononuclear cells but is strongly induced within 6-hr incubation in medium containing phytohemagglutinin. The data suggest that the protein encoded by HSE26.1 plays a role in cell-mediated cytotoxicity. PMID:3261871
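The sequence figures in this abstract can be cross-checked with simple arithmetic: a 741-base-pair open reading frame corresponds to 247 codons, and removing the 20-residue leader leaves 227 residues in the mature protein. The sketch below does that bookkeeping. The mass estimate uses a rule-of-thumb average of about 110 Da per residue, which is an assumption introduced here for a rough sanity check and not a value from the abstract; the function name `orf_summary` is likewise hypothetical.

```python
from typing import Tuple


def orf_summary(
    orf_bp: int,
    leader_residues: int,
    avg_residue_da: float = 110.0,  # rule-of-thumb average, an assumption
) -> Tuple[int, int, float]:
    """Return (total residues, mature residues, rough mature mass in Da)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    total_residues = orf_bp // 3
    mature_residues = total_residues - leader_residues
    # Rough estimate only; the true mass depends on amino acid composition.
    return total_residues, mature_residues, mature_residues * avg_residue_da


total, mature, mass_da = orf_summary(741, 20)
```

With the abstract's numbers this gives 247 total residues, 227 mature residues, and a rough mass near 25,000 Da, which is in the same range as the reported unglycosylated Mr of approximately 26,000.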
NASA Astrophysics Data System (ADS)
Abaci, Hasan Erbil; Shen, Yu-I.; Tan, Scott; Gerecht, Sharon
2014-05-01
Studying human vascular disease in conventional cell cultures and in animal models does not effectively mimic the complex vascular microenvironment and may not accurately predict vascular responses in humans. We utilized a microfluidic device to recapitulate both shear stress and O2 levels in health and disease, establishing a microfluidic vascular model (μVM). Maintaining human endothelial cells (ECs) in healthy-mimicking conditions resulted in conversion to a physiological phenotype, namely cell elongation, reduced proliferation, lowered angiogenic gene expression and formation of actin cortical rim and continuous barrier. We next examined the responses of the healthy μVM to a vasotoxic cancer drug, 5-Fluorouracil (5-FU), in comparison with an in vivo mouse model. We found that 5-FU does not induce apoptosis but rather vascular hyperpermeability, which can be alleviated by Resveratrol treatment. This effect was confirmed by in vivo findings identifying a vasoprotecting strategy by the adjunct therapy of 5-FU with Resveratrol. The μVM of ischemic disease demonstrated the transition of ECs from a quiescent to an activated state, with higher proliferation rate, upregulation of angiogenic genes, and impaired barrier integrity. The μVM offers opportunities to study and predict human ECs with physiologically relevant phenotypes in healthy, pathological and drug-treated environments.
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H
2017-01-09
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
IMGT/GeneInfo: enhancing V(D)J recombination database accessibility
Baum, Thierry-Pascal; Pasqual, Nicolas; Thuderoz, Florence; Hierle, Vivien; Chaume, Denys; Lefranc, Marie-Paule; Jouvin-Marche, Evelyne; Marche, Patrice-Noël; Demongeot, Jacques
2004-01-01
IMGT/GeneInfo is a user-friendly online information system that provides information on data resulting from the complex mechanisms of immunoglobulin (IG) and T cell receptor (TR) V(D)J recombinations. For the first time, it is possible to visualize all the rearrangement parameters on a single page. IMGT/GeneInfo is part of the international ImMunoGeneTics information system® (IMGT), a high-quality integrated knowledge resource specializing in IG, TR, major histocompatibility complex (MHC), and related proteins of the immune system of human and other vertebrate species. The IMGT/GeneInfo system was developed by the TIMC and ICH laboratories (with the collaboration of LIGM), and is the first example of an external system being incorporated into IMGT. In this paper, we report the first part of this work. IMGT/GeneInfo_TR deals with the human and mouse TRA/TRD and TRB loci of the TR. Data handling and visualization are complementary to the current data and tools in IMGT, and will subsequently allow the modelling of V(D)J gene use, and thus, to predict non-standard recombination profiles which may eventually be found in conditions such as leukaemias or lymphomas. Access to IMGT/GeneInfo is free and can be found at http://imgt.cines.fr/GeneInfo. PMID:14681357
Identification of potentially hazardous human gene products in GMO risk assessment.
Bergmans, Hans; Logie, Colin; Van Maanen, Kees; Hermsen, Harm; Meredyth, Michelle; Van Der Vlugt, Cécile
2008-01-01
Genetically modified organisms (GMOs), e.g. viral vectors, could threaten the environment if by their release they spread hazardous gene products. Even in contained use, to prevent adverse consequences, viral vectors carrying genes from mammals or humans should be especially scrutinized as to whether gene products that they synthesize could be hazardous in their new context. Examples of such potentially hazardous gene products (PHGPs) are: protein toxins, products of dominant alleles that have a role in hereditary diseases, gene products and sequences involved in genome rearrangements, gene products involved in immunomodulation or with an endocrine function, gene products involved in apoptosis, activated proto-oncogenes. For contained use of a GMO that carries a construct encoding a PHGP, the precautionary principle dictates that safety measures should be applied on a "worst case" basis, until the risks of the specific case have been assessed. The potential hazard of cloned genes can be estimated before empirical data on the actual GMO become available. Preliminary data may be used to focus hazard identification and risk assessment. Both predictive and empirical data may also help to identify what further information is needed to assess the risk of the GMO. A two-step approach, whereby a PHGP is evaluated for its conceptual dangers, then checked by data bank searches, is delineated here.
van de Vrugt, H J; Cheng, N C; de Vries, Y; Rooimans, M A; de Groot, J; Scheper, R J; Zhi, Y; Hoatlin, M E; Joenje, H; Arwert, F
2000-04-01
Fanconi anemia (FA) is an autosomal recessive disorder in humans characterized by bone marrow failure, cancer predisposition, and cellular hypersensitivity to cross-linking agents such as mitomycin C and diepoxybutane. FA genes display a caretaker function essential for maintenance of genomic integrity. We have cloned the murine homolog of FANCA, the gene mutated in the major FA complementation group (FA-A). The full-length mouse Fanca cDNA consists of 4503 bp and encodes a protein with a predicted molecular weight of 161 kDa. The deduced Fanca mouse protein shares 81% amino acid sequence similarity and 66% identity with the human protein. The nuclear localization signal and partial leucine zipper consensus motifs found in the human FANCA protein were also present in the murine homolog. In spite of the species difference, the murine Fanca cDNA was capable of correcting the cross-linker sensitive phenotype of human FA-A cells, suggesting functional conservation. Based on Northern as well as Western blots, Fanca was mainly expressed in lymphoid tissues, testis, and ovary. This expression pattern correlates with some of the clinical symptoms observed in FA patients. The availability of the murine Fanca cDNA now allows the gene to be studied in experimental mouse models.
IDENTIFYING CRITICAL CYSTEINE RESIDUES IN ARSENIC (+3 OXIDATION STATE) METHYLTRANSFERASE
Arsenic (+3 oxidation state) methyltransferase (AS3MT) catalyzes methylation of inorganic arsenic to mono, di, and trimethylated arsenicals. Orthologous AS3MT genes in genomes ranging from simple echinoderm to human predict a protein with five conserved cysteine (C) residues. In ...
NASA Astrophysics Data System (ADS)
Sinha, Subarna; Thomas, Daniel; Chan, Steven; Gao, Yang; Brunen, Diede; Torabi, Damoun; Reinisch, Andreas; Hernandez, David; Chan, Andy; Rankin, Erinn B.; Bernards, Rene; Majeti, Ravindra; Dill, David L.
2017-05-01
Two genes are synthetically lethal (SL) when defects in both are lethal to a cell but a single defect is non-lethal. SL partners of cancer mutations are of great interest as pharmacological targets; however, identifying them by cell line-based methods is challenging. Here we develop MiSL (Mining Synthetic Lethals), an algorithm that mines pan-cancer human primary tumour data to identify mutation-specific SL partners for specific cancers. We apply MiSL to 12 different cancers and predict 145,891 SL partners for 3,120 mutations, including known mutation-specific SL partners. Comparisons with functional screens show that MiSL predictions are enriched for SLs in multiple cancers. We extensively validate a SL interaction identified by MiSL between the IDH1 mutation and ACACA in leukaemia using gene targeting and patient-derived xenografts. Furthermore, we apply MiSL to pinpoint genetic biomarkers for drug sensitivity. These results demonstrate that MiSL can accelerate precision oncology by identifying mutation-specific targets and biomarkers.
Savageau, M A
1998-01-01
Induction of gene expression can be accomplished either by removing a restraining element (negative mode of control) or by providing a stimulatory element (positive mode of control). According to the demand theory of gene regulation, which was first presented in qualitative form in the 1970s, the negative mode will be selected for the control of a gene whose function is in low demand in the organism's natural environment, whereas the positive mode will be selected for the control of a gene whose function is in high demand. This theory has now been further developed in a quantitative form that reveals the importance of two key parameters: cycle time C, which is the average time for a gene to complete an ON/OFF cycle, and demand D, which is the fraction of the cycle time that the gene is ON. Here we estimate nominal values for the relevant mutation rates and growth rates and apply the quantitative demand theory to the lactose and maltose operons of Escherichia coli. The results define regions of the C vs. D plot within which selection for the wild-type regulatory mechanisms is realizable, and these in turn provide the first estimates for the minimum and maximum values of demand that are required for selection of the positive and negative modes of gene control found in these systems. The ratio of mutation rate to selection coefficient is the most relevant determinant of the realizable region for selection, and the most influential parameter is the selection coefficient that reflects the reduction in growth rate when there is superfluous expression of a gene. The quantitative theory predicts the rate and extent of selection for each mode of control. It also predicts three critical values for the cycle time. The predicted maximum value for the cycle time C is consistent with the lifetime of the host. The predicted minimum value for C is consistent with the time for transit through the intestinal tract without colonization. 
Finally, the theory predicts an optimum value of C that is in agreement with the observed frequency for E. coli colonizing the human intestinal tract. PMID:9691028
The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A.; Kim, Sungjoon; Wilson, Christopher J.; Lehár, Joseph; Kryukov, Gregory V.; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F.; Monahan, John E.; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A.; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H.; Cheng, Jill; Yu, Guoying K.; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D.; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C.; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P.; Gabriel, Stacey B.; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E.; Weber, Barbara L.; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L.; Meyerson, Matthew; Golub, Todd R.; Morrissey, Michael P.; Sellers, William R.; Schlegel, Robert; Garraway, Levi A.
2012-01-01
The systematic translation of cancer genomic data into knowledge of tumor biology and therapeutic avenues remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacologic annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacologic profiles for 24 anticancer drugs across 479 of the lines, this collection allowed identification of genetic, lineage, and gene expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Altogether, our results suggest that large, annotated cell line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of “personalized” therapeutic regimens. PMID:22460905
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A; Kim, Sungjoon; Wilson, Christopher J; Lehár, Joseph; Kryukov, Gregory V; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F; Monahan, John E; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H; Cheng, Jill; Yu, Guoying K; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P; Gabriel, Stacey B; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E; Weber, Barbara L; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L; Meyerson, Matthew; Golub, Todd R; Morrissey, Michael P; Sellers, William R; Schlegel, Robert; Garraway, Levi A
2012-03-28
The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of 'personalized' therapeutic regimens.
NASA Astrophysics Data System (ADS)
Douglas, Joanne T.
The practical implementation of gene therapy in the clinical setting mandates gene delivery vehicles, or vectors, capable of efficient gene delivery selectively to the target disease cells. The utility of adenoviral vectors for gene therapy is restricted by their dependence on the native adenoviral primary cellular receptor for cell entry. Therefore, a number of strategies have been developed to allow CAR-independent infection of specific cell types, including the use of bispecific conjugates and genetic modifications to the adenoviral capsid proteins, in particular the fibre protein. These targeted adenoviral vectors have demonstrated efficient gene transfer in vitro , correlating with a therapeutic benefit in preclinical animal models. Such vectors are predicted to possess enhanced efficacy in human clinical studies, although anatomical barriers to their use must be circumvented.
Exclusion of RAI2 as the causative gene for Nance-Horan syndrome.
Walpole, S M; Ronce, N; Grayson, C; Dessay, B; Yates, J R; Trump, D; Toutain, A
1999-05-01
Nance-Horan syndrome (NHS) is an X-linked condition characterised by congenital cataracts, microphthalmia and/or microcornea, unusual dental morphology, dysmorphic facial features, and developmental delay in some cases. Recent linkage studies have mapped the NHS disease gene to a 3.5-cM interval on Xp22.2 between DXS1053 and DXS443. We previously identified a human homologue of a mouse retinoic-acid-induced gene (RAI2) within the NHS critical flanking interval and have tested the gene as a candidate for Nance-Horan syndrome in nine NHS-affected families. Direct sequencing of the RAI2 gene and predicted promoter region has revealed no mutations in the families screened; RAI2 is therefore unlikely to be associated with NHS.
Major psychological factors affecting acceptance of gene-recombination technology.
Tanaka, Yutaka
2004-12-01
The purpose of this study was to verify the validity of a causal model that was made to predict the acceptance of gene-recombination technology. A structural equation model was used as a causal model. First of all, based on preceding studies, the factors of perceived risk, perceived benefit, and trust were set up as important psychological factors determining acceptance of gene-recombination technology in the structural equation model. An additional factor, "sense of bioethics," which I consider to be important for acceptance of biotechnology, was added to the model. Based on previous studies, trust was set up to have an indirect influence on the acceptance of gene-recombination technology through perceived risk and perceived benefit in the model. Participants were 231 undergraduate students in Japan who answered a questionnaire with a 5-point bipolar scale. The results indicated that the proposed model fits the data well, and showed that acceptance of gene-recombination technology is explained largely by four factors, that is, perceived risk, perceived benefit, trust, and sense of bioethics, whether the technology is applied to plants, animals, or human beings. However, the relative importance of the four factors was found to vary depending on whether the gene-recombination technology was applied to plants, animals, or human beings. Specifically, the factor of sense of bioethics is the most important factor in acceptance of plant gene-recombination technology and animal gene-recombination technology, and the factors of trust and perceived risk are the most important factors in acceptance of human being gene-recombination technology.
Network-based association of hypoxia-responsive genes with cardiovascular diseases
NASA Astrophysics Data System (ADS)
Wang, Rui-Sheng; Oldham, William M.; Loscalzo, Joseph
2014-10-01
Molecular oxygen is indispensable for cellular viability and function. Hypoxia is a stress condition in which oxygen demand exceeds supply. Low cellular oxygen content induces a number of molecular changes to activate regulatory pathways responsible for increasing the oxygen supply and optimizing cellular metabolism under limited oxygen conditions. Hypoxia plays critical roles in the pathobiology of many diseases, such as cancer, heart failure, myocardial ischemia, stroke, and chronic lung diseases. Although the complicated associations between hypoxia and cardiovascular (and cerebrovascular) diseases (CVD) have been recognized for some time, there are few studies that investigate their biological link from a systems biology perspective. In this study, we integrate hypoxia genes, CVD genes, and the human protein interactome in order to explore the relationship between hypoxia and cardiovascular diseases at a systems level. We show that hypoxia genes are much closer to CVD genes in the human protein interactome than that expected by chance. We also find that hypoxia genes play significant bridging roles in connecting different cardiovascular diseases. We construct a hypoxia-CVD bipartite network and find several interesting hypoxia-CVD modules with significant gene ontology similarity. Finally, we show that hypoxia genes tend to have more CVD interactors in the human interactome than in random networks of matching topology. Based on these observations, we can predict novel genes that may be associated with CVD. This network-based association study gives us a broad view of the relationships between hypoxia and cardiovascular diseases and provides new insights into the role of hypoxia in cardiovascular biology.
Zhang, Chunyu; Elkahloun, Abdel G.; Robertson, Matthew; Gills, Joell J.; Tsurutani, Junji; Shih, Joanna H.; Fukuoka, Junya; Hollander, M. Christine; Harris, Curtis C.; Travis, William D.; Jen, Jin; Dennis, Phillip A.
2011-01-01
The dismal lethality of lung cancer is due to late stage at diagnosis and inherent therapeutic resistance. The incorporation of targeted therapies has modestly improved clinical outcomes, but the identification of new targets could further improve clinical outcomes by guiding stratification of poor-risk early stage patients and individualizing therapeutic choices. We hypothesized that a sequential, combined microarray approach would be valuable to identify and validate new targets in lung cancer. We profiled gene expression signatures during lung epithelial cell immortalization and transformation, and showed that genes involved in mitosis were progressively enhanced in carcinogenesis. 28 genes were validated by immunoblotting and 4 genes were further evaluated in non-small cell lung cancer tissue microarrays. Although CDK1 was highly expressed in tumor tissues, its loss from the cytoplasm unexpectedly predicted poor survival and conferred resistance to chemotherapy in multiple cell lines, especially microtubule-directed agents. An analysis of expression of CDK1 and CDK1-associated genes in the NCI60 cell line database confirmed the broad association of these genes with chemotherapeutic responsiveness. These results have implications for personalizing lung cancer therapy and highlight the potential of combined approaches for biomarker discovery. PMID:21887332
NASA Technical Reports Server (NTRS)
Story, Michael; Stivers, David N.
2004-01-01
This project was funded as a pilot project to determine the feasibility of using gene expression profiles to characterize the response of human cells to exposure to particulate radiations such as those encountered in the spaceflight environment. We proposed to use microarray technology to examine the gene expression patterns of a bank of well-characterized human fibroblast cell cultures. These fibroblast cultures were derived from breast or head and neck cancer patients who exhibited normal, minimal, or severe normal tissue reactions following low LET radiation exposure via radiotherapy. Furthermore, SF2 values determined from fibroblasts cultured from these individuals were predictive of risk for severe late reactions. We hypothesized that by determining the expression of thousands of genes we could identify gene expression patterns that reflect how normal tissues respond to high Z and energy (HZE) particles, that is, that there are molecular signatures for HZE exposures. We also hypothesized that individuals who are intrinsically radiosensitive may elicit a unique response. Because this was funded as a pilot project, we focused our initial studies on logistics and appropriate experimental design, and then on testing our hypothesis that there is a unique molecular response to specific particles, in this case C and Fe, for primary human skin fibroblasts.
ERK Oscillation-Dependent Gene Expression Patterns and Deregulation by Stress-Response
DOE Office of Scientific and Technical Information (OSTI.GOV)
Waters, Katrina M.; Cummings, Brian S.; Shankaran, Harish
2014-09-15
Studies were undertaken to determine whether ERK oscillations regulate a unique subset of genes in human keratinocytes and subsequently, whether the p38 stress response inhibits ERK oscillations. A DNA microarray identified many genes that were unique to ERK oscillations, and network reconstruction predicted an important role for the mediator complex subunit 1 (MED1) node in mediating ERK oscillation-dependent gene expression. Increased ERK-dependent phosphorylation of MED1 was observed in oscillating cells compared to non-oscillating counterparts as validation. Treatment of keratinocytes with a p38 inhibitor (SB203580) increased ERK oscillation amplitudes and MED1 and phospho-MED1 protein levels. Bromate is a probable human carcinogen that activates p38. Bromate inhibited ERK oscillations in human keratinocytes and JB6 cells and induced an increase in phospho-p38 and decrease in phospho-MED1 protein levels. Treatment of normal rat kidney cells and primary salivary gland epithelial cells with bromate decreased phospho-MED1 levels in a reversible fashion upon treatment with p38 inhibitors (SB202190; SB203580). Our results indicate that oscillatory behavior in the ERK pathway alters homeostatic gene regulation patterns and that the cellular response to perturbation may manifest differently in oscillating vs non-oscillating cells.
Genomic and proteomic characterization of a thermophilic Geobacillus bacteriophage GBSV1.
Liu, Bin; Zhou, Fengfeng; Wu, Suijie; Xu, Ying; Zhang, Xiaobo
2009-03-01
Phages are present wherever life is found, and play roles in many biogeochemical and ecological processes. The thermophilic bacteriophages, however, have not been well studied. In this study, phage GBSV1 was obtained from a thermophilic bacterium Geobacillus sp. 6k51 isolated from a hot spring. GBSV1 contains a double-stranded linear DNA of 34,683bp, which encodes 54 putative open reading frames (ORFs). Thirty three of these 54 ORFs exhibit sequence similarities to genes from 7 species of Geobacillus or Bacillus bacteria, as well as of bacteriophages infecting these bacteria. Twenty-two ORFs have been functionally annotated based on both their sequence similarities to known genes and predicted Pfam protein domains. Five structural proteins of the purified GBSV1 virion have been identified by proteomic analyses. Surprisingly, 7 of the GBSV1 ORFs share sequence similarities with genes from bacteria relevant to human diseases. This is the first report that genes of human disease-inducing bacteria are found in a thermophilic phage. It is suggested that thermophilic phages may be the potential evolutionary link between thermophiles and human pathogens. The characterization of GBSV1 may possibly lead to new insights into virus-host interactions and to a better understanding of gene transfers and evolution of life on earth in general.
Zhao, Ni; Wilkerson, Matthew D; Shah, Usman; Yin, Xiaoying; Wang, Anyou; Hayward, Michele C; Roberts, Patrick; Lee, Carrie B; Parsons, Alden M; Thorne, Leigh B; Haithcock, Benjamin E; Grilley-Olson, Juneko E; Stinchcombe, Thomas E; Funkhouser, William K; Wong, Kwok-Kin; Sharpless, Norman E; Hayes, D Neil
2014-11-01
Brain metastases are one of the most malignant complications of lung cancer and constitute a significant cause of cancer related morbidity and mortality worldwide. Recent years of investigation suggested a role of LKB1 in NSCLC development and progression, in synergy with KRAS alteration. In this study, we systematically analyzed how LKB1 and KRAS alteration, measured by mutation, gene expression (GE) and copy number (CN), are associated with brain metastasis in NSCLC. Patients treated at University of North Carolina Hospital from 1990 to 2009 with NSCLC provided frozen, surgically extracted tumors for analysis. GE was measured using Agilent 44,000 custom-designed arrays, CN was assessed by Affymetrix GeneChip Human Mapping 250K Sty Array or the Genome-Wide Human SNP Array 6.0 and gene mutation was detected using ABI sequencing. Integrated analysis was conducted to assess the relationship between these genetic markers and brain metastasis. A model was proposed for brain metastasis prediction using these genetic measurements. 17 of the 174 patients developed brain metastasis. LKB1 wild type tumors had significantly higher LKB1 CN (p<0.001) and GE (p=0.002) than the LKB1 mutant group. KRAS wild type tumors had significantly lower KRAS GE (p<0.001) and lower CN, although the latter failed to be significant (p=0.295). Lower LKB1 CN (p=0.039) and KRAS mutation (p=0.007) were significantly associated with more brain metastasis. The predictive model based on nodal (N) stage, patient age, LKB1 CN and KRAS mutation had a good prediction accuracy, with area under the ROC curve of 0.832 (p<0.001). LKB1 CN in combination with KRAS mutation predicted brain metastasis in NSCLC. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Saulnier, Delphine M; Santos, Filipe; Roos, Stefan; Mistretta, Toni-Ann; Spinler, Jennifer K; Molenaar, Douwe; Teusink, Bas; Versalovic, James
2011-04-29
The genomes of four Lactobacillus reuteri strains isolated from human breast milk and the gastrointestinal tract have been recently sequenced as part of the Human Microbiome Project. Preliminary genome comparisons suggested that these strains belong to two different clades, previously shown to differ with respect to antimicrobial production, biofilm formation, and immunomodulation. To explain possible mechanisms of survival in the host and probiosis, we completed a detailed genomic comparison of two breast milk-derived isolates representative of each group: an established probiotic strain (L. reuteri ATCC 55730) and a strain with promising probiotic features (L. reuteri ATCC PTA 6475). Transcriptomes of L. reuteri strains in different growth phases were monitored using strain-specific microarrays, and compared using a pan-metabolic model representing all known metabolic reactions present in these strains. Both strains contained candidate genes involved in the survival and persistence in the gut such as mucus-binding proteins and enzymes scavenging reactive oxygen species. A large operon predicted to encode the synthesis of an exopolysaccharide was identified in strain 55730. Both strains were predicted to produce health-promoting factors, including antimicrobial agents and vitamins (folate, vitamin B(12)). Additionally, a complete pathway for thiamine biosynthesis was predicted in strain 55730 for the first time in this species. Candidate genes responsible for immunomodulatory properties of each strain were identified by transcriptomic comparisons. The production of bioactive metabolites by human-derived probiotics may be predicted using metabolic modeling and transcriptomics. Such strategies may facilitate selection and optimization of probiotics for health promotion, disease prevention and amelioration.
Protein disorder in the human diseasome: unfoldomics of human genetic diseases
Midic, Uros; Oldfield, Christopher J; Dunker, A Keith; Obradovic, Zoran; Uversky, Vladimir N
2009-01-01
Background Intrinsically disordered proteins lack stable structure under physiological conditions, yet carry out many crucial biological functions, especially functions associated with regulation, recognition, signaling and control. Recently, human genetic diseases and related genes were organized into a bipartite graph (Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685–8690). This diseasome network revealed several significant features such as the common genetic origin of many diseases. Methods and findings We analyzed the abundance of intrinsic disorder in these diseasome network proteins by means of several prediction algorithms, and we analyzed the functional repertoires of these proteins based on prior studies relating disorder to function. Our analyses revealed that (i) Intrinsic disorder is common in proteins associated with many human genetic diseases; (ii) Different disease classes vary in the IDP contents of their associated proteins; (iii) Molecular recognition features, which are relatively short loosely structured protein regions within mostly disordered sequences and which gain structure upon binding to partners, are common in the diseasome, and their abundance correlates with the intrinsic disorder level; (iv) Some disease classes have a significant fraction of genes affected by alternative splicing, and the alternatively spliced regions in the corresponding proteins are predicted to be highly disordered; and (v) Correlations were found among the various diseasome graph-related properties and intrinsic disorder. Conclusion These observations provide the basis for the construction of the human-genetic-disease-associated unfoldome. PMID:19594871
Prospective Molecular Profiling of Melanoma Metastases Suggests Classifiers of Immune Responsiveness
Wang, Ena; Miller, Lance D.; Ohnmacht, Galen A.; Mocellin, Simone; Perez-Diez, Ainhoa; Petersen, David; Zhao, Yingdong; Simon, Richard; Powell, John I.; Asaki, Esther; Alexander, H. Richard; Duray, Paul H.; Herlyn, Meenhard; Restifo, Nicholas P.; Liu, Edison T.; Rosenberg, Steven A.; Marincola, Francesco M.
2008-01-01
We amplified RNAs from 63 fine needle aspiration (FNA) samples from 37 s.c. melanoma metastases from 25 patients undergoing immunotherapy for hybridization to a 6108-gene human cDNA chip. By prospectively following the history of the lesions, we could correlate transcript patterns with clinical outcome. Cluster analysis revealed a tight relationship among autologous synchronously sampled tumors compared with unrelated lesions (average Pearson's r = 0.83 and 0.7, respectively, P < 0.0003). As reported previously, two subgroups of metastatic melanoma lesions were identified that, however, had no predictive correlation with clinical outcome. Ranking of gene expression data from pretreatment samples identified ∼30 genes predictive of clinical response (P < 0.001). Analysis of their annotations denoted that approximately half of them were related to T-cell regulation, suggesting that immune responsiveness might be predetermined by a tumor microenvironment conducive to immune recognition. PMID:12097256
Ramayo-Caldas, Yuliaxis; Ballester, Maria; Fortes, Marina R S; Esteve-Codina, Anna; Castelló, Anna; Noguera, Jose L; Fernández, Ana I; Pérez-Enciso, Miguel; Reverter, Antonio; Folch, Josep M
2014-03-26
Fatty acids (FA) play a critical role in energy homeostasis and metabolic diseases; in the context of livestock species, their profile also impacts on meat quality for healthy human consumption. Molecular pathways controlling lipid metabolism are highly interconnected and are not fully understood. Elucidating these molecular processes will aid technological development towards improvement of pork meat quality and increased knowledge of FA metabolism, underpinning metabolic diseases in humans. The results from genome-wide association studies (GWAS) across 15 phenotypes were subjected to an Association Weight Matrix (AWM) approach to predict a network of 1,096 genes related to intramuscular FA composition in pigs. To identify the key regulators of FA metabolism, we focused on the minimal set of transcription factors (TF) that explored the majority of the network topology. Pathway and network analyses pointed towards a trio of TF as key regulators of FA metabolism: NCOA2, FHL2 and EP300. Promoter sequence analyses confirmed that these TF have binding sites for some well-known regulators of lipid and carbohydrate metabolism. For the first time in a non-model species, some of the co-associations observed at the genetic level were validated through co-expression at the transcriptomic level based on real-time PCR of 40 genes in adipose tissue, and a further 55 genes in liver. In particular, liver expression of NCOA2 and EP300 differed between pig breeds (Iberian and Landrace) extreme in terms of fat deposition. Highly clustered co-expression networks in both liver and adipose tissues were observed. EP300 and NCOA2 showed centrality parameters above average in both networks. Over all genes, co-expression analyses confirmed 28.9% of the AWM predicted gene-gene interactions in liver and 33.0% in adipose tissue. The magnitude of this validation varied across genes, with up to 60.8% of the connections of NCOA2 in adipose tissue being validated via co-expression. 
Our results recapitulate the known transcriptional regulation of FA metabolism, predict gene interactions that can be experimentally validated, and suggest that genetic variants mapped to EP300, FHL2, and NCOA2 modulate lipid metabolism and control energy homeostasis in pigs.
Chan, Wen-Ling; Yang, Wen-Kuang; Huang, Hsien-Da; Chang, Jan-Gowth
2013-01-01
RNA interference (RNAi) is a gene silencing process within living cells, which is controlled by the RNA-induced silencing complex in a sequence-specific manner. In flies and mice, the pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we construct an innovative and comprehensive database to elucidate the siRNA-mediated mechanism in human transcribed pseudogenes (TPGs). To investigate TPGs producing siRNAs that regulate protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) that were supported by publicly available deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we also show that TPGs can act as a target for miRNAs that actually regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we have developed a database, pseudoMap, capturing various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. To our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/. Database URL: http://pseudomap.mbc.nctu.edu.tw/
Bioinformatic prediction of leader genes in human periodontitis.
Covani, Ugo; Marconcini, Simone; Giacomelli, Luca; Sivozhelevov, Victor; Barone, Antonio; Nicolini, Claudio
2008-10-01
Genes involved in different biologic processes form complex interaction networks. However, only a few have a high number of interactions with the other genes in the network. In previous bioinformatics and experimental studies concerning the T lymphocyte cell cycle, these genes were identified and termed "leader genes." In this work, genes involved in human periodontitis were tentatively identified and ranked according to their number of interactions to obtain a preliminary, broader view of molecular mechanisms of periodontitis and plan targeted experimentation. Genes were identified with interrelated queries of several databases. The interactions among these genes were mapped and given a significance score. The weighted number of links (weighted sum of scores for every interaction in which the given gene is involved) was calculated for each gene. Genes were clustered according to this parameter. The genes in the highest cluster were termed leader genes. Sixty-one genes involved or potentially involved in periodontitis were identified. Only five were identified as leader genes, whereas 12 others were ranked in an immediately lower cluster. For 10 of 17 genes there is evidence of involvement in periodontitis; seven new genes that are potentially involved in this disease were identified. The involvement in periodontitis has been completely established for only two leader genes. We applied a validated bioinformatics algorithm to increase our knowledge of molecular mechanisms of periodontitis. Even with the limitations of this ab initio analysis, this theoretical study can suggest ad hoc experimentation targeted on significant genes and, therefore, simpler than mass-scale molecular genomics. Moreover, the identification of leader genes might suggest new potential risk factors and therapeutic targets.
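The "weighted number of links" described in the abstract above can be illustrated with a minimal Python sketch. It assumes each interaction is an undirected gene pair with one significance score that counts toward both partners; the gene names and scores are hypothetical placeholders, and the clustering step that separates leader genes from the rest is not reproduced here (genes are only ranked).

```python
from collections import defaultdict


def weighted_number_of_links(interactions):
    """Sum the interaction scores for every gene.

    `interactions` is an iterable of (gene_a, gene_b, score) tuples.
    Assumption: each interaction is undirected, so its score is added
    to the totals of both genes involved.
    """
    totals = defaultdict(float)
    for gene_a, gene_b, score in interactions:
        totals[gene_a] += score
        totals[gene_b] += score
    return dict(totals)


# Hypothetical genes and scores, for illustration only.
interactions = [
    ("G1", "G2", 0.75),
    ("G1", "G3", 0.5),
    ("G1", "G4", 0.5),
    ("G2", "G3", 0.25),
]

wnl = weighted_number_of_links(interactions)
# Rank genes from the most to the least connected by weighted links.
ranked = sorted(wnl, key=wnl.get, reverse=True)
```

In this toy network G1 accumulates the largest weighted total, so it would be the natural leader-gene candidate once a clustering step is applied to the totals.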
Chen, Ying; Lei, Yun-Ping; Zheng, Hong-Xiang; Wang, Wei; Cheng, Hong-Bo; Zhang, Jing; Wang, Hong-Yan; Jin, Li; Li, Hong
2009-06-01
Congenital contractural arachnodactyly (Beals syndrome) is a rare autosomal dominantly inherited connective tissue disorder characterized by flexion contractures, arachnodactyly, crumpled ears, and mild muscular hypoplasia. Here, a father and son with congenital contractural arachnodactyly features were identified. After sequencing 15 exons (22 to 36) of the FBN2 gene, a novel mutation (C1425Y) was found in exon 33. This de novo mutation presented first in the father and was transmitted to his son, but was not found in the other 14 unaffected family members or in 365 normal people. The C1425Y mutation occurs at the 19th cbEGF domain. Cysteines in this cbEGF domain are rather conserved in species, from human down to ascidian. The cbEGF12-13 in human FBN1 was employed as the template to perform homology modeling of cbEGF18-19 of human FBN2 protein. The mutation has also been evaluated by further prediction tools, for example, SIFT, Blosum62, biochemical Yu's matrice, and UMD-Predictor tool. In all analyses, the mutation is predicted to be pathogenic. Thus, the structure destabilization by C1425Y might be the cause of the disorder.
Gotoh, Aina; Nara, Misaki; Sugiyama, Yuta; Sakanaka, Mikiyasu; Yachi, Hiroyuki; Kitakata, Aya; Nakagawa, Akira; Minami, Hiromichi; Okuda, Shujiro; Katoh, Toshihiko; Katayama, Takane; Kurihara, Shin
2017-10-01
Recently, a "human gut microbial gene catalogue," which ranks the dominance of microbe genus/species in human fecal samples, was published. Most of the bacteria ranked in the catalog are currently publicly available; however, the growth media recommended by the distributors vary among species, hampering physiological comparisons among the bacteria. To address this problem, we evaluated Gifu anaerobic medium (GAM) as a standard medium. Forty-four publicly available species of the top 56 species listed in the "human gut microbial gene catalogue" were cultured in GAM, and out of these, 32 (72%) were successfully cultured. Short-chain fatty acids from the bacterial culture supernatants were then quantified, and bacterial metabolic pathways were predicted based on in silico genomic sequence analysis. Our system provides a useful platform for assessing growth properties and analyzing metabolites of dominant human gut bacteria grown in GAM and supplemented with compounds of interest.
Common features of microRNA target prediction tools
Peterson, Sarah M.; Thompson, Jeffrey A.; Ufkin, Melanie L.; Sathyanarayana, Pradeep; Liaw, Lucy; Congdon, Clare Bates
2014-01-01
The human genome encodes for over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that function to regulate gene expression post-transcriptionally. Due to the potential for one miRNA to target multiple gene transcripts, miRNAs are recognized as a major mechanism to regulate gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of different computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities, and these are DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. In comparison across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid in user selection of the appropriate tools and interpretation of the tool output. PMID:24600468
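Of the four common features named in the abstract above, seed match is the simplest to illustrate. The following is a minimal sketch, not the method of any tool mentioned in the review: it assumes the common convention that the seed is miRNA nucleotides 2 to 8 (counted from the 5' end) and that a site is a perfect reverse-complement match in the target sequence. The function names and the toy UTR sequence are hypothetical, and conservation, free energy, and site accessibility are not modeled.

```python
# Sketch of a perfect seed-match search (assumption: seed = miRNA positions 2-8).
# Function names and example sequences are illustrative, not taken from any tool.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=8):
    """Return 0-based UTR positions where the seed's reverse complement occurs.

    start and end are 1-based, inclusive miRNA positions defining the seed.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt miRNA sequence
    utr = "AAACUACCUCAAA"             # toy target fragment (hypothetical)
    print(seed_sites(mirna, utr))
```

Real prediction tools combine such matches with the other three features and usually tolerate imperfect sites, so this sketch only shows the core string-matching step.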
Pergola, Giulio; Selvaggi, Pierluigi; Gelao, Barbara; Di Carlo, Pasquale; Nettis, Maria Antonietta; Amico, Graziella; Felici, Valentina; Fazio, Leonardo; Rampino, Antonio; Sambataro, Fabio; Blasi, Giuseppe; Bertolino, Alessandro
2017-01-01
Background: Dopamine D2 receptors (D2R) contribute to the inverted-U shaped relationship between dopamine dorsolateral prefrontal cortex (DLPFC) and working memory (WM). Genetic variation in DRD2 coding for D2Rs modulates D2 signaling, but other genes in its pathway may be involved. In a previous work, using gene co-expression networks we identified 84 partner genes coregulated with DRD2 and eight single nucleotide polymorphisms (SNPs) predicting coexpression of the whole gene set in the human DLPFC [1]. These SNPs combined into a polygenic coexpression index (PCI) predicted WM performance and DLPFC activity in two independent samples of living healthy humans [1]. Here, we asked whether response to D2R targeting drugs is associated with this PCI. Thus, we investigated the interaction between WM behavioral/brain response to the D2R agonist Bromocriptine (BRO) and the PCI. [1] Pergola G, Di Carlo P, et al. (In press). Translational Psychiatry. Methods: Fifty healthy volunteers entered a double-blind, crossover, randomized, placebo-controlled fMRI study with BRO 1.25 mg and performed the N-Back WM task during the fMRI scanning session. We computed the PCI for all participants and investigated its association with WM-related behavior and brain activity using general linear models. Results: A PCI by drug interaction was significant on both DLPFC signal (right BA46, 242 voxels, F(1, 48) = 24; right BA9, 177 voxels, F(1, 48) = 19; P < .05 cluster-level FWE corrected) and behavioral scores, F(1, 46) = 4.6, P = .045, using a U-shaped quadratic model. The U-shaped relationship between the PCI and WM processing found on placebo was reversed on BRO. Furthermore, the increase in behavioral performance on BRO correlated with a decrease in BA46 activity, t(48) = −2.0, P = .049.
Conclusion: The combined effect of multiple alleles on DRD2 coexpression covaried with drug response such that different allelic patterns were associated with similar responses, as in the inverted U-shaped model of WM. Thus, multiple genes and multiple allelic patterns are implicated in the inverted U-shaped dopamine/WM relationship. This relationship is reversed when individuals are administered BRO, suggesting that brain and behavioral response to this pharmacological challenge depends on a pleiotropic individual genetic background. Hence, pharmacogenomics in schizophrenia should take into account allelic patterns associated with molecular phenomena such as gene expression to predict drug response.
Fanconi anemia gene editing by the CRISPR/Cas9 system.
Osborn, Mark J; Gabriel, Richard; Webber, Beau R; DeFeo, Anthony P; McElroy, Amber N; Jarjour, Jordan; Starker, Colby G; Wagner, John E; Joung, J Keith; Voytas, Daniel F; von Kalle, Christof; Schmidt, Manfred; Blazar, Bruce R; Tolar, Jakub
2015-02-01
Genome engineering with designer nucleases is a rapidly progressing field, and the ability to correct human gene mutations in situ is highly desirable. We employed fibroblasts derived from a patient with Fanconi anemia as a model to test the ability of the clustered regularly interspaced short palindromic repeats/Cas9 nuclease system to mediate gene correction. We show that the Cas9 nuclease and nickase each resulted in gene correction, but the nickase, because of its ability to preferentially mediate homology-directed repair, resulted in a higher frequency of corrected clonal isolates. To assess the off-target effects, we used both a predictive software platform to identify intragenic sequences of homology as well as a genome-wide screen utilizing linear amplification-mediated PCR. We observed no off-target activity and show RNA-guided endonuclease candidate sites that do not possess low sequence complexity function in a highly specific manner. Collectively, we provide proof of principle for precision genome editing in Fanconi anemia, a DNA repair-deficient human disorder.
Experimental annotation of the human genome using microarray technology.
Shoemaker, D D; Schadt, E E; Armour, C D; He, Y D; Garrett-Engele, P; McDonagh, P D; Loerch, P M; Leonardson, A; Lum, P Y; Cavet, G; Wu, L F; Altschuler, S J; Edwards, S; King, J; Tsang, J S; Schimmack, G; Schelter, J M; Koch, J; Ziman, M; Marton, M J; Li, B; Cundiff, P; Ward, T; Castle, J; Krolewski, M; Meyer, M R; Mao, M; Burchard, J; Kidd, M J; Dai, H; Phillips, J W; Linsley, P S; Stoughton, R; Scherer, S; Boguski, M S
2001-02-15
The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using 'exon' and 'tiling' arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.
Augustin, Regina; Lichtenthaler, Stefan F.; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich
2011-01-01
The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189
Arboleya, Silvia; Bottacini, Francesca; O'Connell-Motherway, Mary; Ryan, C Anthony; Ross, R Paul; van Sinderen, Douwe; Stanton, Catherine
2018-01-08
Bifidobacterium longum is a common member of the human gut microbiota and is frequently present at high numbers in the gut microbiota of humans throughout life, thus indicative of a close symbiotic host-microbe relationship. Different mechanisms may be responsible for the high competitiveness of this taxon in its human host to allow stable establishment in the complex and dynamic intestinal microbiota environment. The objective of this study was to assess the genetic and metabolic diversity in a set of 20 B. longum strains, most of which had previously been isolated from infants, by performing whole genome sequencing and comparative analysis, and to analyse their carbohydrate utilization abilities using a gene-trait matching approach. We analysed their pan-genome and their phylogenetic relatedness. All strains clustered in the B. longum ssp. longum phylogenetic subgroup, except for one individual strain which was found to cluster in the B. longum ssp. suis phylogenetic group. The examined strains exhibit genomic diversity, while they also varied in their sugar utilization profiles. This allowed us to perform a gene-trait matching exercise enabling the identification of five gene clusters involved in the utilization of xylo-oligosaccharides, arabinan, arabinoxylan, galactan and fucosyllactose, the latter of which is an abundant human milk oligosaccharide (HMO). The results showed high diversity in terms of genes and predicted glycosyl-hydrolases, as well as the ability to metabolize a large range of sugars. Moreover, we corroborate the capability of B. longum ssp. longum to metabolise HMOs. Ultimately, their intraspecific genomic diversity and the ability to consume a wide assortment of carbohydrates, ranging from plant-derived carbohydrates to HMOs, may provide an explanation for the competitive advantage and persistence of B. longum in the human gut microbiome.
Predicting Viral Infection From High-Dimensional Biomarker Trajectories
Chen, Minhua; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S.; Lucas, Joseph; Dunson, David; Carin, Lawrence
2013-01-01
There is often interest in predicting an individual’s latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time course trajectories in the gene expressions are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework allows accurate predictions of infected individuals in advance of the development of clinical symptoms, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. Biological interpretation of several of the inferred pathways (factors) is provided. PMID:23704802
Testing chemical carcinogenicity by using a transcriptomics HepaRG-based model?
Doktorova, T. Y.; Yildirimman, Reha; Ceelen, Liesbeth; Vilardell, Mireia; Vanhaecke, Tamara; Vinken, Mathieu; Ates, Gamze; Heymans, Anja; Gmuender, Hans; Bort, Roque; Corvi, Raffaella; Phrakonkham, Pascal; Li, Ruoya; Mouchet, Nicolas; Chesne, Christophe; van Delft, Joost; Kleinjans, Jos; Castell, Jose; Herwig, Ralf; Rogiers, Vera
2014-01-01
The EU FP6 project carcinoGENOMICS explored the combination of toxicogenomics and in vitro cell culture models for identifying organotypical genotoxic- and non-genotoxic carcinogen-specific gene signatures. Here the performance of its gene classifier, derived from exposure of metabolically competent human HepaRG cells to prototypical non-carcinogens (10 compounds) and hepatocarcinogens (20 compounds), is reported. Analysis of the data at the gene and the pathway level by using independent biostatistical approaches showed a distinct separation of genotoxic from non-genotoxic hepatocarcinogens and non-carcinogens (up to 88% correct prediction). The most characteristic pathway responding to genotoxic exposure was DNA damage. Interlaboratory reproducibility was assessed by blind testing of three compounds, from the set of 30 compounds, in three independent laboratories. Subsequent classification of these compounds resulted in correct prediction of the genotoxicants. As expected, results on the non-genotoxic carcinogens and the non-carcinogens were less predictive. In conclusion, the combination of transcriptomics with the HepaRG in vitro cell model provides a potential weight of evidence approach for the evaluation of the genotoxic potential of chemical substances. PMID:26417288
TP53 mutations, expression and interaction networks in human cancers
Wang, Xiaosheng; Sun, Qingrong
2017-01-01
Although the associations of p53 dysfunction, p53 interaction networks and oncogenesis have been widely explored, a systematic analysis of TP53 mutations and its related interaction networks in various types of human cancers is lacking. Our study explored the associations of TP53 mutations, gene expression, clinical outcomes, and TP53 interaction networks across 33 cancer types using data from The Cancer Genome Atlas (TCGA). We show that TP53 is the most frequently mutated gene in a number of cancers, and its mutations appear to be early events in cancer initiation. We identified genes potentially repressed by p53, and genes whose expression correlates significantly with TP53 expression. These gene products may be especially important nodes in p53 interaction networks in human cancers. This study shows that while TP53-truncating mutations often result in decreased TP53 expression, other non-truncating TP53 mutations result in increased TP53 expression in some cancers. Survival analyses in a number of cancers show that patients with TP53 mutations are more likely to have worse prognoses than TP53-wildtype patients, and that elevated TP53 expression often leads to poor clinical outcomes. We identified a set of candidate synthetic lethal (SL) genes for TP53, and validated some of these SL interactions using data from the Cancer Cell Line Project. These predicted SL genes are promising candidates for experimental validation and the development of personalized therapeutics for patients with TP53-mutated cancers. PMID:27880943
TP53 mutations, expression and interaction networks in human cancers.
Wang, Xiaosheng; Sun, Qingrong
2017-01-03
Although the associations of p53 dysfunction, p53 interaction networks and oncogenesis have been widely explored, a systematic analysis of TP53 mutations and its related interaction networks in various types of human cancers is lacking. Our study explored the associations of TP53 mutations, gene expression, clinical outcomes, and TP53 interaction networks across 33 cancer types using data from The Cancer Genome Atlas (TCGA). We show that TP53 is the most frequently mutated gene in a number of cancers, and its mutations appear to be early events in cancer initiation. We identified genes potentially repressed by p53, and genes whose expression correlates significantly with TP53 expression. These gene products may be especially important nodes in p53 interaction networks in human cancers. This study shows that while TP53-truncating mutations often result in decreased TP53 expression, other non-truncating TP53 mutations result in increased TP53 expression in some cancers. Survival analyses in a number of cancers show that patients with TP53 mutations are more likely to have worse prognoses than TP53-wildtype patients, and that elevated TP53 expression often leads to poor clinical outcomes. We identified a set of candidate synthetic lethal (SL) genes for TP53, and validated some of these SL interactions using data from the Cancer Cell Line Project. These predicted SL genes are promising candidates for experimental validation and the development of personalized therapeutics for patients with TP53-mutated cancers.
Su, Zhiguo; Dai, Tianjiao; Tang, Yushi; Tao, Yile; Huang, Bei; Mu, Qinglin; Wen, Donghui
2018-06-01
Coastal ecosystem structures and functions are changing under natural and anthropogenic influences. In this study, surface sediment samples were collected from the disturbed zone (DZ), near estuary zone (NEZ), and far estuary zone (FEZ) of Hangzhou Bay, one of the most seriously polluted bays in China. The bacterial community structures and predicted functions varied significantly among zones. Firmicutes were most abundant in DZ, highlighting the impacts of anthropogenic activities. Sediment total phosphorus had the greatest influence on the bacterial community structures. As predicted by PICRUSt analysis, DZ significantly exceeded FEZ and NEZ in the subcategory of Xenobiotics Biodegradation and Metabolism, and DZ was enriched in all the nitrate reduction related genes except nrfA. Seawater salinity and inorganic nitrogen, as the representative natural and anthropogenic factors respectively, had exactly opposite effects on nitrogen metabolism functions. The changes in bacterial community composition and predicted functions provide new insight into the impacts of human-induced pollution on coastal ecosystems.
Translational systems pharmacology‐based predictive assessment of drug‐induced cardiomyopathy
Messinis, Dimitris E.; Melas, Ioannis N.; Hur, Junguk; Varshney, Navya; Alexopoulos, Leonidas G.
2018-01-01
Drug‐induced cardiomyopathy contributes to drug attrition. We compared two pipelines of predictive modeling: (1) applying elastic net (EN) to differentially expressed genes (DEGs) of drugs; (2) applying integer linear programming (ILP) to construct each drug's signaling pathway starting from its targets to downstream proteins, to transcription factors, and to its DEGs in human cardiomyocytes, and then subjecting the genes/proteins in the drugs' signaling networks to EN regression. We classified 31 drugs with availability of DEGs into 13 toxic and 18 nontoxic drugs based on a clinical cardiomyopathy incidence cutoff of 0.1%. The ILP‐augmented modeling increased prediction accuracy from 79% to 88% (sensitivity: 88%; specificity: 89%) under leave‐one‐out cross validation. The ILP‐constructed signaling networks of drugs were better predictors than DEGs. Per literature, the microRNAs that reportedly regulate expression of our six top predictors are of diagnostic value for natural heart failure or doxorubicin‐induced cardiomyopathy. This translational predictive modeling might uncover potential biomarkers. PMID:29341478
Origin and Functional Prediction of Pollen Allergens in Plants
Chen, Miaolin; Xu, Jie; Ren, Kang; Searle, Iain
2016-01-01
Pollen allergies have long been a major pandemic health problem for humans. However, the evolutionary events and biological function of pollen allergens in plants remain largely unknown. Here, we report the genome-wide prediction of pollen allergens and their biological function in the dicotyledonous model plant Arabidopsis (Arabidopsis thaliana) and the monocotyledonous model plant rice (Oryza sativa). In total, 145 and 107 pollen allergens were predicted from rice and Arabidopsis, respectively. These pollen allergens are putatively involved in stress responses and metabolic processes such as cell wall metabolism during pollen development. Interestingly, these putative pollen allergen genes were derived from large gene families and became diversified during evolution. Sequence analysis across 25 plant species from green alga to angiosperms suggests that about 40% of putative pollen allergenic proteins existed in both lower and higher plants, while other allergens emerged during evolution. Although a high proportion of gene duplication has been observed among allergen-coding genes, our data show that these genes might have undergone purifying selection during evolution. We also observed that epitopes of an allergen might have a biological function, as revealed by comprehensive analysis of two known allergens, expansin and profilin. This implies a crucial role of conserved amino acid residues in both in planta biological function and allergenicity. Finally, a model explaining how pollen allergens were generated and maintained in plants is proposed. Prediction and systematic analysis of pollen allergens in model plants suggest that pollen allergens evolved by gene duplication and then functional specification. This study provides insight into the phylogenetic and evolutionary scenario of pollen allergens that will be helpful to future characterization and epitope screening of pollen allergens. PMID:27436829
Origin and Functional Prediction of Pollen Allergens in Plants.
Chen, Miaolin; Xu, Jie; Devis, Deborah; Shi, Jianxin; Ren, Kang; Searle, Iain; Zhang, Dabing
2016-09-01
Pollen allergies have long been a major pandemic health problem for humans. However, the evolutionary events and biological function of pollen allergens in plants remain largely unknown. Here, we report the genome-wide prediction of pollen allergens and their biological function in the dicotyledonous model plant Arabidopsis (Arabidopsis thaliana) and the monocotyledonous model plant rice (Oryza sativa). In total, 145 and 107 pollen allergens were predicted from rice and Arabidopsis, respectively. These pollen allergens are putatively involved in stress responses and metabolic processes such as cell wall metabolism during pollen development. Interestingly, these putative pollen allergen genes were derived from large gene families and became diversified during evolution. Sequence analysis across 25 plant species from green alga to angiosperms suggests that about 40% of putative pollen allergenic proteins existed in both lower and higher plants, while other allergens emerged during evolution. Although a high proportion of gene duplication has been observed among allergen-coding genes, our data show that these genes might have undergone purifying selection during evolution. We also observed that epitopes of an allergen might have a biological function, as revealed by comprehensive analysis of two known allergens, expansin and profilin. This implies a crucial role of conserved amino acid residues in both in planta biological function and allergenicity. Finally, a model explaining how pollen allergens were generated and maintained in plants is proposed. Prediction and systematic analysis of pollen allergens in model plants suggest that pollen allergens evolved by gene duplication and then functional specification. This study provides insight into the phylogenetic and evolutionary scenario of pollen allergens that will be helpful to future characterization and epitope screening of pollen allergens.
Yu, Hui; Aleman-Meza, Boanerges; Gharib, Shahla; Labocha, Marta K; Cronin, Christopher J; Sternberg, Paul W; Zhong, Weiwei
2013-07-16
Genetic screens have been widely applied to uncover genetic mechanisms of movement disorders. However, most screens rely on human observations of qualitative differences. Here we demonstrate the application of an automatic imaging system to conduct a quantitative screen for genes regulating the locomotive behavior in Caenorhabditis elegans. Two hundred twenty-seven neuronal signaling genes with viable homozygous mutants were selected for this study. We tracked and recorded each animal for 4 min and analyzed over 4,400 animals of 239 genotypes to obtain a quantitative, 10-parameter behavioral profile for each genotype. We discovered 87 genes whose inactivation causes movement defects, including 50 genes that had never been associated with locomotive defects. Computational analysis of the high-content behavioral profiles predicted 370 genetic interactions among these genes. Network partition revealed several functional modules regulating locomotive behaviors, including sensory genes that detect environmental conditions, genes that function in multiple types of excitable cells, and genes in the signaling pathway of the G protein Gαq, a protein that is essential for animal life and behavior. We developed quantitative epistasis analysis methods to analyze the locomotive profiles and validated the prediction of the γ isoform of phospholipase C as a component in the Gαq pathway. These results provided a system-level understanding of how neuronal signaling genes coordinate locomotive behaviors. This study also demonstrated the power of quantitative approaches in genetic studies.
Immune Recognition of Gene Transfer Vectors: Focus on Adenovirus as a Paradigm
Aldhamen, Yasser Ali; Seregin, Sergey S.; Amalfitano, Andrea
2011-01-01
Recombinant Adenovirus (Ad) based vectors have been utilized extensively as a gene transfer platform in multiple pre-clinical and clinical applications. These applications are numerous, and inclusive of both gene therapy and vaccine based approaches to human or animal diseases. The widespread utilization of these vectors in both animal models and numerous human clinical trials (Ad-based vectors surpass all other gene transfer vectors relative to numbers of patients treated, as well as number of clinical trials overall) has shed light on how this virus vector interacts with both the innate and adaptive immune systems. The ability to generate and administer large amounts of this vector likely contributes not only to their ability to allow for highly efficient gene transfer, but also their elicitation of host immune responses to the vector and/or the transgene the vector expresses in vivo. These facts, coupled with utilization of several models that allow for full detection of these responses, have predicted several observations made in human trials, an important point as lack of similar capabilities by other vector systems may prevent detection of such responses until after human trials are initiated. Finally, induction of innate or adaptive immune responses by Ad vectors may be detrimental in one setting (e.g., gene therapy) and be entirely beneficial in another (e.g., prophylactic or therapeutic vaccine based applications). Herein, we review the current understanding of innate and adaptive immune responses to Ad vectors, as well as some recent advances that attempt to capitalize on this understanding so as to further broaden the safe and efficient use of Ad-based gene transfer therapies in general. PMID:22566830
Schroeder, Diane I.; Jayashankar, Kartika; Douglas, Kory C.; Thirkill, Twanda L.; York, Daniel; Dickinson, Pete J.; Williams, Lawrence E.; Samollow, Paul B.; Ross, Pablo J.; Bannasch, Danika L.; Douglas, Gordon C.; LaSalle, Janine M.
2015-01-01
Over the last 20-80 million years the mammalian placenta has taken on a variety of morphologies through both divergent and convergent evolution. Recently we have shown that the human placenta genome has a unique epigenetic pattern of large partially methylated domains (PMDs) and highly methylated domains (HMDs) with gene body DNA methylation positively correlating with level of gene expression. In order to determine the evolutionary conservation of DNA methylation patterns and transcriptional regulatory programs in the placenta, we performed a genome-wide methylome (MethylC-seq) analysis of human, rhesus macaque, squirrel monkey, mouse, dog, horse, and cow placentas as well as opossum extraembryonic membrane. We found that, similar to human placenta, mammalian placentas and opossum extraembryonic membrane have globally lower levels of methylation compared to somatic tissues. Higher relative gene body methylation was the conserved feature across all mammalian placentas, despite differences in PMD/HMDs and absolute methylation levels. Specifically, higher methylation over the bodies of genes involved in mitosis, vesicle-mediated transport, protein phosphorylation, and chromatin modification was observed compared with the rest of the genome. As in human placenta, higher methylation is associated with higher gene expression and is predictive of genic location across species. Analysis of DNA methylation in oocytes and preimplantation embryos shows a conserved pattern of gene body methylation similar to the placenta. Intriguingly, mouse and cow oocytes and mouse early embryos have PMD/HMDs but their placentas do not, suggesting that PMD/HMDs are a feature of early preimplantation methylation patterns that become lost during placental development in some species and following implantation of the embryo. PMID:26241857
Rozman, Vita; Kunej, Tanja
2018-05-10
Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
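The abstract above mentions a Python script that identifies the genes comprising pDelVars in the highest number of examined species. That script is not reproduced here. The following is only a minimal sketch of how such a ranking could be done, assuming the input is a list of (species, gene symbol) pairs, one per predicted deleterious variant, for example as exported from a BioMart query. The function name and the toy records are hypothetical.

```python
# Sketch: rank genes by the number of distinct species in which they carry
# at least one pDelVar. Input format and names are assumptions, not the
# authors' actual script.
from collections import defaultdict


def genes_by_species_count(records):
    """Return (gene, species_count) pairs, highest species count first.

    records: iterable of (species, gene_symbol) tuples, one per variant.
    Ties are broken alphabetically by gene symbol.
    """
    species_per_gene = defaultdict(set)
    for species, gene in records:
        species_per_gene[gene].add(species)  # a set ignores repeated variants
    counts = [(gene, len(species)) for gene, species in species_per_gene.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_records = [  # hypothetical data for illustration only
        ("human", "MPDZ"),
        ("mouse", "MPDZ"),
        ("human", "MPDZ"),
        ("human", "TACC2"),
    ]
    for gene, n_species in genes_by_species_count(toy_records):
        print(gene, n_species)
```

Using a set per gene means that several variants in the same gene and species count once, which matches ranking by number of species, not by number of variants.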
Ifeonu, Olukemi O.; Simon, Raphael; Tennant, Sharon M.; Sheoran, Abhineet S.; Daly, Maria C.; Felix, Victor; Kissinger, Jessica C.; Widmer, Giovanni; Levine, Myron M.; Tzipori, Saul; Silva, Joana C.
2016-01-01
Human cryptosporidiosis, caused primarily by Cryptosporidium hominis and a subset of Cryptosporidium parvum, is a major cause of moderate-to-severe diarrhea in children under 5 years of age in developing countries and can lead to nutritional stunting and death. Cryptosporidiosis is particularly severe and potentially lethal in immunocompromised hosts. Biological and technical challenges have impeded traditional vaccinology approaches to identify novel targets for the development of vaccines against C. hominis, the predominant species associated with human disease. We deemed that the existence of genomic resources for multiple species in the genus, including a much-improved genome assembly and annotation for C. hominis, makes a reverse vaccinology approach feasible. To this end, we sought to generate a searchable online resource, termed C. hominis gene catalog, which registers all C. hominis genes and their properties relevant for the identification and prioritization of candidate vaccine antigens, including physical attributes, properties related to antigenic potential and expression data. Using bioinformatic approaches, we identified ∼400 C. hominis genes containing properties typical of surface-exposed antigens, such as predicted glycosylphosphatidylinositol (GPI)-anchor motifs, multiple transmembrane motifs and/or signal peptides targeting the encoded protein to the secretory pathway. This set can be narrowed further, e.g. by focusing on potential GPI-anchored proteins lacking homologs in the human genome, but with homologs in the other Cryptosporidium species for which genomic data are available, and with low amino acid polymorphism. Additional selection criteria related to recombinant expression and purification include minimizing predicted post-translation modifications and potential disulfide bonds. Forty proteins satisfying these criteria were selected from 3745 proteins in the updated C. hominis annotation. 
The immunogenic potential of a few of these is currently being tested. Database URL: http://cryptogc.igs.umaryland.edu PMID:28095366
Josset, Laurence; Menachery, Vineet D.; Gralinski, Lisa E.; Agnihothram, Sudhakar; Sova, Pavel; Carter, Victoria S.; Yount, Boyd L.; Graham, Rachel L.; Baric, Ralph S.; Katze, Michael G.
2013-01-01
A novel human coronavirus (HCoV-EMC) was recently identified in the Middle East as the causative agent of a severe acute respiratory syndrome (SARS) resembling the illness caused by SARS coronavirus (SARS-CoV). Although derived from the CoV family, the two viruses are genetically distinct and do not use the same receptor. Here, we investigated whether HCoV-EMC and SARS-CoV induce similar or distinct host responses after infection of a human lung epithelial cell line. HCoV-EMC was able to replicate as efficiently as SARS-CoV in Calu-3 cells and similarly induced minimal transcriptomic changes before 12 h postinfection. Later in infection, HCoV-EMC induced a massive dysregulation of the host transcriptome, to a much greater extent than SARS-CoV. Both viruses induced a similar activation of pattern recognition receptors and the interleukin 17 (IL-17) pathway, but HCoV-EMC specifically down-regulated the expression of several genes within the antigen presentation pathway, including both type I and II major histocompatibility complex (MHC) genes. This could have an important impact on the ability of the host to mount an adaptive host response. A unique set of 207 genes was dysregulated early and permanently throughout infection with HCoV-EMC, and was used in a computational screen to predict potential antiviral compounds, including kinase inhibitors and glucocorticoids. Overall, HCoV-EMC and SARS-CoV elicit distinct host gene expression responses, which might impact in vivo pathogenesis and could orient therapeutic strategies against that emergent virus. PMID:23631916
Culture–gene coevolution of individualism–collectivism and the serotonin transporter gene
Chiao, Joan Y.; Blizinsky, Katherine D.
2010-01-01
Culture–gene coevolutionary theory posits that cultural values have evolved, are adaptive and influence the social and physical environments under which genetic selection operates. Here, we examined the association between cultural values of individualism–collectivism and allelic frequency of the serotonin transporter functional polymorphism (5-HTTLPR) as well as the role this culture–gene association may play in explaining global variability in prevalence of pathogens and affective disorders. We found evidence that collectivistic cultures were significantly more likely to comprise individuals carrying the short (S) allele of the 5-HTTLPR across 29 nations. Results further show that historical pathogen prevalence predicts cultural variability in individualism–collectivism owing to genetic selection of the S allele. Additionally, cultural values and frequency of S allele carriers negatively predict global prevalence of anxiety and mood disorder. Finally, mediation analyses further indicate that increased frequency of S allele carriers predicted decreased anxiety and mood disorder prevalence owing to increased collectivistic cultural values. Taken together, our findings suggest culture–gene coevolution between allelic frequency of 5-HTTLPR and cultural values of individualism–collectivism and support the notion that cultural values buffer genetically susceptible populations from increased prevalence of affective disorders. Implications of the current findings for understanding culture–gene coevolution of human brain and behaviour as well as how this coevolutionary process may contribute to global variation in pathogen prevalence and epidemiology of affective disorders, such as anxiety and depression, are discussed. PMID:19864286
Levine, Douglas A.; Mankoo, Parminder; Schultz, Nikolaus; Du, Ying; Zhang, Yiqun; Larsson, Erik; Sheridan, Robert; Xiao, Weimin; Spellman, Paul T.; Getz, Gad; Wheeler, David A.; Perou, Charles M.; Gibbs, Richard A.; Sander, Chris; Hayes, D. Neil; Gunaratne, Preethi H.
2012-01-01
Background The Cancer Genome Atlas (TCGA) Network recently comprehensively catalogued the molecular aberrations in 487 high-grade serous ovarian cancers, with much remaining to be elucidated regarding the microRNAs (miRNAs). Here, using TCGA ovarian data, we surveyed the miRNAs, in the context of their predicted gene targets. Methods and Results Integration of miRNA and gene patterns yielded evidence that proximal pairs of miRNAs are processed from polycistronic primary transcripts, and that intronic miRNAs and their host gene mRNAs derive from common transcripts. Patterns of miRNA expression revealed multiple tumor subtypes and a set of 34 miRNAs predictive of overall patient survival. In a global analysis, miRNA:mRNA pairs anti-correlated in expression across tumors showed a higher frequency of in silico predicted target sites in the mRNA 3′-untranslated region (with less frequency observed for coding sequence and 5′-untranslated regions). The miR-29 family and predicted target genes were among the most strongly anti-correlated miRNA:mRNA pairs; over-expression of miR-29a in vitro repressed several anti-correlated genes (including DNMT3A and DNMT3B) and substantially decreased ovarian cancer cell viability. Conclusions This study establishes miRNAs as having a widespread impact on gene expression programs in ovarian cancer, further strengthening our understanding of miRNA biology as it applies to human cancer. As with gene transcripts, miRNAs exhibit high diversity reflecting the genomic heterogeneity within a clinically homogeneous disease population. Putative miRNA:mRNA interactions, as identified using integrative analysis, can be validated. TCGA data are a valuable resource for the identification of novel tumor suppressive miRNAs in ovarian as well as other cancers. PMID:22479643
Ab initio gene identification in metagenomic sequences
Zhu, Wenhan; Lomsadze, Alexandre; Borodovsky, Mark
2010-01-01
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousands of new genes could be added to existing annotations of several human and mouse gut metagenomes. PMID:20403810
Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé; Kichaev, Gleb; Gusev, Alexander; Pasaniuc, Bogdan
2017-03-02
Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun
2010-01-01
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
Liu, Jia; Fu, Jing; Duan, Yan; Wang, Guang
2017-01-01
Graves’ disease (GD) is one of the most common endocrine diseases. Antithyroid drug (ATD) treatment is frequently used as the first-choice therapy for GD patients in most countries owing to its superior safety and tolerability. However, GD patients treated with ATDs have a relatively high recurrence rate after drug withdrawal, which is a main limitation of ATD treatment. It is of great importance to identify predictors of higher recurrence risk in GD patients, which may facilitate an appropriate therapeutic approach for a given patient at the time of GD diagnosis. Genetic factors are widely believed to play an important role in the pathogenesis of GD. An increasing number of studies have investigated the relationship between gene polymorphisms and the recurrence risk in GD patients. In this article, we review the current literature to highlight the predictive value of gene polymorphisms for recurrence risk in GD patients after ATD withdrawal. Some gene polymorphisms, such as CTLA4 rs231775 and human leukocyte antigen polymorphisms (DRB1*03, DQA1*05, and DQB1*02), might be associated with a high recurrence risk in GD patients. Further prospective studies on patients of different ethnicities, especially studies with large sample sizes and long-term follow-up, should be conducted to confirm the predictive roles of gene polymorphisms. PMID:29085334
The Genetic Basis for Variation in Sensitivity to Lead Toxicity in Drosophila melanogaster.
Zhou, Shanshan; Morozova, Tatiana V; Hussain, Yasmeen N; Luoma, Sarah E; McCoy, Lenovia; Yamamoto, Akihiko; Mackay, Trudy F C; Anholt, Robert R H
2016-07-01
Lead toxicity presents a worldwide health problem, especially due to its adverse effects on cognitive development in children. However, identifying genes that give rise to individual variation in susceptibility to lead toxicity is challenging in human populations. Our goal was to use Drosophila melanogaster to identify evolutionarily conserved candidate genes associated with individual variation in susceptibility to lead exposure. To identify candidate genes associated with variation in susceptibility to lead toxicity, we measured effects of lead exposure on development time, viability and adult activity in the Drosophila melanogaster Genetic Reference Panel (DGRP) and performed genome-wide association analyses to identify candidate genes. We used mutants to assess functional causality of candidate genes and constructed a genetic network associated with variation in sensitivity to lead exposure, on which we could superimpose human orthologs. We found substantial heritabilities for all three traits and identified candidate genes associated with variation in susceptibility to lead exposure for each phenotype. The genetic architectures that determine variation in sensitivity to lead exposure are highly polygenic. Gene ontology and network analyses showed enrichment of genes associated with early development and function of the nervous system. Drosophila melanogaster presents an advantageous model to study the genetic underpinnings of variation in susceptibility to lead toxicity. Evolutionary conservation of cellular pathways that respond to toxic exposure allows predictions regarding orthologous genes and pathways across phyla. Thus, studies in the D. melanogaster model system can identify candidate susceptibility genes to guide subsequent studies in human populations. Zhou S, Morozova TV, Hussain YN, Luoma SE, McCoy L, Yamamoto A, Mackay TF, Anholt RR. 2016. The genetic basis for variation in sensitivity to lead toxicity in Drosophila melanogaster. 
Environ Health Perspect 124:1062-1070; http://dx.doi.org/10.1289/ehp.1510513.
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...
2014-10-02
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
The genetic landscape of a physical interaction
Diss, Guillaume
2018-01-01
A key question in human genetics and evolutionary biology is how mutations in different genes combine to alter phenotypes. Efforts to systematically map genetic interactions have mostly made use of gene deletions. However, most genetic variation consists of point mutations of diverse and difficult to predict effects. Here, by developing a new sequencing-based protein interaction assay – deepPCA – we quantified the effects of >120,000 pairs of point mutations on the formation of the AP-1 transcription factor complex between the products of the FOS and JUN proto-oncogenes. Genetic interactions are abundant both in cis (within one protein) and trans (between the two molecules) and consist of two classes – interactions driven by thermodynamics that can be predicted using a three-parameter global model, and structural interactions between proximally located residues. These results reveal how physical interactions generate quantitatively predictable genetic interactions. PMID:29638215
Prediction of Aggressive Human Prostate Cancer by Cathepsin B
2008-03-01
Cancer Res 2004;10(12 Pt 1):4118-4124. 28. Munoz E, Gomez F, Paz JI, Casado I, Silva JM, Corcuera MT, Alonso MJ. Ki-67 immunolabeling in pre...detected prostate cancer. J Pathol 2002;197(2):148-154. 34. Claudio PP, Zamparelli A, Garcia FU, Claudio L, Ammirati G, Farina A, Bovicelli A, Russo G...JA. Distinct roles for cysteine cathepsin genes in multistage tumorigenesis. Genes Dev 2006;20(5):543-556. 47. Fernandez PL, Farre X, Nadal A
2013-01-09
specificity. The majority of the top 50 predictive genes contained in each factor are known to characterize host response to viral infection, and include...RSAD2, the OAS family, multiple interferon response elements, the myxovirus-resistance gene MX1, cytokine response pathways and others [16,17,18]. Many...antiviral pathways (Fig. s4). Furthermore, the high degree of similarity and cross-applicability of the two signatures permit the mathematical
Association between osteopontin and human abdominal aortic aneurysm.
Golledge, Jonathan; Muller, Juanita; Shephard, Neil; Clancy, Paula; Smallwood, Linda; Moran, Corey; Dear, Anthony E; Palmer, Lyle J; Norman, Paul E
2007-03-01
In vitro and animal studies have implicated osteopontin (OPN) in the pathogenesis of aortic aneurysm. The relationship between serum concentration of OPN and variants of the OPN gene with human abdominal aortic aneurysm (AAA) was investigated. OPN genotypes were examined in 4227 subjects in which aortic diameter and clinical risk factors were measured. Serum OPN was measured by ELISA in two cohorts of 665 subjects. The concentration of serum OPN was independently associated with the presence of AAA. Odds ratios (and 95% confidence intervals) for upper compared with lower OPN tertiles in predicting presence of AAA were 2.23 (1.29 to 3.85, P=0.004) for the population cohort and 4.08 (1.67 to 10.00, P=0.002) for the referral cohort after adjusting for other risk factors. In 198 patients with complete follow-up of aortic diameter at 3 years, initial serum OPN predicted AAA growth after adjustment for other risk factors (standardized coefficient 0.24, P=0.001). The concentration of OPN in the aortic wall was greater in patients with small AAAs (30 to 50 mm) than those with aortic occlusive disease alone. There was no association between five single nucleotide polymorphisms or haplotypes of the OPN gene and aortic diameter or AAA expansion. Serum and tissue concentrations of OPN are associated with human AAA. We found no relationship between variation of the OPN gene and AAA. OPN may be a useful biomarker for AAA presence and growth.
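The odds ratios, confidence intervals and P values quoted in the abstract above can be cross-checked against each other. The sketch below assumes the intervals are Wald-type 95% intervals computed on the log-odds scale; the abstract does not say which method was used, so this is an illustration of the arithmetic and not a reconstruction of the authors' analysis. The function name `wald_check` is hypothetical.

```python
import math


def wald_check(or_hat, ci_low, ci_high, z_crit=1.96):
    """Back out the log-scale standard error, z statistic and two-sided
    p-value from a reported odds ratio and its 95% confidence interval.

    Assumption (not stated in the abstract): the interval is a Wald
    interval, i.e. exp(log(OR) +/- z_crit * SE).
    """
    # On the log scale a Wald interval is symmetric, so its width is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_hat) / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    # For a Wald interval the geometric mean of the bounds equals the point estimate.
    midpoint = math.sqrt(ci_low * ci_high)
    return se, z, p, midpoint


# Values as reported in the abstract for the two cohorts.
reported = {
    "population cohort": (2.23, 1.29, 3.85),
    "referral cohort": (4.08, 1.67, 10.00),
}

for label, (or_hat, low, high) in reported.items():
    se, z, p, mid = wald_check(or_hat, low, high)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.4f}, CI midpoint={mid:.2f}")
```

Under that assumption, the geometric mean of each interval's bounds should land close to the reported odds ratio, and the recovered p-values should agree with the reported ones to the precision given in the abstract.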
Global and disease-associated genetic variation in the human Fanconi anemia gene family
Rogers, Kai J.; Fu, Wenqing; Akey, Joshua M.; Monnat, Raymond J.
2014-01-01
Fanconi anemia (FA) is a human recessive genetic disease resulting from inactivating mutations in any of 16 FANC (Fanconi) genes. Individuals with FA are at high risk of developmental abnormalities, early bone marrow failure and leukemia. These are followed in the second and subsequent decades by a very high risk of carcinomas of the head and neck and anogenital region, and a small continuing risk of leukemia. In order to characterize base pair-level disease-associated (DA) and population genetic variation in FANC genes and the segregation of this variation in the human population, we identified 2948 unique FANC gene variants including 493 FA DA variants across 57 240 potential base pair variation sites in the 16 FANC genes. We then analyzed the segregation of this variation in the 7578 subjects included in the Exome Sequencing Project (ESP) and the 1000 Genomes Project (1KGP). There was a remarkably high frequency of FA DA variants in ESP/1KGP subjects: at least 1 FA DA variant was identified in 78.5% (5950 of 7578) individuals included in these two studies. Six widely used functional prediction algorithms correctly identified only a third of the known, DA FANC missense variants. We also identified FA DA variants that may be good candidates for different types of mutation-specific therapies. Our results demonstrate the power of direct DNA sequencing to detect, estimate the frequency of and follow the segregation of deleterious genetic variation in human populations. PMID:25104853
White noise and synchronization shaping the age structure of the human population
NASA Astrophysics Data System (ADS)
Cebrat, Stanislaw; Biecek, Przemyslaw; Bonkowska, Katarzyna; Kula, Mateusz
2007-06-01
We have modified the standard diploid Penna model of ageing so that, instead of a threshold number of defective loci resulting in the genetic death of individuals, fluctuations of the environment and "personal" fluctuations of individuals are introduced. The sum of both fluctuations describes the health status of the individual. While environmental fluctuations are the same for all individuals in the population, the personal component is composed of fluctuations corresponding to each physiological function (gene, genetic locus). It is a fairly well accepted hypothesis that the physiological parameters of any organism fluctuate highly nonlinearly. A transition to synchronized behavior could be a very strong diagnostic signal of a life-threatening disorder. Thus, in our model, mutations of genes change the chaotic fluctuations representing the function of a wild-type gene into the synchronized signals generated by mutated genes. Genes are switched on chronologically, as in the standard Penna model. The accumulation of defective genes predicted by Medawar's theory of ageing leads to the replacement of the uncorrelated white noise corresponding to a healthy organism by the correlated signals of defective functions. As a result, we obtained an age distribution of the population corresponding to human demographic data.
Mutation detection in the human HSP70B′ gene by denaturing high-performance liquid chromatography
Hecker, Karl H.; Asea, Alexzander; Kobayashi, Kaoru; Green, Stacy; Tang, Dan; Calderwood, Stuart K.
2000-01-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B′ gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER™ software. Four overlapping amplicons, which span the complete coding region of the HSP70B′ gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B′ gene on the WAVE® Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed. PMID:11189446
Mutation detection in the human HSP70B' gene by denaturing high-performance liquid chromatography.
Hecker, K H; Asea, A; Kobayashi, K; Green, S; Tang, D; Calderwood, S K
2000-11-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B' gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER software. Four overlapping amplicons, which span the complete coding region of the HSP70B' gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B' gene on the WAVE Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed.
Genomic insight into pathogenicity of dematiaceous fungus Corynespora cassiicola
Looi, Hong Keat; Toh, Yue Fen; Yew, Su Mei; Na, Shiang Ling; Tan, Yung-Chie; Chong, Pei-Sin; Khoo, Jia-Shiun; Yee, Wai-Yan; Ng, Kee Peng
2017-01-01
Corynespora cassiicola is a common plant pathogen that causes leaf spot disease in a broad range of crops, and it heavily affects rubber trees in Malaysia (Hsueh, 2011; Nghia et al., 2008). The isolation of UM 591 from a patient’s contact lens indicates the pathogenic potential of this dematiaceous fungus in humans. However, the underlying factors that contribute to the opportunistic cross-infection have not been fully studied. We employed genome sequencing and gene homology annotations in an attempt to identify these factors in UM 591 using data obtained from publicly available bioinformatics databases. The assembly size of the UM 591 genome is 41.8 Mbp, and a total of 13,531 (≥99 bp) genes have been predicted. UM 591 is enriched with genes that encode glycoside hydrolases, carbohydrate esterases, auxiliary activity enzymes and cell wall degrading enzymes. Virulence genes comprising CAZymes, peptidases, and hypervirulence-associated cutinases were found to be present in the fungal genome. Comparative analysis shows that UM 591 possesses a higher number of carbohydrate esterase family 10 (CE10) CAZymes than the other species of fungi in this study, and these enzymes hydrolyse a wide range of carbohydrate and non-carbohydrate substrates. Putative melanin, siderophore, ent-kaurene, and lycopene biosynthesis gene clusters are predicted, and these gene clusters indicate that UM 591 is capable of protecting itself from UV and chemical stresses, allowing it to adapt to different environments. Putative sterigmatocystin, HC-toxin, cercosporin, and gliotoxin biosynthesis gene clusters are also predicted. These findings highlight the necrotrophic and invasive nature of UM 591. PMID:28149676
Swainsonine Biosynthesis Genes in Diverse Symbiotic and Pathogenic Fungi
Cook, Daniel; Donzelli, Bruno G. G.; Creamer, Rebecca; Baucom, Deana L.; Gardner, Dale R.; Pan, Juan; Moore, Neil; Krasnoff, Stuart B.; Jaromczyk, Jerzy W.; Schardl, Christopher L.
2017-01-01
Swainsonine, a cytotoxic fungal alkaloid and a potential cancer therapy drug, is produced by the insect pathogen and plant symbiont Metarhizium robertsii, the clover pathogen Slafractonia leguminicola, locoweed symbionts belonging to Alternaria sect. Undifilum, and a recently discovered morning glory symbiont belonging to order Chaetothyriales. Genome sequence analyses revealed that these fungi share orthologous gene clusters, designated “SWN,” which included a multifunctional swnK gene comprising predicted adenylylation and acyltransferase domains with their associated thiolation domains, a β-ketoacyl synthase domain, and two reductase domains. The role of swnK was demonstrated by inactivating it in M. robertsii through homologous gene replacement to give a ∆swnK mutant that produced no detectable swainsonine, then complementing the mutant with the wild-type gene to restore swainsonine biosynthesis. Other SWN cluster genes were predicted to encode two putative hydroxylases and two reductases, as expected to complete biosynthesis of swainsonine from the predicted SwnK product. SWN gene clusters were identified in six out of seven sequenced genomes of Metarhizium species, and in all 15 sequenced genomes of Arthrodermataceae, a family of fungi that cause athlete’s foot and ringworm diseases in humans and other mammals. Representative isolates of all of these species were cultured, and all Metarhizium spp. with SWN clusters, as well as all but one of the Arthrodermataceae, produced swainsonine. These results suggest a new biosynthetic hypothesis for this alkaloid, extending the known taxonomic breadth of swainsonine producers to at least four orders of Ascomycota, and suggest that swainsonine has roles in mutualistic symbioses and diseases of plants and animals. PMID:28381497
Swainsonine Biosynthesis Genes in Diverse Symbiotic and Pathogenic Fungi.
Cook, Daniel; Donzelli, Bruno G G; Creamer, Rebecca; Baucom, Deana L; Gardner, Dale R; Pan, Juan; Moore, Neil; Krasnoff, Stuart B; Jaromczyk, Jerzy W; Schardl, Christopher L
2017-06-07
Swainsonine, a cytotoxic fungal alkaloid and a potential cancer therapy drug, is produced by the insect pathogen and plant symbiont Metarhizium robertsii, the clover pathogen Slafractonia leguminicola, locoweed symbionts belonging to Alternaria sect. Undifilum, and a recently discovered morning glory symbiont belonging to order Chaetothyriales. Genome sequence analyses revealed that these fungi share orthologous gene clusters, designated "SWN," which included a multifunctional swnK gene comprising predicted adenylylation and acyltransferase domains with their associated thiolation domains, a β-ketoacyl synthase domain, and two reductase domains. The role of swnK was demonstrated by inactivating it in M. robertsii through homologous gene replacement to give a ∆swnK mutant that produced no detectable swainsonine, then complementing the mutant with the wild-type gene to restore swainsonine biosynthesis. Other SWN cluster genes were predicted to encode two putative hydroxylases and two reductases, as expected to complete biosynthesis of swainsonine from the predicted SwnK product. SWN gene clusters were identified in six out of seven sequenced genomes of Metarhizium species, and in all 15 sequenced genomes of Arthrodermataceae, a family of fungi that cause athlete's foot and ringworm diseases in humans and other mammals. Representative isolates of all of these species were cultured, and all Metarhizium spp. with SWN clusters, as well as all but one of the Arthrodermataceae, produced swainsonine. These results suggest a new biosynthetic hypothesis for this alkaloid, extending the known taxonomic breadth of swainsonine producers to at least four orders of Ascomycota, and suggest that swainsonine has roles in mutualistic symbioses and diseases of plants and animals. Copyright © 2017 Cook et al.
YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.
Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina
2015-01-01
MicroRNAs (miRNAs) are small non-coding RNAs that play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem, and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data successfully predicted pre-miRNAs in other organisms, including viruses.
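The evaluation criterion named in this abstract, the geometric mean of sensitivity and specificity, can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and are not taken from the YamiPred paper; the sketch only shows how the metric is conventionally computed.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true pre-miRNAs that are predicted as pre-miRNAs."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-miRNA hairpins that are predicted as negatives."""
    return tn / (tn + fp)


def geometric_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity.

    Unlike plain accuracy, this score collapses toward zero when either
    class is predicted poorly, which matters on imbalanced datasets.
    """
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


# Hypothetical counts, chosen only for illustration.
score = geometric_mean(tp=90, fn=10, tn=80, fp=20)
print(f"G-mean: {score:.3f}")
```

A classifier that labels everything as negative would reach high accuracy on a dataset dominated by negatives, but its sensitivity, and therefore its geometric mean, would be zero.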
Edwards, Stefan M.; Sørensen, Izel F.; Sarup, Pernille; Mackay, Trudy F. C.; Sørensen, Peter
2016-01-01
Predicting individual quantitative trait phenotypes from high-resolution genomic polymorphism data is important for personalized medicine in humans, plant and animal breeding, and adaptive evolution. However, this is difficult for populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and causal variants individually have small effects on the traits. We hypothesized that mapping molecular polymorphisms to genomic features such as genes and their gene ontology categories could increase the accuracy of genomic prediction models. We developed a genomic feature best linear unbiased prediction (GFBLUP) model that implements this strategy and applied it to three quantitative traits (startle response, starvation resistance, and chill coma recovery) in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Our results indicate that subsetting markers based on genomic features increases the predictive ability relative to the standard genomic best linear unbiased prediction (GBLUP) model. Both models use all markers, but GFBLUP allows differential weighting of the individual genetic marker relationships, whereas GBLUP weighs the genetic marker relationships equally. Simulation studies show that it is possible to further increase the accuracy of genomic prediction for complex traits using this model, provided the genomic features are enriched for causal variants. Our GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits. PMID:27235308
Holmes, Roger S
2010-03-01
BLAT (BLAST-Like Alignment Tool) analyses of the opossum (Monodelphis domestica) and zebrafish (Danio rerio) genomes were undertaken using amino acid sequences of the acylglycerol acyltransferase (AGAT) superfamily. Evidence is reported for 8 opossum monoacylglycerol acyltransferase-like (MGAT) (E.C. 2.3.1.22) and diacylglycerol acyltransferase-like (DGAT) (E.C. 2.3.1.20) genes and proteins, including DGAT1, DGAT2, DGAT2L6 (DGAT2-like protein 6), AWAT1 (acyl CoA wax alcohol acyltransferase 1), AWAT2, MGAT1, MGAT2 and MGAT3. Three of these genes (AWAT1, AWAT2 and DGAT2L6) are closely localized on the opossum X chromosome. Evidence is also reported for six zebrafish MGAT- and DGAT-like genes, including two DGAT1-like genes, as well as DGAT2-, MGAT1-, MGAT2- and MGAT3-like genes and proteins. Predicted primary, secondary and transmembrane structures for the opossum and zebrafish MGAT-, AWAT- and DGAT-like subunits and the intron-exon boundaries for genes encoding these enzymes showed a high degree of similarity with other members of the AGAT superfamily, which play major roles in triacylglyceride (DGAT), diacylglyceride (MGAT) and wax ester (AWAT) biosynthesis. Alignments of predicted opossum, zebrafish and other vertebrate DGAT1, DGAT2, other DGAT2-like and MGAT-like amino acid sequences with known human and mouse enzymes demonstrated conservation of residues which are likely to play key roles in catalysis, lipid binding or in maintaining structure. Phylogeny studies of the human, mouse, opossum, zebrafish and pufferfish MGAT- and DGAT-like enzymes indicated that the common ancestors for these genes predated the appearance of bony fish during vertebrate evolution whereas the AWAT- and DGAT2L6-like genes may have appeared more recently prior to the appearance of marsupial and eutherian mammals. Copyright 2009 Elsevier Inc. All rights reserved.
Fernández-Cadenas, Israel; Mendióroz, Maite; Giralt, Dolors; Nafria, Cristina; Garcia, Elena; Carrera, Caty; Gallego-Fabrega, Cristina; Domingues-Montanari, Sophie; Delgado, Pilar; Ribó, Marc; Castellanos, Mar; Martínez, Sergi; Freijo, Marimar; Jiménez-Conde, Jordi; Rubiera, Marta; Alvarez-Sabín, José; Molina, Carlos A; Font, Maria Angels; Grau Olivares, Marta; Palomeras, Ernest; Perez de la Ossa, Natalia; Martinez-Zabaleta, Maite; Masjuan, Jaime; Moniche, Francisco; Canovas, David; Piñana, Carlos; Purroy, Francisco; Cocho, Dolores; Navas, Inma; Tejero, Carlos; Aymerich, Nuria; Cullell, Natalia; Muiño, Elena; Serena, Joaquín; Rubio, Francisco; Davalos, Antoni; Roquer, Jaume; Arenillas, Juan Francisco; Martí-Fábregas, Joan; Keene, Keith; Chen, Wei-Min; Worrall, Bradford; Sale, Michele; Arboix, Adrià; Krupinski, Jerzy; Montaner, Joan
2017-05-01
Vascular recurrence occurs in 11% of patients during the first year after ischemic stroke (IS) or transient ischemic attack. Clinical scores do not predict the whole vascular recurrence risk; therefore, we aimed to find genetic variants associated with recurrence that might improve the clinical predictive models in IS. We analyzed 256 polymorphisms from 115 candidate genes in 3 patient cohorts comprising 4482 IS or transient ischemic attack patients. The discovery cohort was prospectively recruited and included 1494 patients, 6.2% of whom developed a new IS during the first year of follow-up. Replication analysis was performed in 2988 patients using SNPlex or HumanOmni1-Quad technology. We generated a predictive model using Cox regression (GRECOS score [Genotyping Recurrence Risk of Stroke]) and generated risk groups using a classification tree method. The analyses revealed that rs1800801 in the MGP gene (hazard ratio, 1.33; P=9×10⁻³), a gene related to artery calcification, was associated with new IS during the first year of follow-up. This polymorphism was replicated in a Spanish cohort (n=1305); however, it was not significantly associated in a North American cohort (n=1683). The GRECOS score predicted new IS (P=3.2×10⁻⁹) and could classify patients, from low risk of stroke recurrence (1.9%) to high risk (12.6%). Moreover, the addition of genetic risk factors to the GRECOS score improves the prediction compared with the previous Stroke Prognosis Instrument-II score (P=0.03). The use of genetics could be useful to estimate vascular recurrence risk after IS. Genetic variability in the MGP gene was associated with vascular recurrence in the Spanish population. © 2017 American Heart Association, Inc.
GRECOS project. The use of genetics to predict the vascular recurrence after stroke
Fernández-Cadenas, Israel; Mendióroz, Maite; Giralt, Dolors; Nafria, Cristina; Garcia, Elena; Carrera, Caty; Gallego-Fabrega, Cristina; Domingues-Montanari, Sophie; Delgado, Pilar; Ribó, Marc; Castellanos, Mar; Martínez, Sergi; Freijo, Mari Mar; Jiménez-Conde, Jordi; Rubiera, Marta; Alvarez-Sabín, José; Molina, Carlos A.; Font, Maria Angels; Olivares, Marta Grau; Palomeras, Ernest; de la Ossa, Natalia Perez; Martinez-Zabaleta, Maite; Masjuan, Jaime; Moniche, Francisco; Canovas, David; Piñana, Carlos; Purroy, Francisco; Cocho, Dolores; Navas, Inma; Tejero, Carlos; Aymerich, Nuria; Cullell, Natalia; Muiño, Elena; Serena, Joaquín; Rubio, Francisco; Davalos, Antoni; Roquer, Jaume; Arenillas, Juan Francisco; Martí-Fábregas, Joan; Keene, Keith; Chen, Wei-Min; Worrall, Bradford; Sale, Michele; Arboix, Adrià; Krupinski, Jerzy; Montaner, Joan
2017-01-01
Background and Purpose: Vascular recurrence occurs in 11% of patients during the first year after ischemic stroke (IS) or transient ischemic attack (TIA). Clinical scores do not predict the whole vascular recurrence risk; therefore, we aimed to find genetic variants associated with recurrence that might improve the clinical predictive models in IS. Methods: We analyzed 256 polymorphisms from 115 candidate genes in three patient cohorts comprising 4,482 IS or TIA patients. The discovery cohort was prospectively recruited and included 1,494 patients, 6.2% of whom developed a new IS during the first year of follow-up. Replication analysis was performed in 2,988 patients using SNPlex or HumanOmni1-Quad technology. We generated a predictive model using Cox regression (GRECOS score), and generated risk groups using a classification tree method. Results: The analyses revealed that rs1800801 in the MGP gene (HR: 1.33, p=9×10⁻³), a gene related to artery calcification, was associated with new IS during the first year of follow-up. This polymorphism was replicated in a Spanish cohort (n=1,305); however, it was not significantly associated in a North American cohort (n=1,683). The GRECOS score predicted new IS (p=3.2×10⁻⁹) and could classify patients, from low risk of stroke recurrence (1.9%) to high risk (12.6%). Moreover, the addition of genetic risk factors to the GRECOS score improves the prediction compared to the previous SPI-II score (p=0.03). Conclusions: The use of genetics could be useful to estimate vascular recurrence risk after IS. Genetic variability in the MGP gene was associated with vascular recurrence in the Spanish population. PMID:28411264
Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Faria, Nuno; Bielejec, Filip; Baele, Guy; Russell, Colin A; Smith, Derek J; Pybus, Oliver G; Brockmann, Dirk; Suchard, Marc A
2014-02-01
Information on global human movement patterns is central to spatial epidemiological models used to predict the behavior of influenza and other infectious diseases. Yet it remains difficult to test which modes of dispersal drive pathogen spread at various geographic scales using standard epidemiological data alone. Evolutionary analyses of pathogen genome sequences increasingly provide insights into the spatial dynamics of influenza viruses, but to date they have largely neglected the wealth of information on human mobility, mainly because no statistical framework exists within which viral gene sequences and empirical data on host movement can be combined. Here, we address this problem by applying a phylogeographic approach to elucidate the global spread of human influenza subtype H3N2 and assess its ability to predict the spatial spread of human influenza A viruses worldwide. Using a framework that estimates the migration history of human influenza while simultaneously testing and quantifying a range of potential predictive variables of spatial spread, we show that the global dynamics of influenza H3N2 are driven by air passenger flows, whereas at more local scales spread is also determined by processes that correlate with geographic distance. Our analyses further confirm a central role for mainland China and Southeast Asia in maintaining a source population for global influenza diversity. By comparing model output with the known pandemic expansion of H1N1 during 2009, we demonstrate that predictions of influenza spatial spread are most accurate when data on human mobility and viral evolution are integrated. In conclusion, the global dynamics of influenza viruses are best explained by combining human mobility data with the spatial information inherent in sampled viral genomes. The integrated approach introduced here offers great potential for epidemiological surveillance through phylogeographic reconstructions and for improving predictive models of disease control.
Tye-Din, J A; Cameron, D J S; Daveson, A J; Day, A S; Dellsperger, P; Hogan, C; Newnham, E D; Shepherd, S J; Steele, R H; Wienholt, L; Varney, M D
2015-01-01
The past decade has seen human leukocyte antigen (HLA) typing emerge as a remarkably popular test for the diagnostic work-up of coeliac disease with high patient acceptance. Although limited in its positive predictive value for coeliac disease, the strong disease association with specific HLA genes imparts exceptional negative predictive value to HLA typing, enabling a negative result to exclude coeliac disease confidently. In response to mounting evidence that the clinical use and interpretation of HLA typing often deviates from best practice, this article outlines an evidence-based approach to guide clinically appropriate use of HLA typing, and establishes a reporting template for pathology providers to improve communication of results. PMID:25827511
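The contrast drawn in this abstract between a limited positive predictive value and a very high negative predictive value follows from Bayes' rule when a test is highly sensitive but not very specific. The sketch below uses purely hypothetical sensitivity, specificity and prevalence figures, which are assumptions for illustration and are not taken from the article.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: a susceptibility genotype carried by nearly all
# patients (high sensitivity) but also by many unaffected people
# (low specificity), in a population where the disease is uncommon.
ppv, npv = predictive_values(sensitivity=0.99, specificity=0.60,
                             prevalence=0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.5f}")
```

Under these assumed inputs a positive result barely raises the probability of disease, while a negative result makes disease very unlikely, which is the pattern the abstract describes for HLA typing.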
2016-01-01
The Herpesviridae family is one of the significant viral families and comprises major pathogens of a wide range of hosts. This family includes at least eight species of viruses which are known to infect humans. The family evolved 180–220 million years ago, and the present study highlights that it is still evolving and that more genes can be added to its repertoire. In addition, its core-genome includes important viral proteins including glycoprotein B and helicase. Most of the infections caused by human herpesviruses have no definitive cure; thus, a search for new therapeutic strategies is necessary. The present study identifies a core-genome of human herpesviruses that differs from that of the Herpesviridae family as a whole and from that of nonhuman herpes strains of this family, and that might be a putative target for vaccine development. The phylogenetic reconstruction based upon the protein sequences of the core gene set of the Herpesviridae family reveals sharp splits between its different subfamilies and supports the hypothesis of coevolution of viruses with their hosts. In addition, data mining for cis-elements in the genomes of human herpesviruses results in the prediction of numerous regulatory elements which can be used for regulating the expression of viral-based vectors implicated in gene therapies. PMID:27314006
Ponting, C P; Mott, R; Bork, P; Copley, R R
2001-12-01
Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
Xue, Ruidan; Lynes, Matthew D; Dreyfuss, Jonathan M; Shamsi, Farnaz; Schulz, Tim J; Zhang, Hongbin; Huang, Tian Lian; Townsend, Kristy L; Li, Yiming; Takahashi, Hirokazu; Weiner, Lauren S; White, Andrew P; Lynes, Maureen S; Rubin, Lee L; Goodyear, Laurie J; Cypess, Aaron M; Tseng, Yu-Hua
2015-07-01
Targeting brown adipose tissue (BAT) content or activity has therapeutic potential for treating obesity and the metabolic syndrome by increasing energy expenditure. However, both inter- and intra-individual differences contribute to heterogeneity in human BAT and potentially to differential thermogenic capacity in human populations. Here we generated clones of brown and white preadipocytes from human neck fat and characterized their adipogenic and thermogenic differentiation. We combined an uncoupling protein 1 (UCP1) reporter system and expression profiling to define novel sets of gene signatures in human preadipocytes that could predict the thermogenic potential of the cells once they were maturated. Knocking out the positive UCP1 regulators, PREX1 and EDNRB, in brown preadipocytes using CRISPR-Cas9 markedly abolished the high level of UCP1 in brown adipocytes differentiated from the preadipocytes. Finally, we were able to prospectively isolate adipose progenitors with great thermogenic potential using the cell surface marker CD29. These data provide new insights into the cellular heterogeneity in human fat and offer potential biomarkers for identifying thermogenically competent preadipocytes.
Baranova, Ancha; Hammarsund, Marianne; Ivanov, Dmitry; Skoblov, Mikhail; Sangfelt, Olle; Corcoran, Martin; Borodina, Tatiana; Makeeva, Natalia; Pestova, Anna; Tyazhelova, Tatiana; Nazarenko, Svetlana; Gorreta, Francesco; Alsheddi, Tariq; Schlauch, Karen; Nikitin, Eugene; Kapanadze, Bagrat; Shagin, Dmitry; Poltaraus, Andrey; Ivanovich Vorobiev, Andrey; Zabarovsky, Eugene; Lukianov, Sergey; Chandhoke, Vikas; Ibbotson, Rachel; Oscier, David; Einhorn, Stefan; Grander, Dan; Yankovsky, Nick
2003-12-04
In the present study, we describe the human and mouse RFP2 gene structure, multiple RFP2 mRNA isoforms in the two species that have different 5' UTRs and a human-specific antisense transcript RFP2OS. Since the human RFP2 5' UTR is not conserved in mouse, these findings might indicate a different regulation of RFP2 in the two species. The predicted human and mouse RFP2 proteins are shown to contain a tripartite RING finger-B-box-coiled-coil domain (RBCC), also known as a TRIM domain, and therefore belong to a subgroup of RING finger proteins that are often involved in developmental and tumorigenic processes. Because homozygous deletions of chromosomal region 13q14.3 are found in a number of malignancies, including chronic lymphocytic leukemia (CLL) and multiple myeloma (MM), we suggest that RFP2 might be involved in tumor development. This study provides necessary information for evaluation of the role of RFP2 in malignant transformation and other biological processes.
Bohne, Felix; Martínez-Llordella, Marc; Lozano, Juan-José; Miquel, Rosa; Benítez, Carlos; Londoño, María-Carlota; Manzia, Tommaso-María; Angelico, Roberta; Swinkels, Dorine W.; Tjalsma, Harold; López, Marta; Abraldes, Juan G.; Bonaccorsi-Riani, Eliano; Jaeckel, Elmar; Taubert, Richard; Pirenne, Jacques; Rimola, Antoni; Tisone, Giuseppe; Sánchez-Fueyo, Alberto
2011-01-01
Following organ transplantation, lifelong immunosuppressive therapy is required to prevent the host immune system from destroying the allograft. This can cause severe side effects and increased recipient morbidity and mortality. Complete cessation of immunosuppressive drugs has been successfully accomplished in selected transplant recipients, providing proof of principle that operational allograft tolerance is attainable in clinical transplantation. The intra-graft molecular pathways associated with successful drug withdrawal, however, are not well defined. In this study, we analyzed sequential blood and liver tissue samples collected from liver transplant recipients enrolled in a prospective multicenter immunosuppressive drug withdrawal clinical trial. Before initiation of drug withdrawal, operationally tolerant and non-tolerant recipients differed in the intra-graft expression of genes involved in the regulation of iron homeostasis. Furthermore, as compared with non-tolerant recipients, operationally tolerant patients exhibited higher serum levels of hepcidin and ferritin and increased hepatocyte iron deposition. Finally, liver tissue gene expression measurements accurately predicted the outcome of immunosuppressive withdrawal in an independent set of patients. These results point to a critical role for iron metabolism in the regulation of intra-graft alloimmune responses in humans and provide a set of biomarkers to conduct drug-weaning trials in liver transplantation. PMID:22156196
Zhao, Meirong; Zhang, Ying; Zhuang, Shulin; Zhang, Quan; Lu, Chengsheng; Liu, Weiping
2014-07-15
Endocrine-disrupting chemicals (EDCs) can interfere with normal hormone signaling to increase health risks to the maternal-fetal system, yet few studies have been conducted on the currently used chiral EDCs. This work tested the hypothesis that pyrethroids could enantioselectively interfere with trophoblast cells. The effects of a widely used pyrethroid, bifenthrin (BF), on cell viability, hormone secretion, and steroidogenesis gene expression were evaluated in vitro, and the interactions of BF enantiomers with estrogen receptor (ER) were predicted. At low or noncytotoxic concentrations, both progesterone and human chorionic gonadotropin secretion were induced. The expression levels of progesterone receptor and human leukocyte antigen G genes were significantly stimulated. The key regulators of the hormonal cascade, GnRH type-I and its receptor, were both upregulated. The expression levels of selected steroidogenic genes were also significantly altered. Moreover, a consistent enantioselective interference of hormone signaling was observed, and S-BF had greater effects than R-BF. Using molecular docking, the enantioselective endocrine disruption of BF was predicted to be partially due to enantiospecific ER binding affinity. Thus, BF could act through ER to enantioselectively disturb the hormonal network in trophoblast cells. These converging results suggest that the currently used chiral pesticides are of significant concern with respect to maternal-fetal health.
2010-01-01
Background: Corynebacterium pseudotuberculosis is generally regarded as an important animal pathogen that rarely infects humans. Clinical strains are occasionally recovered from human cases of lymphadenitis, such as C. pseudotuberculosis FRC41, which was isolated from the inguinal lymph node of a 12-year-old girl with necrotizing lymphadenitis. To detect potential virulence factors and corresponding gene-regulatory networks in this human isolate, the genome sequence of C. pseudotuberculosis FRC41 was determined by pyrosequencing and functionally annotated. Results: Sequencing and assembly of the C. pseudotuberculosis FRC41 genome yielded a circular chromosome with a size of 2,337,913 bp and a mean G+C content of 52.2%. Specific gene sets associated with iron and zinc homeostasis were detected among the 2,110 predicted protein-coding regions and integrated into a gene-regulatory network that is linked with both the central metabolism and the oxidative stress response of FRC41. Two gene clusters encode proteins involved in the sortase-mediated polymerization of adhesive pili that can probably mediate the adherence to host tissue to facilitate additional ligand-receptor interactions and the delivery of virulence factors. The prominent virulence factors phospholipase D (Pld) and corynebacterial protease CP40 are encoded in the genome of this human isolate. The genome annotation revealed additional serine proteases, neuraminidase H, nitric oxide reductase, an invasion-associated protein, and acyl-CoA carboxylase subunits involved in mycolic acid biosynthesis as potential virulence factors. The cAMP-sensing transcription regulator GlxR plays a key role in controlling the expression of several genes contributing to virulence. Conclusion: The functional data deduced from the genome sequencing and the extended knowledge of virulence factors indicate that the human isolate C. pseudotuberculosis FRC41 is equipped with a distinct gene set promoting its survival under unfavorable environmental conditions encountered in the mammalian host. PMID:21192786
Templeton, A R; Robertson, R J; Brisson, J; Strasburg, J
2001-05-08
Humans affect biodiversity at the genetic, species, community, and ecosystem levels. This impact on genetic diversity is critical, because genetic diversity is the raw material of evolutionary change, including adaptation and speciation. Two forces affecting genetic variation are genetic drift (which decreases genetic variation within but increases genetic differentiation among local populations) and gene flow (which increases variation within but decreases differentiation among local populations). Human activities often augment drift and diminish gene flow for many species, which reduces genetic variation in local populations and prevents the spread of adaptive complexes outside their population of origin, thereby disrupting adaptive processes both locally and globally within a species. These impacts are illustrated with collared lizards (Crotaphytus collaris) in the Missouri Ozarks. Forest fire suppression has reduced habitat and disrupted gene flow in this lizard, thereby altering the balance toward drift and away from gene flow. This balance can be restored by managed landscape burns. Some have argued that, although human-induced fragmentation disrupts adaptation, it will also ultimately produce new species through founder effects. However, population genetic theory and experiments predict that most fragmentation events caused by human activities will facilitate not speciation, but local extinction. Founder events have played an important role in the macroevolution of certain groups, but only when ecological opportunities are expanding rather than contracting. The general impact of human activities on genetic diversity disrupts or diminishes the capacity for adaptation, speciation, and macroevolutionary change. This impact will ultimately diminish biodiversity at all levels.
Bartlett, Thomas E.; Jones, Allison; Goode, Ellen L.; Fridley, Brooke L.; Cunningham, Julie M.; Berns, Els M. J. J.; Wik, Elisabeth; Salvesen, Helga B.; Davidson, Ben; Trope, Claes G.; Lambrechts, Sandrina; Vergote, Ignace; Widschwendter, Martin
2015-01-01
We introduce a novel per-gene measure of intra-gene DNA methylation variability (IGV) based on the Illumina Infinium HumanMethylation450 platform, which is prognostic independently of well-known predictors of clinical outcome. Using IGV, we derive a robust gene-panel prognostic signature for ovarian cancer (OC, n = 221), which validates in two independent data sets from Mayo Clinic (n = 198) and TCGA (n = 358), with significance of p = 0.004 in both sets. The OC prognostic signature gene-panel is comprised of four gene groups, which represent distinct biological processes. We show the IGV measurements of these gene groups are most likely a reflection of a mixture of intra-tumour heterogeneity and transcription factor (TF) binding/activity. IGV can be used to predict clinical outcome in patients individually, providing a surrogate read-out of hard-to-measure disease processes. PMID:26629914
Bartlett, Thomas E; Jones, Allison; Goode, Ellen L; Fridley, Brooke L; Cunningham, Julie M; Berns, Els M J J; Wik, Elisabeth; Salvesen, Helga B; Davidson, Ben; Trope, Claes G; Lambrechts, Sandrina; Vergote, Ignace; Widschwendter, Martin
2015-01-01
We introduce a novel per-gene measure of intra-gene DNA methylation variability (IGV) based on the Illumina Infinium HumanMethylation450 platform, which is prognostic independently of well-known predictors of clinical outcome. Using IGV, we derive a robust gene-panel prognostic signature for ovarian cancer (OC, n = 221), which validates in two independent data sets from Mayo Clinic (n = 198) and TCGA (n = 358), with significance of p = 0.004 in both sets. The OC prognostic signature gene-panel is comprised of four gene groups, which represent distinct biological processes. We show the IGV measurements of these gene groups are most likely a reflection of a mixture of intra-tumour heterogeneity and transcription factor (TF) binding/activity. IGV can be used to predict clinical outcome in patients individually, providing a surrogate read-out of hard-to-measure disease processes.
Blood gene expression profiling of an early acetaminophen response.
Bushel, P R; Fannin, R D; Gerrish, K; Watkins, P B; Paules, R S
2017-06-01
Acetaminophen can adversely affect the liver especially when overdosed. We used whole blood as a surrogate to identify genes as potential early indicators of an acetaminophen-induced response. In a clinical study, healthy human subjects were dosed daily with 4 g of either acetaminophen or placebo pills for 7 days and evaluated over the course of 14 days. Alanine aminotransferase (ALT) levels for responders to acetaminophen increased between days 4 and 9 after dosing, and 12 genes were detected with expression profiles significantly altered within 24 h. The early responsive genes separated the subjects by class and dose period. In addition, the genes clustered patients who overdosed on acetaminophen apart from controls and also predicted the exposure classifications with 100% accuracy. The responsive genes serve as early indicators of an acetaminophen exposure, and their gene expression profiles can potentially be evaluated as molecular indicators for further consideration.
Blood Gene Expression Profiling of an Early Acetaminophen Response
Bushel, Pierre R.; Fannin, Rick D.; Gerrish, Kevin; Watkins, Paul B.; Paules, Richard S.
2018-01-01
Acetaminophen can adversely affect the liver especially when overdosed. We used whole blood as a surrogate to identify genes as potential early indicators of an acetaminophen-induced response. In a clinical study, healthy human subjects were dosed daily with 4g of either acetaminophen or placebo pills for 7 days and evaluated over the course of 14 days. Alanine aminotransferase (ALT) levels for responders to acetaminophen increased between days 4 and 9 after dosing and 12 genes were detected with expression profiles significantly altered within 24 hrs. The early responsive genes separated the subjects by class and dose period. In addition, the genes clustered patients who overdosed on acetaminophen apart from controls and also predicted the exposure classifications with 100% accuracy. The responsive genes serve as early indicators of an acetaminophen exposure and their gene expression profiles can potentially be evaluated as molecular indicators for further consideration. PMID:26927286
Liu, Xuewu; Huang, Yuxiao; Liang, Jiao; Zhang, Shuai; Li, Yinghui; Wang, Jun; Shen, Yan; Xu, Zhikai; Zhao, Ya
2014-11-30
The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism related to invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites due to difficulty in expressing the parasite proteins. Here, we performed computational prediction of the protein-protein interactions (PPIs) involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs. In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. We found that prediction performance was better when information from six species was used than when information from D. melanogaster alone was used. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage using a fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted.
Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites. We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations.
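The abstract above does not state the formula used to turn DDI probabilities into PPI probabilities. The following is a minimal sketch, assuming the common noisy-OR formulation in which two proteins interact if at least one of their domain pairs interacts and domain pairs act independently. The function name, the domain labels, and the probabilities are hypothetical and only illustrate the calculation.

```python
from itertools import product

def ppi_probability(domains_a, domains_b, ddi_prob):
    """Probability that two proteins interact, assuming independent domain pairs.

    ddi_prob maps frozenset({domain_1, domain_2}) to an estimated DDI
    probability; unlisted pairs are treated as non-interacting (probability 0).
    """
    no_interaction = 1.0
    for d_a, d_b in product(domains_a, domains_b):
        # Multiply the chances that each domain pair fails to interact.
        no_interaction *= 1.0 - ddi_prob.get(frozenset((d_a, d_b)), 0.0)
    return 1.0 - no_interaction

# Hypothetical DDI estimates, e.g. as produced by an EM fit (assumed values).
ddi = {frozenset(("D1", "D3")): 0.5, frozenset(("D2", "D3")): 0.2}
p = ppi_probability(["D1", "D2"], ["D3"], ddi)
```

Under this assumption, a protein pair with many weakly interacting domain pairs can still receive a high PPI probability, which is one reason expression data may be useful as an additional filter for false positives.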
Mingo, Rebecca; Zhang, Shu; Long, Courtney P; LaConte, Leslie E W; McDonald, Sarah M
2017-08-24
Rotaviruses (RVs) can evolve through the process of reassortment, whereby the 11 double-stranded RNA genome segments are exchanged among strains during co-infection. However, reassortment is limited in cases where the genes or encoded proteins of co-infecting strains are functionally incompatible. In this study, we employed a helper virus-based reverse genetics system to identify NSP2 gene regions that correlate with restricted reassortment into simian RV strain SA11. We show that SA11 reassortants with NSP2 genes from human RV strains Wa or DS-1 were efficiently rescued and exhibit no detectable replication defects. However, we could not rescue an SA11 reassortant with a human RV strain AU-1 NSP2 gene, which differs from that of SA11 by 186 nucleotides (36 amino acids). To map restriction determinants, we engineered viruses to contain chimeric NSP2 genes in which specific regions of AU-1 sequence were substituted with SA11 sequence. We show that a region spanning AU-1 NSP2 gene nucleotides 784-820 is critical for the observed restriction; yet additional determinants reside in other gene regions. In silico and in vitro analyses were used to predict how the 784-820 region may impact NSP2 gene/protein function, thereby informing an understanding of the reassortment restriction mechanism.
Variation in Glucose Homeostasis Traits Associated With P2RX7 Polymorphisms in Mice and Humans
Todd, Jennifer N.; Poon, Wenny; Lyssenko, Valeriya; Groop, Leif; Nichols, Brendan; Wilmot, Michael; Robson, Simon; Enjyoji, Keiichi; Herman, Mark A.; Hu, Cheng; Zhang, Rong; Jia, Weiping; Ma, Ronald
2015-01-01
Context: Extracellular nucleotide receptors are expressed in pancreatic B-cells. Purinergic signaling via these receptors may regulate pancreatic B-cell function. Objective: We hypothesized that purinergic signaling might influence glucose regulation and sought evidence in human studies of glycemic variation and a mouse model of purinergic signaling dysfunction. Design: In humans, we mined genome-wide meta-analysis data sets to examine purinergic signaling genes for association with glycemic traits and type 2 diabetes. We performed additional testing in two genomic regions (P2RX4/P2RX7 and P2RY1) in a cohort from the Prevalence, Prediction, and Prevention of Diabetes in Botnia (n = 3504), which includes more refined measures of glucose homeostasis. In mice, we generated a congenic model of purinergic signaling dysfunction by crossing the naturally hypomorphic C57BL6 P2rx7 allele onto the 129SvJ background. Results: Variants in five genes were associated with glycemic traits and in three genes with diabetes risk. In the Prevalence, Prediction, and Prevention of Diabetes in Botnia study, the minor allele in the missense functional variant rs1718119 (A348T) in P2RX7 was associated with increased insulin sensitivity and secretion, consistent with its known effect on increased pore function. Both male and female P2x7-C57 mice demonstrated impaired glucose tolerance compared with matched P2x7-129 mice. Insulin tolerance testing showed that P2x7-C57 mice were also less responsive to insulin than P2x7-129 mice. Conclusions: We show association of the purinergic signaling pathway in general and hypofunctioning P2X7 variants in particular with impaired glucose homeostasis in both mice and humans. PMID:25719930
Predicting Rat and Human Pregnane X Receptor Activators Using Bayesian Classification Models.
AbdulHameed, Mohamed Diwan M; Ippolito, Danielle L; Wallqvist, Anders
2016-10-17
The pregnane X receptor (PXR) is a ligand-activated transcription factor that acts as a master regulator of metabolizing enzymes and transporters. To avoid adverse drug-drug interactions and diseases such as steatosis and cancers associated with PXR activation, identifying drugs and chemicals that activate PXR is of crucial importance. In this work, we developed ligand-based predictive computational models for both rat and human PXR activation, which allowed us to identify potentially harmful chemicals and evaluate species-specific effects of a given compound. We utilized a large publicly available data set of nearly 2000 compounds screened in cell-based reporter gene assays to develop Bayesian quantitative structure-activity relationship models using physicochemical properties and structural descriptors. Our analysis showed that PXR activators tend to be hydrophobic and significantly different from nonactivators in terms of their physicochemical properties such as molecular weight, logP, number of rings, and solubility. Our Bayesian models, evaluated by using 5-fold cross-validation, displayed a sensitivity of 75% (76%), specificity of 76% (75%), and accuracy of 89% (89%) for human (rat) PXR activation. We identified structural features shared by rat and human PXR activators as well as those unique to each species. We compared rat in vitro PXR activation data to in vivo data by using DrugMatrix, a large toxicogenomics database with gene expression data obtained from rats after exposure to diverse chemicals. Although in vivo gene expression data pointed to cross-talk between nuclear receptor activators that is captured only by in vivo assays, overall we found broad agreement between in vitro and in vivo PXR activation. Thus, the models developed here serve primarily as efficient initial high-throughput in silico screens of in vitro activity.
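For reference, the three metrics reported above are defined from confusion-matrix counts as sketched below. The counts in the example are invented for illustration and are not taken from the study; in a 5-fold cross-validation the counts would typically be pooled or averaged over folds, which the abstract does not specify.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true activators recovered
    specificity = tn / (tn + fp)          # fraction of non-activators recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Illustrative counts only (assumed, not from the paper).
sens, spec, acc = classification_metrics(tp=30, fp=20, tn=80, fn=10)
```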
Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried
2016-02-24
Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.
Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi
2007-10-04
In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
Parham, Fred; Portier, Christopher J.; Chang, Xiaoqing; Mevissen, Meike
2016-01-01
Using in vitro data in human cell lines, several research groups have investigated changes in gene expression in cellular systems following exposure to extremely low frequency (ELF) and radiofrequency (RF) electromagnetic fields (EMF). For ELF EMF, we obtained five studies with complete microarray data and three studies with only lists of significantly altered genes. Likewise, for RF EMF, we obtained 13 complete microarray datasets and 5 limited datasets. Plausible linkages between exposure to ELF and RF EMF and human diseases were identified using a three-step process: (a) linking genes associated with classes of human diseases to molecular pathways, (b) linking pathways to ELF and RF EMF microarray data, and (c) identifying associations between human disease and EMF exposures where the pathways are significantly similar. A total of 60 pathways were associated with human diseases, mostly focused on basic cellular functions like JAK–STAT signaling or metabolic functions like xenobiotic metabolism by cytochrome P450 enzymes. ELF EMF datasets were sporadically linked to human diseases, but no clear pattern emerged. Individual datasets showed some linkage to cancer, chemical dependency, metabolic disorders, and neurological disorders. RF EMF datasets were not strongly linked to any disorders but strongly linked to changes in several pathways. Based on these analyses, the most promising area for further research would be to focus on EMF and neurological function and disorders. PMID:27656641
Raymond, Frédéric; Boisvert, Sébastien; Roy, Gaétan; Ritt, Jean-François; Légaré, Danielle; Isnard, Amandine; Stanke, Mario; Olivier, Martin; Tremblay, Michel J.; Papadopoulou, Barbara; Ouellette, Marc; Corbeil, Jacques
2012-01-01
The Leishmania tarentolae Parrot-TarII strain genome sequence was resolved to an average 16-fold mean coverage by next-generation DNA sequencing technologies. This is the first non-pathogenic to humans kinetoplastid protozoan genome to be described thus providing an opportunity for comparison with the completed genomes of pathogenic Leishmania species. A high synteny was observed between all sequenced Leishmania species. A limited number of chromosomal regions diverged between L. tarentolae and L. infantum, while remaining syntenic to L. major. Globally, >90% of the L. tarentolae gene content was shared with the other Leishmania species. We identified 95 predicted coding sequences unique to L. tarentolae and 250 genes that were absent from L. tarentolae. Interestingly, many of the latter genes were expressed in the intracellular amastigote stage of pathogenic species. In addition, genes coding for products involved in antioxidant defence or participating in vesicular-mediated protein transport were underrepresented in L. tarentolae. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae, namely the zinc metallo-peptidase surface glycoprotein GP63 and the promastigote surface antigen PSA31C. Overall, L. tarentolae's gene content appears better adapted to the promastigote insect stage rather than the amastigote mammalian stage. PMID:21998295
Kosti, Adam; Harry Chen, Hung-I; Mohan, Sumathy; Liang, Sitai; Chen, Yidong; Habib, Samy L.
2015-01-01
A recent study from our laboratory showed that patients with diabetes are at a higher risk of developing kidney cancer. In the current study, we screened the whole human genome in healthy controls and in patients with diabetes, renal cell carcinoma (RCC), or RCC+diabetes. After removing the gained/lost genes found in the healthy control group, we found 883 genes with copy number gain and 163 with copy number loss in the RCC+diabetes group, 669 with gain and 307 with loss in the RCC group, and 458 with gain and 38 with loss in the diabetes group. Functional annotation enrichment analysis showed that the control group had the highest number of enriched pathways (280), followed by 191 in the RCC+diabetes group, 148 in the RCC group, and 81 in the diabetes group. Nine enriched GO pathways overlapped between the RCC+diabetes and RCC groups, four between the RCC+diabetes and diabetes groups, and eight between the diabetes and RCC groups. Overall, we observed the majority of DNA alterations in patients from the RCC+diabetes group. Interestingly, insulin receptor (INSR) is highly expressed and had gains in copy number in the RCC+diabetes and diabetes groups. The changes in INSR copy number may be used as a biomarker for predicting RCC development in diabetic patients. PMID:25821562
Summerfield, Taryn L.; Yu, Lianbo; Gulati, Parul; Zhang, Jie; Huang, Kun; Romero, Roberto; Kniss, Douglas A.
2011-01-01
A majority of the studies examining the molecular regulation of human labor have been conducted using single gene approaches. While the technology to produce multi-dimensional datasets is readily available, the means for facile analysis of such data are limited. The objective of this study was to develop a systems approach to infer regulatory mechanisms governing global gene expression in cytokine-challenged cells in vitro, and to apply these methods to predict gene regulatory networks (GRNs) in intrauterine tissues during term parturition. To this end, microarray analysis was applied to human amnion mesenchymal cells (AMCs) stimulated with interleukin-1β, and differentially expressed transcripts were subjected to hierarchical clustering, temporal expression profiling, and motif enrichment analysis, from which a GRN was constructed. These methods were then applied to fetal membrane specimens collected in the absence or presence of spontaneous term labor. Analysis of cytokine-responsive genes in AMCs revealed a sterile immune response signature, with promoters enriched in response elements for several inflammation-associated transcription factors. In comparison to the fetal membrane dataset, there were 34 genes commonly upregulated, many of which were part of an acute inflammation gene expression signature. Binding motifs for nuclear factor-κB were prominent in the gene interaction and regulatory networks for both datasets; however, we found little evidence to support the utilization of pathogen-associated molecular pattern (PAMP) signaling. The tissue specimens were also enriched for transcripts governed by hypoxia-inducible factor. The approach presented here provides an uncomplicated means to infer global relationships among gene clusters involved in cellular responses to labor-associated signals. PMID:21655103
Clare, Susan E; Gupta, Akash; Choi, MiRan; Ranjan, Manish; Lee, Oukseub; Wang, Jun; Ivancic, David Z; Kim, J Julie; Khan, Seema A
2016-05-23
The synthesis of specific, potent progesterone antagonists adds potential agents to the breast cancer prevention and treatment armamentarium. The identification of individuals who will benefit from these agents will be a critical factor for their clinical success. We utilized telapristone acetate (TPA; CDB-4124) to understand the effects of progesterone receptor (PR) blockade on proliferation, apoptosis, promoter binding, cell cycle progression, and gene expression. We then identified a set of genes that overlap with human breast luteal-phase expressed genes and signify progesterone activity in both normal breast cells and breast cancer cell lines. TPA administration to T47D cells results in a 30 % decrease in cell number at 24 h, which is maintained over 72 h only in the presence of estradiol. Blockade of progesterone signaling by TPA for 24 h results in fewer cells in G2/M, attributable to decreased expression of genes that facilitate the G2/M transition. Gene expression data suggest that TPA affects several mechanisms that progesterone utilizes to control gene expression, including specific post-translational modifications, and nucleosomal organization and higher order chromatin structure, which regulate access of PR to its DNA binding sites. By comparing genes induced by the progestin R5020 in T47D cells with those increased in the luteal-phase normal breast, we have identified a set of genes that predict functional progesterone signaling in tissue. These data will facilitate an understanding of the ways in which drugs such as TPA may be utilized for the prevention, and possibly the therapy, of human breast cancer.
Pan, Weiran; Li, Gang; Yang, Xiaoxiao; Miao, Jinming
2015-04-01
This study aims to explore the potential mechanism of glioma through bioinformatic approaches. The gene expression profile (GSE4290) of glioma tumor and non-tumor samples was downloaded from Gene Expression Omnibus database. A total of 180 samples were available, including 23 non-tumor and 157 tumor samples. Then the raw data were preprocessed using robust multiarray analysis, and 8,890 differentially expressed genes (DEGs) were identified by using t-test (false discovery rate < 0.0005). Furthermore, 16 known glioma related genes were extracted from Genetic Association Database. After mapping 8,890 DEGs and 16 known glioma related genes to Human Protein Reference Database, a glioma associated protein-protein interaction network (GAPN) was constructed. In addition, 51 sub-networks in GAPN were screened out through Molecular Complex Detection (score ≥ 1), and sub-network 1 was found to have the closest interaction (score = 3). What's more, for the top 10 sub-networks, Gene Ontology (GO) enrichment analysis (p value < 0.05) was performed, and DEGs involved in sub-network 1 and 2, such as BRMS1L and CCNA1, were predicted to regulate cell growth, cell cycle, and DNA replication via interacting with known glioma related genes. Finally, the overlaps of DEGs and human essential, housekeeping, tissue-specific genes were calculated (p value = 1.0, 1.0, and 0.00014, respectively) and visualized by Venn Diagram package in R. About 61% of human tissue-specific genes were DEGs as well. This research shed new light on the pathogenesis of glioma based on DEGs and GAPN, and our findings might provide potential targets for clinical glioma treatment.
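The abstract above reports p values for the overlap between DEGs and other gene sets without naming the test. One common choice is a one-sided hypergeometric test, sketched here under that assumption; the function name and the toy numbers are hypothetical and do not reproduce the study's gene counts.

```python
from math import comb

def overlap_p_value(universe, set_a, set_b, overlap):
    """P(X >= overlap) for the overlap of two gene sets under random sampling.

    universe: total number of genes considered
    set_a:    size of the first gene set (e.g. DEGs)
    set_b:    size of the second gene set (e.g. tissue-specific genes)
    overlap:  observed number of genes shared by both sets
    """
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes, sets of size 4 and 3, sharing 2 genes.
p = overlap_p_value(universe=10, set_a=4, set_b=3, overlap=2)
```

A p value of 1.0, as reported for two of the three comparisons, is what such a test returns when the observed overlap is no larger than the smallest overlap possible by chance.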
Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular Life
Gillespie, Joseph J.; Williams, Kelly; Shukla, Maulik; Snyder, Eric E.; Nordberg, Eric K.; Ceraul, Shane M.; Dharmanolla, Chitti; Rainey, Daphne; Soneja, Jeetendra; Shallom, Joshua M.; Vishnubhat, Nataraj Dongre; Wattam, Rebecca; Purkayastha, Anjan; Czar, Michael; Crasta, Oswald; Setubal, Joao C.; Azad, Abdu F.; Sobral, Bruno S.
2008-01-01
Background Completed genome sequences are rapidly increasing for Rickettsia, obligate intracellular α-proteobacteria responsible for various human diseases, including epidemic typhus and Rocky Mountain spotted fever. In light of phylogeny, the establishment of orthologous groups (OGs) of open reading frames (ORFs) will distinguish the core rickettsial genes and other group specific genes (class 1 OGs or C1OGs) from those distributed indiscriminately throughout the rickettsial tree (class 2 OG or C2OGs). Methodology/Principal Findings We present 1823 representative (no gene duplications) and 259 non-representative (at least one gene duplication) rickettsial OGs. While the highly reductive (∼1.2 MB) Rickettsia genomes range in predicted ORFs from 872 to 1512, a core of 752 OGs was identified, depicting the essential Rickettsia genes. Unsurprisingly, this core lacks many metabolic genes, reflecting the dependence on host resources for growth and survival. Additionally, we bolster our recent reclassification of Rickettsia by identifying OGs that define the AG (ancestral group), TG (typhus group), TRG (transitional group), and SFG (spotted fever group) rickettsiae. OGs for insect-associated species, tick-associated species and species that harbor plasmids were also predicted. Through superimposition of all OGs over robust phylogeny estimation, we discern between C1OGs and C2OGs, the latter depicting genes either decaying from the conserved C1OGs or acquired laterally. Finally, scrutiny of non-representative OGs revealed high levels of split genes versus gene duplications, with both phenomena confounding gene orthology assignment. Interestingly, non-representative OGs, as well as OGs comprised of several gene families typically involved in microbial pathogenicity and/or the acquisition of virulence factors, fall predominantly within C2OG distributions. 
Conclusion/Significance Collectively, we determined the relative conservation and distribution of 14354 predicted ORFs from 10 rickettsial genomes across robust phylogeny estimation. The data, available at PATRIC (PathoSystems Resource Integration Center), provide novel information for unwinding the intricacies associated with Rickettsia pathogenesis, expanding the range of potential diagnostic, vaccine and therapeutic targets. PMID:19194535
A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.
Baur, Brittany; Bozdag, Serdar
2016-01-01
DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
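Sequential forward selection, as named in the abstract above, can be sketched generically as a greedy loop that adds one feature (here, one probe) at a time. This is not the authors' implementation: the stopping rule, the leave-one-out 1-nearest-neighbour scoring function, and the toy data are assumptions chosen to keep the example self-contained.

```python
def sequential_forward_selection(n_features, score, max_features):
    """Greedily add the feature that most improves score(selected_features)."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda s: s[0])
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given columns."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

# Toy data (assumed): column 0 separates the classes, column 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.9], [1.0, 1.1]]
y = [0, 0, 1, 1]
selected, acc = sequential_forward_selection(
    2, lambda fs: loo_1nn_accuracy(X, y, fs), max_features=2
)
```

Passing the scoring function as an argument mirrors the abstract's point that different classification methods can be plugged into the same selection loop.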
Characterization and machine learning prediction of allele-specific DNA methylation.
He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang
2015-12-01
A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeats and regions with a high frequency of SNPs. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events.
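Of the metrics reported above, the Matthews correlation coefficient (MCC) is the least familiar. A short sketch of its standard definition from confusion-matrix counts follows; the counts are invented for illustration and are not the study's results.

```python
from math import sqrt

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denom

# Illustrative counts only (assumed).
mcc = matthews_corrcoef(tp=40, fp=10, tn=40, fn=10)
```

Unlike accuracy, MCC ranges from -1 to 1 and stays near 0 for a classifier that only predicts the majority class, which makes it a useful complement when classes are imbalanced.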
MouSensor: A Versatile Genetic Platform to Create Super Sniffer Mice for Studying Human Odor Coding.
D'Hulst, Charlotte; Mina, Raena B; Gershon, Zachary; Jamet, Sophie; Cerullo, Antonio; Tomoiaga, Delia; Bai, Li; Belluscio, Leonardo; Rogers, Matthew E; Sirotin, Yevgeniy; Feinstein, Paul
2016-07-26
Typically, ∼0.1% of the total number of olfactory sensory neurons (OSNs) in the main olfactory epithelium express the same odorant receptor (OR) in a singular fashion and their axons coalesce into homotypic glomeruli in the olfactory bulb. Here, we have dramatically increased the total number of OSNs expressing specific cloned OR coding sequences by multimerizing a 21-bp sequence encompassing the predicted homeodomain binding site sequence, TAATGA, known to be essential in OR gene choice. Singular gene choice is maintained in these "MouSensors." In vivo synaptopHluorin imaging of odor-induced responses by known M71 ligands shows functional glomerular activation in an M71 MouSensor. Moreover, a behavioral avoidance task demonstrates that specific odor detection thresholds are significantly decreased in multiple transgenic lines, expressing mouse or human ORs. We have developed a versatile platform to study gene choice and axon identity, to create biosensors with great translational potential, and to finally decode human olfaction. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Alvarez-Pérez, Marco Antonio; Narayanan, Sampath; Zeichner-David, Margarita; Rodríguez Carmona, Bruno; Arzate, Higinio
2006-03-01
Cementum is a unique mineralized connective tissue that covers the root surfaces of the teeth. The cementum is critical for appropriate maturation of the periodontium, both during development and during regeneration of periodontal tissues; however, one major impediment to understanding the molecular mechanisms that regulate periodontal regeneration is the lack of cementum markers. Here we report on the identification and characterization of one such differentially expressed human gene, termed "cementum protein-23" (CP-23), that appears to be periodontal ligament and cementum-specific. We screened human cementum tumor-derived cDNA libraries by transient expression in COS-7 cells and "panning" with a rabbit polyclonal antibody against a cementoblastoma conditioned media-derived protein (CP). One isolated cDNA, CP-23, was expressed in E. coli and polyclonal antibodies against the recombinant human CP-23 were produced. Expression of CP-23 protein by cells of the periodontium was examined by Northern blot and in situ hybridization. Expression of CP-23 transcripts in human cementoblastoma-derived cells, periodontal ligament cells, human gingival fibroblasts and alveolar bone-derived cells was determined by RT-PCR. Our results show that we have isolated a 1374-bp human cDNA containing an open reading frame that encodes a polypeptide with 247 amino acid residues, with a predicted molecular mass of 25.9 kDa that represents CP species. The recombinant human CP-23 protein cross-reacted with antibodies against CP and type X collagen. Immunoscreening of human periodontal tissues revealed that CP-23 gene product is localized to the cementoid matrix of cementum and cementoblasts throughout the entire surface of the root, cell subpopulations of the periodontal ligament as well as cells located paravascularly to the blood vessels in the periodontal ligament.
Furthermore, 98% of putative cementoblasts and 15% of periodontal ligament cells cultured in vitro expressed CP-23 gene product. Cementoblastoma cells and periodontal ligament cells contained a 5.0 kb CP-23 mRNA. In situ hybridization showed strong expression of CP-23 mRNA on cementoblast, cell subpopulations of the periodontal ligament and cells located around blood vessels into the periodontal ligament. Our results demonstrate that CP-23 represents a novel, tissue-specific-gene product being expressed by periodontal ligament subpopulations and cementoblasts. These findings offer the possibility to determine the cellular and molecular events that regulate the cementogenesis process during root development. Furthermore, it might provide new venues for the design of translational studies aimed at achieving predictable new cementogenesis and regeneration of the periodontal tissues.
DYT1 dystonia increases risk taking in humans.
Arkadir, David; Radulescu, Angela; Raymond, Deborah; Lubarr, Naomi; Bressman, Susan B; Mazzoni, Pietro; Niv, Yael
2016-06-01
It has been difficult to link synaptic modification to overt behavioral changes. Rodent models of DYT1 dystonia, a motor disorder caused by a single gene mutation, demonstrate increased long-term potentiation and decreased long-term depression in corticostriatal synapses. Computationally, such asymmetric learning predicts risk taking in probabilistic tasks. Here we demonstrate abnormal risk taking in DYT1 dystonia patients, which is correlated with disease severity, thereby supporting striatal plasticity in shaping choice behavior in humans.
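The claim that asymmetric learning predicts risk taking can be illustrated with a simple value-learning rule. The sketch below is not the authors' model: it assumes a Rescorla-Wagner style update with separate, hypothetical learning rates for positive and negative prediction errors, and a gamble that pays 1 or 0 with equal probability.

```python
import random

def learn_risky_value(alpha_pos, alpha_neg, p_win=0.5, trials=20000, seed=0):
    """Average learned value of a gamble paying 1 with probability p_win, else 0."""
    rng = random.Random(seed)
    value, total = 0.5, 0.0
    for _ in range(trials):
        reward = 1.0 if rng.random() < p_win else 0.0
        delta = reward - value                        # prediction error
        rate = alpha_pos if delta > 0 else alpha_neg  # asymmetric learning rate
        value += rate * delta
        total += value
    return total / trials

# A sure option paying 0.5 has the same expected payoff as the gamble.
symmetric = learn_risky_value(0.2, 0.2)
asymmetric = learn_risky_value(0.3, 0.1)  # hypothetical rates, gains weighted more
print(symmetric, asymmetric)
```

With equal rates the learned value settles near the true expected payoff of 0.5. When positive errors are weighted more heavily, the learned value settles near alpha_pos / (alpha_pos + alpha_neg), here 0.75, so the gamble looks better than an equally valuable sure option.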
The Impact of Epithelial-Stromal Interactions on Human Breast Tumor Heterogeneity
2014-10-01
Triple-Negative (TN) breast cancer cases. In addition to the intrinsic molecular characteristics of the tumor...associated with TN breast cancer. 15. SUBJECT TERMS Triple-negative breast cancer, epithelium, stroma, gene expression, microRNA, laser capture...expression signatures in human stroma can predict outcome of breast cancer patients independently of clinical parameters and molecular subtypes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.
2008-10-20
One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER- patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome.
The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic value for both ER-positive and ER-negative breast cancer. The signature was selected using a novel biological approach and hence holds promise to represent the key biological processes of breast cancer.
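The 10-year outcome probabilities quoted above come from Kaplan-Meier survival analysis. As a minimal sketch of how that estimator works, the Python below computes a survival curve from follow-up times and event flags. The data are hypothetical and the function name is an illustrative choice; this is not the authors' analysis code.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the outcome occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)    # events at time t
        removed = sum(1 for tt, _ in data if tt == t)   # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
        i += removed
    return curve

# Hypothetical follow-up times in years (illustration only, not the study's data).
times = [2, 3, 3, 5, 8, 10, 10, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
print(kaplan_meier(times, events))
```

Censored patients leave the risk set without lowering the survival estimate, which is what lets the method use patients with incomplete follow-up.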
NF-κB gene signature predicts prostate cancer progression
Jin, Renjie; Yi, Yajun; Yull, Fiona E.; Blackwell, Timothy S.; Clark, Peter E.; Koyama, Tatsuki; Smith, Joseph A.; Matusik, Robert J.
2014-01-01
In many prostate cancer (PCa) patients, the cancer will be recurrent and eventually progress to lethal metastatic disease after primary treatment, such as surgery or radiation therapy. Therefore, it would be beneficial to better predict which patients with early-stage PCa would progress or recur after primary definitive treatment. In addition, many studies indicate that activation of NF-κB signaling correlates with PCa progression; however, the precise underlying mechanism is not fully understood. Our studies show that activation of NF-κB signaling via deletion of one allele of its inhibitor, IκBα, did not induce prostatic tumorigenesis in our mouse model. However, activation of NF-κB signaling did increase the rate of tumor progression in the Hi-Myc mouse PCa model when compared to Hi-Myc alone. Using the non-malignant NF-κB activated androgen depleted mouse prostate, a NF-κB Activated Recurrence Predictor 21 (NARP21) gene signature was generated. The NARP21 signature successfully predicted disease-specific survival and distant metastases-free survival in patients with PCa. This transgenic mouse model derived gene signature provides a useful and unique molecular profile for human PCa prognosis, which could be used on a prostatic biopsy to predict indolent versus aggressive behavior of the cancer after surgery. PMID:24686169
In silico study of breast cancer associated gene 3 using LION Target Engine and other tools.
León, Darryl A; Cànaves, Jaume M
2003-12-01
Sequence analysis of individual targets is an important step in annotation and validation. As a test case, we investigated human breast cancer associated gene 3 (BCA3) with LION Target Engine and with other bioinformatics tools. LION Target Engine confirmed that the BCA3 gene is located on 11p15.4 and that the two most likely splice variants (lacking exon 3 and exons 3 and 5, respectively) exist. Based on our manual curation of sequence data, it is proposed that an additional variant (missing only exon 5) published in a public sequence repository, is a prediction artifact. A significant number of new orthologs were also identified, and these were the basis for a high-quality protein secondary structure prediction. Moreover, our research confirmed several distinct functional domains as described in earlier reports. Sequence conservation from multiple sequence alignments, splice variant identification, secondary structure predictions, and predicted phosphorylation sites suggest that the removal of interaction sites through alternative splicing might play a modulatory role in BCA3. This in silico approach shows the depth and relevance of an analysis that can be accomplished by including a variety of publicly available tools with an integrated and customizable life science informatics platform.
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-01-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
In silico pattern-based analysis of the human cytomegalovirus genome.
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-04-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
Hughes, Stephanie N; Greig, Denise J; Miller, Woutrina A; Byrne, Barbara A; Gulland, Frances M D; Harvey, James T
2013-05-01
Given their coastal site fidelity and opportunistic foraging behavior, harbor seals (Phoca vitulina) may serve as sentinels for coastal ecosystem health. Seals using urbanized coastal habitat can acquire enteric bacteria, including Vibrio that may affect their health. To understand Vibrio dynamics in seals, demographic and environmental factors were tested for predicting potentially virulent Vibrio in free-ranging and stranded Pacific harbor seals (Phoca vitulina richardii) off California. Vibrio prevalence did not vary with season and was greater in free-ranging seals (29 %, n = 319) compared with stranded seals (17 %, n = 189). Of the factors tested, location, turbidity, and/or salinity best predicted Vibrio prevalence in free-ranging seals. The relationship of environmental factors with Vibrio prevalence differed by location and may be related to oceanographic or terrestrial contributions to water quality. Vibrio parahaemolyticus, Vibrio alginolyticus, and Vibrio cholerae were observed in seals, with V. cholerae found almost exclusively in stranded pups and yearlings. Additionally, virulence genes (trh and tdh) were detected in V. parahaemolyticus isolates. Vibrio cholerae isolates lacked targeted virulence genes, but were hemolytic. Three out of four stranded pups with V. parahaemolyticus (trh+ and/or tdh+) died in rehabilitation, but the role of Vibrio in causing mortality is unclear, and Vibrio expression of virulence genes should be investigated. Considering that humans share the environment and food resources with seals, potentially virulent Vibrio observed in seals also may be of concern to human health.
Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min; Na, Seong-In; Yoon, Seok-Hwan; Chun, Jongsik
2017-06-01
Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
Yoo, Seungyeul; Takikawa, Sachiko; Geraghty, Patrick; Argmann, Carmen; Campbell, Joshua; Lin, Luan; Huang, Tao; Tu, Zhidong; Foronjy, Robert F; Feronjy, Robert; Spira, Avrum; Schadt, Eric E; Powell, Charles A; Zhu, Jun
2015-01-01
Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple genes sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communications, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue including hypoxia responsive genes, and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a 'causal' role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology.
Cloning and sequencing of Staphylococcus aureus murC, a gene essential for cell wall biosynthesis.
Lowe, A M; Deresiewicz, R L
1999-01-01
Staphylococcus aureus is a major human pathogen that is increasingly resistant to clinically useful antimicrobial agents. While screening for S. aureus genes expressed during mammalian infection, we isolated murC. This gene encodes UDP-N-acetylmuramoyl-L-alanine synthetase, an enzyme essential for cell wall biosynthesis in a number of bacteria. S. aureus MurC has a predicted mass of 49,182 Da and complements the temperature-sensitive murC mutation of E. coli ST222. Sequence data on the DNA flanking staphylococcal murC suggest that the local gene organization there parallels that found in B. subtilis, but differs from that found in gram-negative bacterial pathogens. MurC proteins represent promising targets for broad spectrum antimicrobial drug development.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.
Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W
2018-05-31
In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
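The coverage figures above imply a typical size for the predicted regions. The short Python calculation below is a back-of-the-envelope sketch: it assumes a genome length of roughly 3.1 billion bp, which is not stated in the abstract, and it assumes regions do not overlap.

```python
GENOME_BP = 3.1e9  # assumed approximate human genome length (not from the abstract)

def mean_region_length(genome_fraction, n_regions, genome_bp=GENOME_BP):
    """Average length in bp implied by a coverage fraction and a region count."""
    return genome_fraction * genome_bp / n_regions

enhancer_bp = mean_region_length(0.068, 300_000)  # 6.8% of genome, 300,000 enhancers
promoter_bp = mean_region_length(0.006, 26_000)   # 0.6% of genome, 26,000 promoters
print(round(enhancer_bp), round(promoter_bp))
```

Under these assumptions both classes average on the order of 700 bp per region, so the reported counts and coverage fractions are mutually consistent.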
Feline hypersomatotropism and acromegaly tumorigenesis: a potential role for the AIP gene.
Scudder, C J; Niessen, S J; Catchpole, B; Fowkes, R C; Church, D B; Forcada, Y
2017-04-01
Acromegaly in humans is usually sporadic; however, up to 20% of familial isolated pituitary adenomas are caused by germline sequence variants of the aryl-hydrocarbon-receptor interacting protein (AIP) gene. Feline acromegaly has similarities to human acromegalic families with AIP mutations. The aim of this study was to sequence the feline AIP gene, identify sequence variants and compare the AIP gene sequence between feline acromegalic and control cats, and in acromegalic siblings. The feline AIP gene was amplified through PCR using whole blood genomic DNA from 10 acromegalic and 10 control cats, and 3 sibling pairs affected by acromegaly. PCR products were sequenced and compared with the published predicted feline AIP gene. A single nonsynonymous SNP was identified in exon 1 (AIP:c.9T > G) of two acromegalic cats and none of the control cats, as well as both members of one sibling pair. The region of this SNP is considered essential for the interaction of the AIP protein with its receptor. This sequence variant has not previously been reported in humans. Two additional synonymous sequence variants were identified (AIP:c.481C > T and AIP:c.826C > T). This is the first molecular study to investigate a potential genetic cause of feline acromegaly and identified a nonsynonymous AIP single nucleotide polymorphism in 20% of the acromegalic cat population evaluated, as well as in one of the sibling pairs evaluated. Copyright © 2016 Elsevier Inc. All rights reserved.
Tenascin-X, Collagen, Elastin and the Ehlers-Danlos Syndrome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bristow, James; Carey, William; Schalkwijk, Joost
2005-08-31
Tenascin-X is an extracellular matrix protein initially identified because of its overlap with the human CYP21B gene. Because studies of gene and protein function of other tenascins had been poorly predictive of essential functions in vivo, we used a genetic approach that critically relied on an understanding of the genomic locus to uncover an association between inactivating tenascin-X mutations and novel recessive and dominant forms of Ehlers-Danlos syndrome. Tenascin-X provides the first example of a gene outside of the fibrillar collagens and their processing enzymes that causes Ehlers-Danlos syndrome. Tenascin-X null mice recapitulate the skin findings of the human disease, confirming a causative role for this gene in Ehlers-Danlos syndrome. Further evaluation of these mice showed that tenascin-X is an important regulator of collagen deposition in vivo, suggesting a novel mechanism of disease in this form of Ehlers-Danlos syndrome. Further studies suggest that tenascin-X may do this through both direct and indirect interactions with the collagen fibril. Recent studies show that TNX effects on matrix extend beyond the collagen to the elastogenic pathway and matrix remodeling enzymes. Tenascin-X serves as a compelling example of how human experiments of nature can guide us to an understanding of genes whose function may not be evident from their sequence or in vitro studies of their encoded proteins.
Intrafamily and intragenomic conflicts in human warfare
2017-01-01
Recent years have seen an explosion of multidisciplinary interest in ancient human warfare. Theory has emphasized a key role for kin-selected cooperation, modulated by sex-specific demography, in explaining intergroup violence. However, conflicts of interest remain a relatively underexplored factor in the evolutionary-ecological study of warfare, with little consideration given to which parties influence the decision to go to war and how their motivations may differ. We develop a mathematical model to investigate the interplay between sex-specific demography and human warfare, showing that: the ecology of warfare drives the evolution of sex-biased dispersal; sex-biased dispersal modulates intrafamily and intragenomic conflicts in relation to warfare; intragenomic conflict drives parent-of-origin-specific patterns of gene expression—i.e. ‘genomic imprinting’—in relation to warfare phenotypes; and an ecological perspective of conflicts at the levels of the gene, individual, and social group yields novel predictions as to pathologies associated with mutations and epimutations at loci underpinning human violence. PMID:28228515
Intrafamily and intragenomic conflicts in human warfare.
Micheletti, Alberto J C; Ruxton, Graeme D; Gardner, Andy
2017-02-22
Recent years have seen an explosion of multidisciplinary interest in ancient human warfare. Theory has emphasized a key role for kin-selected cooperation, modulated by sex-specific demography, in explaining intergroup violence. However, conflicts of interest remain a relatively underexplored factor in the evolutionary-ecological study of warfare, with little consideration given to which parties influence the decision to go to war and how their motivations may differ. We develop a mathematical model to investigate the interplay between sex-specific demography and human warfare, showing that: the ecology of warfare drives the evolution of sex-biased dispersal; sex-biased dispersal modulates intrafamily and intragenomic conflicts in relation to warfare; intragenomic conflict drives parent-of-origin-specific patterns of gene expression (i.e. 'genomic imprinting') in relation to warfare phenotypes; and an ecological perspective of conflicts at the levels of the gene, individual, and social group yields novel predictions as to pathologies associated with mutations and epimutations at loci underpinning human violence. © 2017 The Authors.
Saccharomyces genome database informs human biology.
Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael
2018-01-04
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity.
Saleheen, Danish; Natarajan, Pradeep; Armean, Irina M; Zhao, Wei; Rasheed, Asif; Khetarpal, Sumeet A; Won, Hong-Hee; Karczewski, Konrad J; O'Donnell-Luria, Anne H; Samocha, Kaitlin E; Weisburd, Benjamin; Gupta, Namrata; Zaidi, Mozzam; Samuel, Maria; Imran, Atif; Abbas, Shahid; Majeed, Faisal; Ishaq, Madiha; Akhtar, Saba; Trindade, Kevin; Mucksavage, Megan; Qamar, Nadeem; Zaman, Khan Shah; Yaqoob, Zia; Saghir, Tahir; Rizvi, Syed Nadeem Hasan; Memon, Anis; Hayyat Mallick, Nadeem; Ishaq, Mohammad; Rasheed, Syed Zahed; Memon, Fazal-Ur-Rehman; Mahmood, Khalid; Ahmed, Naveeduddin; Do, Ron; Krauss, Ronald M; MacArthur, Daniel G; Gabriel, Stacey; Lander, Eric S; Daly, Mark J; Frossard, Philippe; Danesh, John; Rader, Daniel J; Kathiresan, Sekar
2017-04-12
A major goal of biomedicine is to understand the function of every gene in the human genome. Loss-of-function mutations can disrupt both copies of a given gene in humans and phenotypic analysis of such 'human knockouts' can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high. Here we sequence the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), designed to understand the determinants of cardiometabolic diseases in individuals from South Asia. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations, and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. 
Overall, these observations provide a roadmap for a 'human knockout project', a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans.
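The statement that consanguineous unions raise the chance of homozygous loss-of-function genotypes follows from the standard population-genetics relation P(aa) = q^2 + F q (1 - q), where q is the allele frequency and F the inbreeding coefficient. The Python sketch below applies it at the 1% frequency threshold mentioned above; the choice of F = 1/16 (offspring of first cousins) is an illustrative assumption, not a figure from the study.

```python
def homozygote_frequency(q, f):
    """Expected homozygote frequency for allele frequency q and inbreeding coefficient f."""
    return q * q + f * q * (1 - q)

q = 0.01                                   # allele at the 1% frequency threshold
outbred = homozygote_frequency(q, 0.0)     # random mating
cousins = homozygote_frequency(q, 1 / 16)  # offspring of first cousins (assumed F)
print(outbred, cousins, cousins / outbred)
```

The relative increase grows as q shrinks, so the rarer the variant, the more a consanguineous cohort helps in finding homozygous carriers.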
Li, Xiaobo; Zhang, Chengcheng; Bian, Qian; Gao, Na; Zhang, Xin; Meng, Qingtao; Wu, Shenshen; Wang, Shizhi; Xia, Yankai; Chen, Rui
2016-09-01
Gene expression profiling has developed rapidly in recent years and can be used to predict and define the mechanisms underlying chemical toxicity. Here, RNA microarray and computational technology were used to show that aluminum oxide nanoparticles (Al2O3 NPs) were capable of triggering up-regulation of genes related to the cell cycle and cell death in a human A549 lung adenocarcinoma cell line. Gene expression levels were validated in Al2O3 NP-exposed A549 cells and mouse lung tissues, most of which showed consistent trends in regulation. Gene-transcription factor network analysis coupled with cell- and animal-based assays demonstrated that the genes encoding PTPN6, RTN4, BAX and IER play a role in the biological responses induced by the nanoparticle exposure, which caused cell death and cell cycle arrest in the G2/S phase. Further, down-regulated PTPN6 occupied a core position in the network; when its expression was rescued by plasmid transfection, cell death and cell cycle arrest in A549 cells were ameliorated. These results demonstrate the feasibility of using gene expression profiling to predict cellular responses induced by nanomaterials, which could be used to develop a comprehensive knowledge of nanotoxicity.
Basic Helix-Loop-Helix Transcription Factor Gene Family Phylogenetics and Nomenclature
Skinner, Michael K.; Rawls, Alan; Wilson-Rawls, Jeanne; Roalson, Eric H.
2010-01-01
A phylogenetic analysis of the basic helix-loop-helix (bHLH) gene superfamily was performed using seven different species (human, mouse, rat, worm, fly, yeast, and plant Arabidopsis) and involving over 600 bHLH genes [1]. All bHLH genes were identified in the genomes of the various species, including expressed sequence tags, and the entire coding sequence was used in the analysis. Nearly 15% of the gene family has been updated or added since the original publication. A super-tree involving six clades and all structural relationships was established and is now presented for four of the species. The wealth of functional data available for members of the bHLH gene superfamily provides us with the opportunity to use this exhaustive phylogenetic tree to predict potential functions of uncharacterized members of the family. This phylogenetic and genomic analysis of the bHLH gene family has revealed unique elements of the evolution and functional relationships of the different genes in the bHLH gene family. PMID:20219281
Neville, B. Anne; Sheridan, Paul O.; Harris, Hugh M. B.; Coughlan, Simone; Flint, Harry J.; Duncan, Sylvia H.; Jeffery, Ian B.; Claesson, Marcus J.; Ross, R. Paul; Scott, Karen P.; O'Toole, Paul W.
2013-01-01
Some Eubacterium and Roseburia species are among the most prevalent motile bacteria present in the intestinal microbiota of healthy adults. These flagellate species contribute “cell motility” category genes to the intestinal microbiome and flagellin proteins to the intestinal proteome. We reviewed and revised the annotation of motility genes in the genomes of six Eubacterium and Roseburia species that occur in the human intestinal microbiota and examined their respective locus organization by comparative genomics. Motility gene order was generally conserved across these loci. Five of these species harbored multiple genes for predicted flagellins. Flagellin proteins were isolated from R. inulinivorans strain A2-194 and from E. rectale strains A1-86 and M104/1. The amino-termini sequences of the R. inulinivorans and E. rectale A1-86 proteins were almost identical. These protein preparations stimulated secretion of interleukin-8 (IL-8) from human intestinal epithelial cell lines, suggesting that these flagellins were pro-inflammatory. Flagellins from the other four species were predicted to be pro-inflammatory on the basis of alignment to the consensus sequence of pro-inflammatory flagellins from the β- and γ- proteobacteria. Many fliC genes were deduced to be under the control of σ28. The relative abundance of the target Eubacterium and Roseburia species varied across shotgun metagenomes from 27 elderly individuals. Genes involved in the flagellum biogenesis pathways of these species were variably abundant in these metagenomes, suggesting that the current depth of coverage used for metagenomic sequencing (3.13–4.79 Gb total sequence in our study) insufficiently captures the functional diversity of genomes present at low (≤1%) relative abundance. E. rectale and R. inulinivorans thus appear to synthesize complex flagella composed of flagellin proteins that stimulate IL-8 production. 
A greater depth of sequencing, improved evenness of sequencing and improved metagenome assembly from short reads will be required to facilitate in silico analyses of complete complex biochemical pathways for low-abundance target species from shotgun metagenomes. PMID:23935906
Saleh, Ali Jason; Soltani, Bahram M; Dokanehiifard, Sadat; Medlej, Abdallah; Tavalaei, Mahmoud; Mowla, Seyed Javad
2016-10-01
PI3K/AKT signaling is involved in cell survival, proliferation, and migration. In this pathway, the PI3Kα enzyme is composed of a regulatory protein encoded by the p85 gene and a catalytic protein encoded by the PIK3CA gene. The human PIK3CA locus is amplified in several cancers, including lung and colorectal cancer (CRC). Therefore, microRNAs (miRNAs) encoded within the PIK3CA gene might have a role in cancer development. Here, we report a novel microRNA named PIK3CA-miR1 (EBI accession no. LN626315), which is located within the PIK3CA gene. A DNA segment corresponding to the PIK3CA-premir1 sequence was transfected into human cell lines, which resulted in the generation of mature exogenous PIK3CA-miR1. Following overexpression of PIK3CA-miR1, its predicted target genes (APPL1 and TrkC) were significantly downregulated in the CRC-derived HCT116 and SW480 cell lines, as detected by qRT-PCR. A dual luciferase assay then supported the interaction of PIK3CA-miR1 with APPL1 and TrkC transcripts. Endogenous PIK3CA-miR1 expression was also detected in several cell lines (at high levels in HCT116 and SW480) and at high levels in CRC specimens. Consistently, overexpression of PIK3CA-premir1 in HCT116 and SW480 cells resulted in a significant reduction of the sub-G1 cell distribution and apoptotic cell rate, as detected by flow cytometry, and in increased cell proliferation, as detected by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay. PIK3CA-miR1 overexpression also resulted in Wnt signaling upregulation, as detected by the Top/Fop assay. Overall, the accumulated evidence indicates the presence of a bona fide novel onco-miRNA encoded within the PIK3CA oncogene, which is highly expressed in colorectal cancer and has a survival effect in CRC-derived cells.
Evolutionary trends and functional anatomy of the human expanded autophagy network
Till, Andreas; Saito, Rintaro; Merkurjev, Daria; Liu, Jing-Jing; Syed, Gulam Hussain; Kolnik, Martin; Siddiqui, Aleem; Glas, Martin; Scheffler, Björn; Ideker, Trey; Subramani, Suresh
2015-01-01
All eukaryotic cells utilize autophagy for protein and organelle turnover, thus assuring subcellular quality control, homeostasis, and survival. In order to address recent advances in identification of human autophagy associated genes, and to describe autophagy on a system-wide level, we established an autophagy-centered gene interaction network by merging various primary data sets and by retrieving respective interaction data. The resulting network (‘AXAN’) was analyzed with respect to subnetworks, e.g. the prime gene subnetwork (including the core machinery, signaling pathways and autophagy receptors) and the transcription subnetwork. To describe aspects of evolution within this network, we assessed the presence of protein orthologs across 99 eukaryotic model organisms. We visualized evolutionary trends for prime gene categories and evolutionary tracks for selected AXAN genes. This analysis confirms the eukaryotic origin of autophagy core genes while it points to a diverse evolutionary history of autophagy receptors. Next, we used module identification to describe the functional anatomy of the network at the level of pathway modules. In addition to obvious pathways (e.g., lysosomal degradation, insulin signaling) our data unveil the existence of context-related modules such as Rho GTPase signaling. Last, we used a tripartite, image-based RNAi screen to test candidate genes predicted to play a role in regulation of autophagy. We verified the Rho GTPase, CDC42, as a novel regulator of autophagy-related signaling. This study emphasizes the applicability of system-wide approaches to gain novel insights into a complex biological process and to describe the human autophagy pathway at a hitherto unprecedented level of detail. PMID:26103419
Li, Edward B; Truong, Dawn; Hallett, Shawn A; Mukherjee, Kusumika; Schutte, Brian C; Liao, Eric C
2017-09-01
Large-scale sequencing efforts have captured a rapidly growing catalogue of genetic variations. However, the accurate establishment of gene variant pathogenicity remains a central challenge in translating personal genomics information to clinical decisions. Interferon Regulatory Factor 6 (IRF6) gene variants are significant genetic contributors to orofacial clefts. Although approximately three hundred IRF6 gene variants have been documented, their effects on protein function remain difficult to interpret. Here, we demonstrate that the protein functions of human IRF6 missense gene variants can be rapidly assessed in detail by their ability to rescue the irf6 -/- phenotype in zebrafish through variant mRNA microinjections at the one-cell stage. The results revealed that many missense variants previously predicted by traditional statistical and computational tools to be loss-of-function and pathogenic retained partial or full protein function and rescued the zebrafish irf6 -/- periderm rupture phenotype. Through mRNA dosage titration and analysis of the Exome Aggregation Consortium (ExAC) database, IRF6 missense variants were grouped by their ability to rescue at various dosages into three functional categories: wild type function, reduced function, and complete loss-of-function. This sensitive and specific biological assay was able to address the nuanced functional significance of IRF6 missense gene variants and overcome many limitations faced by current statistical and computational tools in assigning variant protein function and pathogenicity. Furthermore, it opens the possibility of characterizing as yet undiscovered human IRF6 missense gene variants from orofacial cleft patients, and illustrates a generalizable functional genomics paradigm in personalized medicine.
Liu, Zhi-Ping; Wu, Canglin; Miao, Hongyu; Wu, Hulin
2015-01-01
Transcriptional and post-transcriptional regulation of gene expression is of fundamental importance to numerous biological processes. An increasing number of gene regulatory relationships have been documented in various databases and in the literature. However, to exploit such knowledge more efficiently for biomedical research and applications, it is necessary to construct a genome-wide regulatory network database that integrates the information on gene regulatory relationships now scattered across many different sources. Therefore, in this work, we build a knowledge-based database, named ‘RegNetwork’, of gene regulatory networks for human and mouse by collecting and integrating the documented regulatory interactions among transcription factors (TFs), microRNAs (miRNAs) and target genes from 25 selected databases. Moreover, we inferred potential regulatory relationships based on transcription factor binding site (TFBS) motifs and incorporated them into RegNetwork. As a result, RegNetwork contains a comprehensive set of experimentally observed or predicted transcriptional and post-transcriptional regulatory relationships, and the database framework is flexibly designed for potential extensions to include gene regulatory networks for other organisms in the future. Based on RegNetwork, we characterized the statistical and topological properties of genome-wide regulatory networks for human and mouse, and we extracted and interpreted simple yet important network motifs that involve the interplay between TFs, miRNAs and their targets. In summary, RegNetwork provides an integrated resource on the prior information for gene regulatory relationships, and it enables us to further investigate context-specific transcriptional and post-transcriptional regulatory interactions based on domain-specific experimental data. Database URL: http://www.regnetworkweb.org PMID:26424082
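One simple motif of the kind mentioned in this abstract is the TF-miRNA feed-forward loop, in which a TF regulates a miRNA and both regulate a common target. The sketch below shows how such loops could be enumerated from a directed edge list. It is an illustration under assumed inputs, not RegNetwork's own code or schema: the function name, the `node_type` labels and the toy node names are all hypothetical.

```python
from collections import defaultdict


def tf_mirna_feed_forward_loops(edges, node_type):
    """Return sorted (tf, mirna, target) triples with tf -> mirna, tf -> target, mirna -> target.

    edges:     iterable of (regulator, target) pairs
    node_type: dict node -> "TF", "miRNA" or "gene" (hypothetical labels)
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)

    loops = []
    for tf, tf_targets in targets.items():
        if node_type.get(tf) != "TF":
            continue
        for mirna in tf_targets:
            if node_type.get(mirna) != "miRNA":
                continue
            # Targets shared by the TF and the miRNA close the loop.
            for gene in tf_targets & targets.get(mirna, set()):
                if gene not in (tf, mirna):
                    loops.append((tf, mirna, gene))
    return sorted(loops)


# Toy network with made-up node names.
node_type = {"TF_A": "TF", "miR_x": "miRNA",
             "gene_1": "gene", "gene_2": "gene", "gene_3": "gene"}
edges = [
    ("TF_A", "miR_x"),
    ("TF_A", "gene_1"),
    ("miR_x", "gene_1"),  # closes the loop TF_A -> miR_x -> gene_1
    ("miR_x", "gene_2"),  # gene_2 is not a TF_A target, so no loop
    ("TF_A", "gene_3"),   # gene_3 is not a miR_x target, so no loop
]

loops = tf_mirna_feed_forward_loops(edges, node_type)
print(loops)
```

Storing targets as sets keeps the shared-target lookup to a single set intersection per TF-miRNA pair, which matters on genome-wide networks.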
Schiroli, Giulia; Ferrari, Samuele; Conway, Anthony; Jacob, Aurelien; Capo, Valentina; Albano, Luisa; Plati, Tiziana; Castiello, Maria C; Sanvito, Francesca; Gennery, Andrew R; Bovolenta, Chiara; Palchaudhuri, Rahul; Scadden, David T; Holmes, Michael C; Villa, Anna; Sitia, Giovanni; Lombardo, Angelo; Genovese, Pietro; Naldini, Luigi
2017-10-11
Targeted genome editing in hematopoietic stem/progenitor cells (HSPCs) is an attractive strategy for treating immunohematological diseases. However, the limited efficiency of homology-directed editing in primitive HSPCs constrains the yield of corrected cells and might affect the feasibility and safety of clinical translation. These concerns need to be addressed in stringent preclinical models and overcome by developing more efficient editing methods. We generated a humanized X-linked severe combined immunodeficiency (SCID-X1) mouse model and evaluated the efficacy and safety of hematopoietic reconstitution from limited input of functional HSPCs, establishing thresholds for full correction upon different types of conditioning. Unexpectedly, conditioning before HSPC infusion was required to protect the mice from lymphoma developing when transplanting small numbers of progenitors. We then designed a one-size-fits-all IL2RG (interleukin-2 receptor common γ-chain) gene correction strategy and, using the same reagents suitable for correction of human HSPCs, validated the edited human gene in the disease model in vivo, providing evidence of targeted gene editing in mouse HSPCs and demonstrating the functionality of the IL2RG-edited lymphoid progeny. Finally, we optimized editing reagents and protocol for human HSPCs and attained the threshold of IL2RG editing in long-term repopulating cells predicted to safely rescue the disease, using clinically relevant HSPC sources and highly specific zinc finger nucleases or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR-associated protein 9). Overall, our work establishes the rationale and guiding principles for clinical translation of SCID-X1 gene editing and provides a framework for developing gene correction for other diseases. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Zadora, Julianna; Singh, Manvendra; Herse, Florian; Przybyl, Lukasz; Haase, Nadine; Golic, Michaela; Yung, Hong Wa; Huppertz, Berthold; Cartwright, Judith E.; Whitley, Guy; Johnsen, Guro M.; Levi, Giovanni; Isbruch, Annette; Schulz, Herbert; Luft, Friedrich C.; Müller, Dominik N.; Staff, Anne Cathrine
2017-01-01
Background: Preeclampsia is a complex and common human-specific pregnancy syndrome associated with placental pathology. Its human specificity poses both intellectual and methodological challenges, because a robust model system is lacking. Given the role of imprinted genes in human placentation and the vulnerability of imprinted genes to loss-of-imprinting changes, there has been extensive speculation, but no robust evidence, that imprinted genes are involved in preeclampsia. Our study aims to investigate whether disturbed imprinting contributes to preeclampsia. Methods: We first aimed to confirm that preeclampsia is a disease of the placenta by generating and analyzing genome-wide molecular data on well-characterized patient material. We performed high-throughput transcriptome analyses of multiple placenta samples from healthy controls and patients with preeclampsia. Next, we identified differentially expressed genes in preeclamptic placentas and intersected them with the list of human imprinted genes. We used bioinformatics/statistical analyses to confirm association between imprinting and preeclampsia and to predict biological processes affected in preeclampsia. Validation included epigenetic and cellular assays. In terms of human specificity, we established an in vitro invasion-differentiation trophoblast model. Our comparative phylogenetic analysis involved single-cell transcriptome data of human, macaque, and mouse preimplantation embryogenesis. Results: We found disturbed placental imprinting in preeclampsia and revealed potential candidates, including GATA3 and DLX5, with poorly explored imprinted status and no prior association with preeclampsia. As a result of loss of imprinting, DLX5 was upregulated in 69% of preeclamptic placentas. Levels of DLX5 correlated with classic preeclampsia markers. DLX5 is expressed in human but not in murine trophoblast. 
The DLX5high phenotype resulted in reduced proliferation, increased metabolism, and endoplasmic reticulum stress-response activation in trophoblasts in vitro. The transcriptional profile of such cells mimics the transcriptome of preeclamptic placentas. Pan-mammalian comparative analysis identified DLX5 as part of the human-specific regulatory network of trophoblast differentiation. Conclusions: Our analysis provides evidence of a true association among disturbed imprinting, gene expression, and preeclampsia. As a result of disturbed imprinting, the upregulated DLX5 affects trophoblast proliferation. Our in vitro model might fill a vital niche in preeclampsia research. Human-specific regulatory circuitry of DLX5 might help explain certain aspects of preeclampsia. PMID:28904069
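The intersection step in the Methods (differentially expressed genes against the list of imprinted genes) is commonly followed by an over-representation test. The abstract does not say which statistic the authors used, so the hypergeometric tail below is an assumption, and the function name and toy gene identifiers are hypothetical.

```python
from math import comb


def overlap_enrichment_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of set-B genes found when set_a_size genes are drawn
    without replacement from a universe containing set_b_size set-B genes.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_a_size)
    tail = sum(
        comb(set_b_size, k) * comb(universe_size - set_b_size, set_a_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy gene sets with made-up identifiers (not data from the study).
universe = {f"g{i}" for i in range(20)}
differentially_expressed = {"g0", "g1", "g2", "g3", "g4"}
imprinted = {"g3", "g4", "g5", "g6"}

shared = differentially_expressed & imprinted
p_value = overlap_enrichment_pvalue(
    len(universe), len(differentially_expressed), len(imprinted), len(shared)
)
print(sorted(shared), p_value)
```

The result depends strongly on the choice of universe; in a real analysis this would typically be the set of genes that could have been called differentially expressed at all, not the whole genome.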